Process non-nullable scala type before udf #1471
Open
wycccccc wants to merge 5 commits into opensource4you:main from
Conversation
chia7712
reviewed
Feb 2, 2023
#Spark checkpoint path
checkpoint =
#Spark checkpoint
checkpoint.path =
Contributor
Why add `.path`? If the goal is consistent naming, the variable names used in Metadata need to be changed as well.
Collaborator
Author
Mainly because when the shell script searches for `checkpoint`, it also picks up the comment line above, so I simply switched to a unified name.
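The matching problem described here can be reproduced with a small sketch. It assumes the script reads the key with `grep` from a properties file; the temporary file, the value `/tmp/checkpoint`, and the variable names are hypothetical and not taken from the PR. An unanchored pattern also hits the comment line (the `.` in a regex matches any character, including the space in the comment), while an anchored pattern with an escaped dot matches only the key.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical properties file mirroring the two lines from the diff.
props="$(mktemp)"
trap 'rm -f "$props"' EXIT
cat > "$props" <<'EOF'
#Spark checkpoint path
checkpoint.path = /tmp/checkpoint
EOF

# Unanchored pattern: "." is a regex wildcard, so the comment line matches too.
loose_count="$(grep -c 'checkpoint.path' "$props")"

# Anchored pattern with an escaped dot: only the key line matches.
strict_count="$(grep -c '^checkpoint\.path[[:space:]]*=' "$props")"

# Read the value from the single matching line and trim leading whitespace.
value="$(grep '^checkpoint\.path[[:space:]]*=' "$props" | cut -d '=' -f 2- | sed 's/^[[:space:]]*//')"

echo "loose=$loose_count strict=$strict_count value=$value"
```

Anchoring at the start of the line and escaping the dot is one way to avoid the false match without having to reword the comment; whether the project's script should do this is a design choice left to the authors.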
chia7712
reviewed
Feb 2, 2023
if [[ "$master" == "spark:"* ]] || [[ "$master" == "local"* ]]; then
  docker run -d --init \
    --name "csv-kafka-${source_name}" \
    --name "csv-kafka${source_name}" \
Collaborator
Author
No, I accidentally deleted it while investigating the bug above. It has been restored.
chia7712
reviewed
Feb 2, 2023
private def schema(columns: Seq[DataColumn]): StructType =
  StructType(columns.map { col =>
    if (col.dataType != DataType.StringType)
Collaborator
Author
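That's right; in my testing so far it is supported. A Column can handle null, but once the value is converted back to certain Scala types inside a udf, null handling is no longer supported.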
chia7712
reviewed
Feb 2, 2023
cols.flatMap(c =>
  List(
    lit(c.name),
    when(col(c.name).isNotNull, col(c.name)).otherwise(lit(null))
Contributor
Perhaps we could drop null fields altogether, since null means the value is absent. Filtering them out might also improve performance slightly.
Collaborator
Author
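This is the only way I could come up with to drop null fields. It doesn't look very elegant, but I couldn't find another one. I'll revise it if a more elegant approach turns up.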
resolved #1286
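Treating everything uniformly as string type is not ideal, so I switched to a different approach: if a column is detected as null, set it to null ahead of time.
Also fixed a few bugs along the way.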
grep does not match `.` literally, so the script ends up matching the comment line as well.