[enhancement](fix)change ordinary type null value is \N,complex type null value is null #24207
run buildall |
clang-tidy review says "All clean, LGTM! 👍" |
(From new machine)TeamCity pipeline, clickbench performance test result: |
90d64b9 to 9039fff
TeamCity be ut coverage result: |
LGTM
But please describe in more detail in the PR description how null is handled in text format.
PR approved by at least one committer and no changes requested. |
PR approved by anyone and no changes requested. |
auto& null_column = assert_cast<ColumnNullable&>(column);
// TODO(Amory) make null literal configurable

if (!(options.converted_from_string && slice.trim_quote())) {
    //for map<string,string> type : {"abc","NULL"} , the NULL is string , instead of null values
    if (slice.size == 4 && slice[0] == 'N' && slice[1] == 'U' && slice[2] == 'L' &&
        slice[3] == 'L') {
    if (nesting_level >= 2 && slice.size == 4 && slice[0] == 'n' && slice[1] == 'u' &&
Add a comment describing the logic here, preferably with an example.
9039fff to 0a5ba7e
 * for null values in nested types, we use null to represent them, just like the json format.
 *
 * example:
 * If you have three nullable columns
Note: null -> int -> NULL | null -> char family -> "null"
In CSV (text) format, ordinary types recognize only \N as null, so:
for non-char-family types such as int, a literal null fails to parse and the result is NULL; this happens because parsing failed, not because the value equals \N.
for char-family types such as string, a literal null parses successfully and the string "null" is stored in Doris.
LGTM
LGTM
…null value is null (#24207)
Proposed changes
For null values in ordinary types, we use \N to represent them;
for null values in nested types, we use null to represent them, just like the json format.
example:
If you have three nullable columns
a : int, b : string, c : map<string,int>
data:
if you set trim_double_quotes = true
you will get :
if you set trim_double_quotes = false
you will get :
In CSV (text) format, ordinary types recognize only \N as null, so:
for non-char-family types such as int, a literal null fails to parse and the result is null;
this happens because parsing failed, not because the value equals \N.
For char-family types such as string, a literal null parses successfully,
and the string "null" is stored in Doris.
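The rule for ordinary types can be illustrated with a minimal C++17 sketch. The helper names parse_int_field and parse_string_field are hypothetical and are not Doris functions; they only mirror the behavior described above, under the assumption that a field arrives as raw text from one CSV cell.

```cpp
#include <cassert>
#include <charconv>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helpers for illustration only; not part of Doris.

// Ordinary int column: \N is the only null marker. Any other text that is
// not a complete integer (including the literal "null") also becomes NULL,
// but only because integer parsing failed.
std::optional<int> parse_int_field(std::string_view field) {
    if (field == "\\N") {
        return std::nullopt; // explicit null marker
    }
    int value = 0;
    const char* end = field.data() + field.size();
    auto [ptr, ec] = std::from_chars(field.data(), end, value);
    if (ec != std::errc() || ptr != end) {
        return std::nullopt; // parse failure -> NULL
    }
    return value;
}

// Ordinary string column: \N is null, everything else is stored verbatim,
// so the literal "null" is kept as a four-character string.
std::optional<std::string> parse_string_field(std::string_view field) {
    if (field == "\\N") {
        return std::nullopt;
    }
    return std::string(field);
}

// Small self-check showing the cases from the description.
void check_ordinary_examples() {
    assert(!parse_int_field("\\N").has_value());
    assert(!parse_int_field("null").has_value());
    assert(parse_int_field("42").value() == 42);
    assert(parse_string_field("null").value() == "null");
}
```

In this sketch both \N and null give NULL for the int column, but through different paths, while the string column keeps null as ordinary text.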
For strings inside JSON-like complex types, double quotes are removed by default on load.
This is because when a complex type is queried, for example by selecting complexColumn from a table,
double quotes are added around the strings in the complex type.
For example, insert { "abc" : 1, "hello" : 2 } into the map<string,int> column.
If the double quotes are not removed, the query displays {""abc"":1,""hello"": 2 };
with the double quotes removed, it displays { "abc" : 1, "hello" : 2 }.
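The effect of quote removal, and the difference between null and "null" inside a complex type, can be sketched as follows. This is a simplified C++17 illustration with hypothetical helpers (parse_nested_string, render_nested_string); it reflects one reading of the reviewed snippet and leaves out the nesting-level check and the trim_double_quotes option.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helpers for illustration only; not part of Doris.

// Parse one string element of a complex type from text input.
// An unquoted null is a null value; a quoted "null" is an ordinary string.
// Surrounding double quotes are removed before the value is stored.
std::optional<std::string> parse_nested_string(std::string_view token) {
    const bool quoted =
            token.size() >= 2 && token.front() == '"' && token.back() == '"';
    if (!quoted && token == "null") {
        return std::nullopt;
    }
    if (quoted) {
        token.remove_prefix(1);
        token.remove_suffix(1);
    }
    return std::string(token);
}

// On output, strings inside a complex type are wrapped in double quotes,
// and null values are printed as null.
std::string render_nested_string(const std::optional<std::string>& value) {
    if (!value.has_value()) {
        return "null";
    }
    return "\"" + *value + "\"";
}

// Small self-check showing why the quotes are removed on load.
void check_nested_examples() {
    // Quotes removed on load: the value round-trips as "abc".
    assert(render_nested_string(parse_nested_string("\"abc\"")) == "\"abc\"");
    // Quotes kept on load: output shows doubled quotes, ""abc"".
    assert(render_nested_string(std::string("\"abc\"")) == "\"\"abc\"\"");
}
```

Storing the unquoted text keeps the load and query paths symmetric: quotes are stripped once on input and added once on output.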