[SPARK-10442][SQL] fix string to boolean cast #8698
|
Test build #42267 has finished for PR 8698 at commit
|
|
cc @yhuai, it looks like no Hive compatibility tests are broken :) |
|
Would be nice to have a test for a persisted table partitioned by a boolean column:

    sqlContext.range(2).selectExpr("(id % 2 = 0) as b", "id").write.partitionBy("b").saveAsTable("t")
    sqlContext.table("t").show()

Currently this snippet produces a wrong answer (all boolean values are |
|
Although this change doesn't break any existing Hive compatibility tests, it's still a breaking change. We might want a separate SQL option that lets users fall back to the old behavior. The partitioned table case should be fixed in a separate PR (don't use |
|
A compatibility option would be reasonable. My vote would be for the |
can't this be a static variable somewhere?
|
I think having a config flag is reasonable. If we want that, it should be a single flag that dictates whether we follow Hive or our own standards. However, in this case it seems like too much work to introduce a flag, and the benefit isn't large yet. So I would just follow Vertica's approach. |
|
+1 to @rxin's suggestions |
e7b50f6 to 8706165
|
Test build #42311 has finished for PR 8698 at commit
|
|
LGTM |
|
LGTM. I am merging it to master. |
When we cast a string to boolean in Hive, it returns true if the length of the string is > 0, and Spark SQL follows this behavior.
However, this behavior is very different from other SQL systems:
1. true for 't' 'true' '1', false for 'f' 'false' '0', throw exception for others.
2. true for 't' 'true' 'y' 'yes' '1', false for 'f' 'false' 'n' 'no' '0', null for others.
3. true for 't' 'true' 'y' 'yes' 'on' '1', false for 'f' 'false' 'n' 'no' 'off' '0', throw exception for others.
4. true for 't' 'true' 'y' 'yes' '1', false for 'f' 'false' 'n' 'no' '0', null for others.
Whether we should change the cast behavior to match other SQL systems is not decided yet; this PR is a test to see how many compatibility tests would fail if we changed it.
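
To make the difference concrete, here is a minimal plain-Python sketch of the two behaviors described above: the Hive-style rule (any non-empty string is true) and the lenient rule that returns null for unrecognized strings. This is not Spark's implementation. The function names are hypothetical, and two details are assumptions that the description does not state: that matching is case-insensitive, and that the Hive-style cast yields false for an empty string.

```python
from typing import Optional

# String sets taken from the "null for others" variant in the description.
TRUE_STRINGS = frozenset({"t", "true", "y", "yes", "1"})
FALSE_STRINGS = frozenset({"f", "false", "n", "no", "0"})


def hive_style_cast(s: Optional[str]) -> Optional[bool]:
    """Old behavior: true if the string length is > 0.

    Assumption: an empty string maps to False and None stays None.
    """
    if s is None:
        return None
    return len(s) > 0


def lenient_cast(s: Optional[str]) -> Optional[bool]:
    """Lenient behavior: known spellings map to a boolean, the rest to None.

    Assumption: comparison is case-insensitive (hypothetical choice).
    """
    if s is None:
        return None
    key = s.lower()
    if key in TRUE_STRINGS:
        return True
    if key in FALSE_STRINGS:
        return False
    return None  # plays the role of SQL NULL for unrecognized input


if __name__ == "__main__":
    for value in ["true", "false", "no", "abc", ""]:
        print(repr(value), hive_style_cast(value), lenient_cast(value))
```

The string 'false' shows why the change is breaking: the Hive-style rule treats it as true because it is non-empty, while the lenient rule maps it to false.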