[SPARK-22771][SQL] Concatenate binary inputs into a binary output #19977
|
Test build #84904 has finished for PR 19977 at commit
|
|
Test build #84910 has finished for PR 19977 at commit
|
|
Could you confirm whether Hive behaves the same? |
|
ok |
|
checked; |
Shouldn't we worry about backward compatibility?
Yes, we should. Is there an existing option for keeping backward compatibility? If not, how about adding a new option?
Conf is needed for sure. We also need a Migration Guide.
ok
Is it ok to add a new option for this case only? If we keep adding a new option for each case, the number of options could blow up.
If all inputs are binary, concat also outputs binary.
Is this true in Hive and others?
will check some patterns
PostgreSQL and Hive behave the same:
postgres=# create table t1(a bytea, b bytea, c varchar, d varchar);
postgres=# create view v1 as select a || b || c || d from t1;
postgres=# \d v1
View "public.view41_1"
Column | Type | Modifiers
----------+------+-----------
?column? | text |
hive> create table t1(a binary, b binary, c string, d string);
hive> create view v1 as select a || b || c || d from t1;
hive> describe v1;
_c0 string
Thank you for confirming it! Below is the behavior of DB2
aha, thanks for the info!
I checked the DB2 behaviour and found that DB2 seems to have a slightly different casting rule.
https://www.ibm.com/support/knowledgecenter/SSEPGG_11.1.0/com.ibm.db2.luw.sql.ref.doc/doc/r0000736.html?view=kc
IIUC, in db2, the type of concat(binary, string) is binary?
I also checked mysql: https://dev.mysql.com/doc/refman/5.7/en/string-functions.html#function_concat
recap:
hive, postgresql: concat(binary, string) => string
mysql, db2: concat(binary, string) => binary
|
Test build #84950 has finished for PR 19977 at commit
|
|
Test build #84951 has finished for PR 19977 at commit
|
|
Test build #84953 has finished for PR 19977 at commit
|
|
Test build #84963 has finished for PR 19977 at commit
|
|
Test build #84987 has finished for PR 19977 at commit
|
|
retest this please |
|
Test build #84996 has finished for PR 19977 at commit
|
|
retest this please |
|
Test build #85000 has finished for PR 19977 at commit
|
|
oh... |
|
retest this please |
|
Test build #85005 has finished for PR 19977 at commit
|
|
retest this please |
|
Test build #85031 has finished for PR 19977 at commit
|
-> spark.sql.typeCoercion.concatBinaryAsString
I'll update it after the reviews are finished.
Maybe CONCAT_BINARY_AS_STRING_ENABLED?
-> spark.sql.function.concatBinaryAsString
|
@maropu No need to re-trigger it. The failure is not caused by this PR. |
|
Will review it tomorrow. Thanks! |
|
I found different behaviours in string functions |
|
You mean the MySQL results are unexpected? I think it's common for these databases to behave differently, and Spark mainly follows Hive. |
|
Test build #85451 has finished for PR 19977 at commit
|
|
Test build #85454 has finished for PR 19977 at commit
|
select right(null, -2), right("abcd", -2), right("abcd", 0), right("abcd", 'a');

-- turn on concatBinaryAsString
set spark.sql.function.concatBinaryAsString=false;
turn on?
Since most other DBMS-like systems concatenate binary inputs into a binary output, I think turning this option off by default is okay.
I meant that the comment says "turn on" (L28).
oh....
|
LGTM |
|
Test build #85473 has finished for PR 19977 at commit
|
|
Try |
|
ah, ok. good catch. I'll fix soon. |
|
Test build #85495 has finished for PR 19977 at commit
|
|
|
  def apply(plan: LogicalPlan): LogicalPlan = plan.transformExpressionsDown {
-   case concat: Concat if concat.children.exists(_.isInstanceOf[Concat]) =>
+   case concat: Concat if concat.children.exists {
Create a dedicated helper function for the if condition?
ok
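The suggestion above, moving the guard of the `case` into a named helper, could look roughly like the sketch below. It is self-contained and is not the Spark code: `Expr`, `Col`, `Cast`, `Concat`, `hasNestedConcats`, `flatten`, and `optimize` are hypothetical stand-ins for Catalyst classes and rule methods, and treating a `Concat` wrapped in a cast to string as "nested" is an assumption made for illustration.

```scala
// Toy expression tree. These types are hypothetical stand-ins, not Catalyst classes.
sealed trait Expr
case class Col(name: String) extends Expr
case class Cast(child: Expr, toType: String) extends Expr
case class Concat(children: Seq[Expr]) extends Expr

object CombineConcatsSketch {
  // Named helper for the guard: is any direct child itself a Concat,
  // either bare or (assumed case) wrapped in a cast to string?
  def hasNestedConcats(concat: Concat): Boolean = concat.children.exists {
    case _: Concat => true
    case Cast(_: Concat, "string") => true
    case _ => false
  }

  // Recursively inline nested Concat children into one flat Concat.
  // A cast around a nested Concat is pushed down onto each inlined child.
  def flatten(concat: Concat): Concat = Concat(concat.children.flatMap {
    case c: Concat => flatten(c).children
    case Cast(c: Concat, "string") => flatten(c).children.map(Cast(_, "string"))
    case other => Seq(other)
  })

  // The rule body now reads as a single, self-describing case.
  def optimize(expr: Expr): Expr = expr match {
    case c: Concat if hasNestedConcats(c) => flatten(c)
    case other => other
  }
}
```

With the guard named, the pattern match states its intent directly, and the helper can be tested on its own, for example against `Concat(Seq(Col("a"), Concat(Seq(Col("b"), Col("c")))))`.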
|
Test build #85511 has finished for PR 19977 at commit
|
|
retest this please |
|
Test build #85535 has finished for PR 19977 at commit
|
|
LGTM Thanks! Merged to master. |
|
thanks, I'll fix |
…tDataTypes

## What changes were proposed in this pull request?
This pr is a follow-up to fix a bug left in #19977.

## How was this patch tested?
Added tests in `StringExpressionsSuite`.

Author: Takeshi Yamamuro
Closes #20149 from maropu/SPARK-22771-FOLLOWUP.
(cherry picked from commit 6f68316)
Signed-off-by: gatorsmile
## What changes were proposed in this pull request?
This pr modified `elt` to output binary for binary inputs. `elt` in the current master always outputs data as a string. But, in some databases (e.g., MySQL), if all inputs are binary, `elt` also outputs binary (also, this might be a small surprise). This pr is related to #19977.

## How was this patch tested?
Added tests in `SQLQueryTestSuite` and `TypeCoercionSuite`.

Author: Takeshi Yamamuro
Closes #20135 from maropu/SPARK-22937.
(cherry picked from commit e8af7e8)
Signed-off-by: gatorsmile
## What changes were proposed in this pull request?
This pr modified `concat` to concat binary inputs into a single binary output. `concat` in the current master always outputs data as a string. But, in some databases (e.g., PostgreSQL), if all inputs are binary, `concat` also outputs binary.

## How was this patch tested?
Added tests in `SQLQueryTestSuite` and `TypeCoercionSuite`.
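The result-type rule described here, together with the `spark.sql.function.concatBinaryAsString` option discussed in the review, can be modelled as a small standalone sketch. It does not use Spark: `DataType`, `BinaryType`, `StringType`, `IntegerType`, and `resultType` below are hypothetical names defined only for this example, and the handling of an empty argument list (string) is an assumption.

```scala
// Hypothetical, simplified model of the result-type rule; not Spark's API.
sealed trait DataType
case object BinaryType extends DataType
case object StringType extends DataType
case object IntegerType extends DataType

object ConcatTypeSketch {
  // concatBinaryAsString stands in for spark.sql.function.concatBinaryAsString.
  // When true, the pre-change behaviour is kept and the result is always a string.
  // When false (the default discussed in the thread), all-binary inputs stay binary.
  def resultType(inputs: Seq[DataType], concatBinaryAsString: Boolean = false): DataType =
    if (!concatBinaryAsString && inputs.nonEmpty && inputs.forall(_ == BinaryType)) BinaryType
    else StringType // mixed, non-binary, or empty inputs (empty case is an assumption)
}
```

Under this model, `resultType(Seq(BinaryType, BinaryType))` is `BinaryType`, while `resultType(Seq(BinaryType, StringType))` is `StringType`, which matches the Hive and PostgreSQL row of the recap in the review thread. Passing `concatBinaryAsString = true` yields `StringType` in both cases.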