[SPARK-19951][SQL] Add string concatenate operator || to Spark SQL #17711
isn't this just expression(exprs)?
Oh, I missed that. You're right, I'll fix it.
can you add a test case in sql query file tests?
Test build #76009 has finished for PR 17711 at commit
okay, I'll add soon.
Please move this to the end of the file. It can minimize the code changes.
ok
Because you do Concat(exprs.map(expression)), isn't it Concat(UnresolvedAttribute("a") :: UnresolvedAttribute("b") :: UnresolvedAttribute("c") :: Nil)?
oh. I see. But I think we may simplify nested Concats in visitConcat.
aha, I'll re-think a bit more, thanks!
@viirya How about the latest fix?
Test build #76010 has finished for PR 17711 at commit
Test build #76016 has started for PR 17711 at commit
Test build #76014 has finished for PR 17711 at commit
Jenkins, retest this please.
Can you move this to the CatalystSqlParser?
ok
Should we move this to the head of the primaryExpression rule? That seems easier.
I am also trying to figure how this works with other binary operators, for example: a + b || c.
If we do what you suggested, we need to write a rule like `primaryExpression (CONCAT_PIPE primaryExpression)+` to avoid left recursion. But, IIUC, this rule parses `a || b || c` into `Concat(a, Concat(b, c))`. So I fixed it in the current way, following the suggestion from @viirya.
Currently, it seems we have the same behaviour as MySQL:
mysql> select 1 + 2 || '3';
+--------------+
| 1 + 2 || '3' |
+--------------+
| 24 |
+--------------+
1 row in set (0.00 sec)
postgres=# select 1 + 2 || '3';
?column?
----------
33
(1 row)
scala> sql("""select 1 + 2 || '3'""").show
+------------------------------------------------------------------+
|(CAST(1 AS DOUBLE) + CAST(concat(CAST(2 AS STRING), 3) AS DOUBLE))|
+------------------------------------------------------------------+
| 24.0|
+------------------------------------------------------------------+
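The three results above differ only in how `1 + 2 || '3'` is grouped. A minimal Scala sketch of the two groupings follows. It uses plain integer and string operations, not Spark SQL or any database engine, and the names `PrecedenceSketch`, `concatFirst` and `plusFirst` are hypothetical, chosen only for illustration.

```scala
// Illustration only: plain Scala values, not Spark SQL or any database engine.
object PrecedenceSketch {
  // `||` binds tighter than `+`: 1 + (2 || '3') -> 1 + "23" -> 24
  def concatFirst: Int = 1 + (2.toString + "3").toInt

  // `+` binds tighter than `||`: (1 + 2) || '3' -> "3" followed by "3" -> "33"
  def plusFirst: String = (1 + 2).toString + "3"
}
```

`concatFirst` corresponds to the 24 shown by MySQL (and the 24.0 shown by Spark at this commit, where the operands are cast to DOUBLE). `plusFirst` corresponds to the 33 shown by PostgreSQL.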
Could you add the test cases to check whether we correctly follow the precedence like Oracle?
Ref: https://docs.oracle.com/cd/A87860_01/doc/server.817/a85397/operator.htm#1003584
ok. Sorry, but I'll update in a few days because I'm on vacation.
Thanks!
This is interesting... So it seems MySQL and PostgreSQL have different precedence for it.
Test build #76020 has finished for PR 17711 at commit
Test build #76030 has finished for PR 17711 at commit
ping
@hvanhovell ping
Test build #76578 has finished for PR 17711 at commit
Test build #76580 has finished for PR 17711 at commit
Jenkins, retest this please.
Test build #76597 has finished for PR 17711 at commit
This failure seems unrelated to this pr? (other prs also hit the same R test failure...).
Jenkins, retest this please.
The R test failure seemed to be fixed in 2abfee1
Test build #76606 has finished for PR 17711 at commit
/**
 * Collapse nested [[Concat]] expressions.
 */
Please move it to org.apache.spark.sql.catalyst.optimizer.expressions.scala
CollapseRepartition,
CollapseProject,
CollapseWindow,
CollapseConcat,
This is not part of Operator combine. Maybe move it to the spot around SimplifyCasts
    })
  }
}
plan transformAllExpressions ?
 * Collapse nested [[Concat]] expressions.
 */
object CollapseConcat extends Rule[LogicalPlan] {
tail recursion? or using queue/stack?
Please follow the other optimizer rules. We need to add an optimizer test suite. For example,
This reverts commit c88652c.
Test build #76795 has finished for PR 17711 at commit
select 5 % 3;
select pmod(-7, 3);

-- check operator precedence
Could you add the precedence rules we follow in the comments?
ok
LGTM pending minor comment. We need to add an extra optimizer rule to combine the adjacent concatenate expressions. Thanks!
I quickly brushed up the Optimizer code based on your advice: Using I checked the spark style-guide and I probably think we'd better to use more readable one. So,
Test build #76840 has finished for PR 17711 at commit
The link could be ineffective in the future. Could you also copy the table contents here? Thanks!
ok!
done.
@maropu The solution using
I feel both are pretty complicated. Can we just do something similar to CombineUnion: It's going to be simpler because you don't need to handle distinct here.
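A rough sketch of what a CombineUnion-style flattening of nested Concat could look like, using an explicit work list instead of non-tail recursion. This is written against a toy expression type, not Catalyst: `Expr`, `Attr`, the `Concat` case class below, and `ConcatSketch.flatten` are hypothetical stand-ins, and this is not claimed to be the code that was eventually merged.

```scala
import scala.annotation.tailrec

// Toy expression tree for illustration; these are NOT Spark's Catalyst classes.
sealed trait Expr
final case class Attr(name: String) extends Expr
final case class Concat(children: List[Expr]) extends Expr

object ConcatSketch {
  // Flatten arbitrarily nested Concat nodes into one Concat, preserving
  // the left-to-right order of the leaf expressions.
  def flatten(concat: Concat): Concat = {
    @tailrec
    def loop(pending: List[Expr], acc: List[Expr]): List[Expr] = pending match {
      case Nil => acc.reverse
      // A nested Concat: push its children onto the front of the work list.
      case Concat(children) :: rest => loop(children ::: rest, acc)
      // Any other expression is kept as a child of the flattened Concat.
      case leaf :: rest => loop(rest, leaf :: acc)
    }
    Concat(loop(concat.children, Nil))
  }
}
```

For example, `ConcatSketch.flatten(Concat(List(Concat(List(Attr("a"), Attr("b"))), Attr("c"))))` yields `Concat(List(Attr("a"), Attr("b"), Attr("c")))`. Because there is no distinct handling, the loop only needs to unwrap nested nodes and keep the order.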
@rxin ok, thanks for the suggestion!
Test build #76850 has started for PR 17711 at commit
Jenkins, retest this please.
Test build #76856 has finished for PR 17711 at commit
Thanks! Merging to master.
## What changes were proposed in this pull request?
This pr added a new Optimizer rule to combine nested Concat. The master supports a pipeline operator '||' to concatenate strings in apache#17711 (This pr is follow-up). Since the parser currently generates nested Concat expressions, the optimizer needs to combine the nested expressions.
## How was this patch tested?
Added tests in `CombineConcatSuite` and `SQLQueryTestSuite`.
Author: Takeshi Yamamuro <[email protected]>
Closes apache#17970 from maropu/SPARK-20730.

## What changes were proposed in this pull request?
This pr added code to support `||` for string concatenation. This string operation is supported in PostgreSQL and MySQL.
## How was this patch tested?
Added tests in `SparkSqlParserSuite`
Author: Takeshi Yamamuro <[email protected]>
Closes apache#17711 from maropu/SPARK-19951.
What changes were proposed in this pull request?
This pr added code to support `||` for string concatenation. This string operation is supported in PostgreSQL and MySQL.
How was this patch tested?
Added tests in `SparkSqlParserSuite`