[SPARK-19951][SQL] Add string concatenate operator || to Spark SQL #17711

maropu · 2017-04-21T00:54:31Z

What changes were proposed in this pull request?

This PR adds support for || as a string concatenation operator. This operator is supported in PostgreSQL and MySQL.

How was this patch tested?

Added tests in SparkSqlParserSuite

rxin · 2017-04-21T00:56:40Z

sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala

isn't this just expression(exprs)?

Oh, I missed that. You're right, I'll fix it.

rxin · 2017-04-21T00:56:57Z

Can you add a test case to the SQL query file tests?

SparkQA · 2017-04-21T00:59:18Z

Test build #76009 has finished for PR 17711 at commit bd36e58.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds no public classes.

maropu · 2017-04-21T01:06:04Z

Okay, I'll add it soon.

gatorsmile · 2017-04-21T02:45:03Z

sql/core/src/test/resources/sql-tests/inputs/string-functions.sql

Please move this to the end of the file. That minimizes the code changes.

viirya · 2017-04-21T03:21:37Z

sql/core/src/test/scala/org/apache/spark/sql/execution/SparkSqlParserSuite.scala

Because you do Concat(exprs.map(expression)), isn't it Concat(UnresolvedAttribute("a") :: UnresolvedAttribute("b") :: UnresolvedAttribute("c") :: Nil)?

oh. I see. But I think we may simplify nested Concats in visitConcat.

aha, I'll re-think a bit more, thanks!

@viirya How about the latest fix?

SparkQA · 2017-04-21T03:57:09Z

Test build #76010 has finished for PR 17711 at commit 9cfaef6.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-04-21T05:37:34Z

Test build #76016 has started for PR 17711 at commit 72d1ae1.

SparkQA · 2017-04-21T07:03:33Z

Test build #76014 has finished for PR 17711 at commit 0701f87.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

maropu · 2017-04-21T07:37:58Z

Jenkins, retest this please.

hvanhovell · 2017-04-21T07:41:13Z

sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala

Can you move this to the CatalystSqlParser?

hvanhovell · 2017-04-21T07:48:52Z

sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4

Should we move this to the head of the primaryExpression rule? That seems easier.

I am also trying to figure out how this works with other binary operators, for example: a + b || c.

If we do what you suggested, we need to write a rule like primaryExpression (CONCAT_PIPE primaryExpression)+ to avoid left recursion. But, IIUC, this rule parses a || b || c into Concat(a, Concat(b, c)). So I went with the current approach, following @viirya's suggestion.

Currently, it seems we have the same behaviour as MySQL:

mysql> select 1 + 2 || '3';
+--------------+
| 1 + 2 || '3' |
+--------------+
|           24 |
+--------------+
1 row in set (0.00 sec)

postgres=# select 1 + 2 || '3';
 ?column?
----------
 33
(1 row)

scala> sql("""select 1 + 2 || '3'""").show
+------------------------------------------------------------------+
|(CAST(1 AS DOUBLE) + CAST(concat(CAST(2 AS STRING), 3) AS DOUBLE))|
+------------------------------------------------------------------+
|                                                              24.0|
+------------------------------------------------------------------+

Could you add test cases to check that we follow the same operator precedence as Oracle?

Ref: https://docs.oracle.com/cd/A87860_01/doc/server.817/a85397/operator.htm#1003584

OK. Sorry, but I'll update in a few days because I'm on vacation.

This is interesting. So it seems MySQL and PostgreSQL have different precedence for it.
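To make the difference concrete, here is a minimal sketch of the two groupings of 1 + 2 || '3' discussed above. The object name and the concat/plus helpers are hypothetical stand-ins written only for this illustration; they are not Spark, MySQL, or PostgreSQL APIs, and the numeric cast is simplified to a Double as in the Spark output shown earlier.

```scala
object ConcatPrecedenceSketch {
  // Hypothetical helper: string concatenation of two values.
  def concat(l: Any, r: Any): String = l.toString + r.toString

  // Hypothetical helper: numeric addition after casting both sides to Double.
  def plus(l: Any, r: Any): Double = l.toString.toDouble + r.toString.toDouble

  // `||` binds tighter than `+`: 1 + (2 || '3')
  // This grouping matches the MySQL and Spark results in the thread.
  val concatFirst: Double = plus(1, concat(2, "3"))

  // `+` binds tighter than `||`: (1 + 2) || '3'
  // This grouping matches the PostgreSQL result in the thread.
  val plusFirst: String = concat(1 + 2, "3")

  def main(args: Array[String]): Unit = {
    println(concatFirst) // prints 24.0
    println(plusFirst)   // prints 33
  }
}
```

The only thing that changes between the two values is where the parentheses fall, which is exactly what the grammar's precedence rules decide.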

SparkQA · 2017-04-21T09:11:13Z

Test build #76020 has finished for PR 17711 at commit 72d1ae1.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-04-21T12:13:22Z

Test build #76030 has finished for PR 17711 at commit 83242d5.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

maropu · 2017-04-22T08:25:04Z

ping

maropu · 2017-04-24T23:53:14Z

@hvanhovell ping

SparkQA · 2017-05-08T15:02:20Z

Test build #76578 has finished for PR 17711 at commit f984c6b.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-05-08T17:59:08Z

Test build #76580 has finished for PR 17711 at commit 96293f3.

This patch fails SparkR unit tests.
This patch merges cleanly.
This patch adds no public classes.

maropu · 2017-05-08T23:06:20Z

Jenkins, retest this please.

SparkQA · 2017-05-09T01:24:49Z

Test build #76597 has finished for PR 17711 at commit 96293f3.

This patch fails SparkR unit tests.
This patch merges cleanly.
This patch adds no public classes.

maropu · 2017-05-09T01:46:29Z

This failure seems unrelated to this PR (other PRs also hit the same R test failure).

maropu · 2017-05-09T01:46:36Z

Jenkins, retest this please.

maropu · 2017-05-09T02:22:22Z

The R test failure seems to have been fixed in 2abfee1.

SparkQA · 2017-05-09T04:08:22Z

Test build #76606 has finished for PR 17711 at commit 96293f3.

This patch fails SparkR unit tests.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile · 2017-05-11T07:47:56Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala


+/**
+ * Collapse nested [[Concat]] expressions.
+ */


Please move it to org.apache.spark.sql.catalyst.optimizer.expressions.scala

gatorsmile · 2017-05-11T07:49:56Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala

      CollapseRepartition,
      CollapseProject,
      CollapseWindow,
+      CollapseConcat,


This is not part of Operator combine. Maybe move it next to SimplifyCasts.

gatorsmile · 2017-05-11T07:50:32Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala

+      })
+    }
+  }
+


plan transformAllExpressions ?

gatorsmile · 2017-05-11T07:55:22Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala

+ * Collapse nested [[Concat]] expressions.
+ */
+object CollapseConcat extends Rule[LogicalPlan] {
+


tail recursion? or using queue/stack?
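For reference, a tail-recursive flattening could look roughly like the sketch below. It is a self-contained illustration, not the code in this PR: Expr, Attr, and Concat here are simplified stand-ins for the Catalyst expression classes, and flattenConcat is a hypothetical helper name.

```scala
import scala.annotation.tailrec

object TailrecFlattenSketch {
  // Simplified stand-ins for Catalyst expressions (assumptions for this sketch).
  sealed trait Expr
  final case class Attr(name: String) extends Expr
  final case class Concat(children: Seq[Expr]) extends Expr

  // Flattens arbitrarily nested Concat nodes into one Concat, preserving order.
  def flattenConcat(concat: Concat): Concat = {
    @tailrec
    def loop(pending: List[Expr], acc: Vector[Expr]): Vector[Expr] = pending match {
      case Nil => acc
      // Expand a nested Concat in place so its children are visited next.
      case Concat(children) :: rest => loop(children.toList ::: rest, acc)
      // Any other expression is a leaf for the purposes of flattening.
      case other :: rest => loop(rest, acc :+ other)
    }
    Concat(loop(concat.children.toList, Vector.empty))
  }

  def main(args: Array[String]): Unit = {
    // Nested shape a parser might produce for a || b || c.
    val nested = Concat(Seq(Concat(Seq(Attr("a"), Attr("b"))), Attr("c")))
    println(flattenConcat(nested))
  }
}
```

Because the pending list acts as an explicit work list, the recursion depth does not grow with the nesting depth of the input.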

gatorsmile · 2017-05-11T07:57:09Z

Please follow the other optimizer rules. We need to add an optimizer test suite. For example, SimplifyConditionalSuite

This reverts commit c88652c.

SparkQA · 2017-05-11T11:21:58Z

Test build #76795 has finished for PR 17711 at commit 96db575.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile · 2017-05-11T16:56:10Z

sql/core/src/test/resources/sql-tests/inputs/operators.sql

 select 5 % 3;
 select pmod(-7, 3);
+
+-- check operator precedence


Could you add the precedence rules we follow in the comments?

gatorsmile · 2017-05-11T16:57:17Z

LGTM pending minor comment.

We need to add an extra optimizer rule to combine the adjacent concatenate expressions. Thanks!

maropu · 2017-05-12T02:43:34Z

I quickly brushed up the Optimizer code based on your advice:
Using Stack:
a17d933#diff-a1acb054bc8888376603ef510e6d0ee0R551

Using tailrec:
master...maropu:SPARK-19951-3#diff-a1acb054bc8888376603ef510e6d0ee0R552

I checked the Spark style guide, and I think we'd better use the more readable one. So, is tailrec better? I'll submit the tailrec one after this is merged.

SparkQA · 2017-05-12T05:13:56Z

Test build #76840 has finished for PR 17711 at commit 089db30.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile · 2017-05-12T05:49:21Z

sql/core/src/test/resources/sql-tests/inputs/operators.sql

The link could go stale in the future. Could you also copy the table contents here? Thanks!

gatorsmile · 2017-05-12T05:51:57Z

@maropu The solution using tailrec looks more straightforward. Could you submit the PR based on that? Thanks!

rxin · 2017-05-12T06:11:16Z

I feel both are pretty complicated. Can we just do something similar to CombineUnions:

/**
 * Combines all adjacent [[Union]] operators into a single [[Union]].
 */
object CombineUnions extends Rule[LogicalPlan] {
  def apply(plan: LogicalPlan): LogicalPlan = plan transformDown {
    case u: Union => flattenUnion(u, false)
    case Distinct(u: Union) => Distinct(flattenUnion(u, true))
  }

  private def flattenUnion(union: Union, flattenDistinct: Boolean): Union = {
    val stack = mutable.Stack[LogicalPlan](union)
    val flattened = mutable.ArrayBuffer.empty[LogicalPlan]
    while (stack.nonEmpty) {
      stack.pop() match {
        case Distinct(Union(children)) if flattenDistinct =>
          stack.pushAll(children.reverse)
        case Union(children) =>
          stack.pushAll(children.reverse)
        case child =>
          flattened += child
      }
    }
    Union(flattened)
  }
}

It's going to be simpler because you don't need to handle distinct here.
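Applied to Concat, the same stack-based pattern might look like the sketch below. This is an illustration under stated assumptions, not the rule that was eventually merged: Expr, Attr, and Concat are simplified stand-ins for the Catalyst classes, and flattenConcat is a hypothetical helper name.

```scala
import scala.collection.mutable

object StackFlattenSketch {
  // Simplified stand-ins for Catalyst expressions (assumptions for this sketch).
  sealed trait Expr
  final case class Attr(name: String) extends Expr
  final case class Concat(children: Seq[Expr]) extends Expr

  // Flattens nested Concat nodes with an explicit stack, mirroring flattenUnion.
  def flattenConcat(concat: Concat): Concat = {
    val stack = mutable.Stack[Expr](concat)
    val flattened = mutable.ArrayBuffer.empty[Expr]
    while (stack.nonEmpty) {
      stack.pop() match {
        // Push children in reverse so the leftmost child is popped first.
        case Concat(children) => stack.pushAll(children.reverse)
        case child => flattened += child
      }
    }
    // Convert to an immutable List so the result does not alias the buffer.
    Concat(flattened.toList)
  }

  def main(args: Array[String]): Unit = {
    val nested = Concat(Seq(Attr("a"), Concat(Seq(Attr("b"), Concat(Seq(Attr("c")))))))
    println(flattenConcat(nested))
  }
}
```

Compared with flattenUnion, there is only one case to expand, since no Distinct wrapper needs to be considered.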

maropu · 2017-05-12T06:20:14Z

@rxin OK, thanks for the suggestion!

SparkQA · 2017-05-12T06:22:40Z

Test build #76850 has started for PR 17711 at commit de89791.

maropu · 2017-05-12T07:11:00Z

Jenkins, retest this please.

SparkQA · 2017-05-12T09:33:49Z

Test build #76856 has finished for PR 17711 at commit de89791.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile · 2017-05-12T16:56:32Z

Thanks! Merging to master.

## What changes were proposed in this pull request?

This pr added a new Optimizer rule to combine nested Concat. The master supports a pipeline operator '||' to concatenate strings in apache#17711 (This pr is follow-up). Since the parser currently generates nested Concat expressions, the optimizer needs to combine the nested expressions.

## How was this patch tested?

Added tests in `CombineConcatSuite` and `SQLQueryTestSuite`.

Author: Takeshi Yamamuro <[email protected]>

Closes apache#17970 from maropu/SPARK-20730.

## What changes were proposed in this pull request?

This pr added code to support `||` for string concatenation. This string operation is supported in PostgreSQL and MySQL.

## How was this patch tested?

Added tests in `SparkSqlParserSuite`

Author: Takeshi Yamamuro <[email protected]>

Closes apache#17711 from maropu/SPARK-19951.


rxin reviewed Apr 21, 2017

gatorsmile reviewed Apr 21, 2017

viirya reviewed Apr 21, 2017

hvanhovell reviewed Apr 21, 2017

maropu force-pushed the SPARK-19951 branch from 83242d5 to f984c6b Compare May 8, 2017 13:35

maropu force-pushed the SPARK-19951 branch from f984c6b to 96293f3 Compare May 8, 2017 15:32

maropu added 4 commits May 9, 2017 14:16

Add string concatenate operator || to Spark SQL

5544ff2

Apply comments

05d490e

Apply review comments

afcd950

Brush up parsing rules

f89d131

gatorsmile reviewed May 11, 2017

Revert "Add a new rule to collapse multiple concats in Optimizer"

96db575

This reverts commit c88652c.

gatorsmile reviewed May 11, 2017

gatorsmile reviewed May 12, 2017

Add comments

de89791

maropu force-pushed the SPARK-19951 branch from 089db30 to de89791 Compare May 12, 2017 06:19

asfgit closed this in b526f70 May 12, 2017

maropu mentioned this pull request May 13, 2017

[SPARK-20730][SQL] Add an optimizer rule to combine nested Concat #17970

Closed
