Skip to content

Conversation

@maropu
Copy link
Member

@maropu maropu commented Apr 21, 2017

What changes were proposed in this pull request?

This pr added code to support || for string concatenation. This string operation is supported in PostgreSQL and MySQL.

How was this patch tested?

Added tests in SparkSqlParserSuite

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't this just expression(exprs)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, I missed.. you're right. I'll fix

@rxin
Copy link
Contributor

rxin commented Apr 21, 2017

can you add a test case in sql query file tests?

@SparkQA
Copy link

SparkQA commented Apr 21, 2017

Test build #76009 has finished for PR 17711 at commit bd36e58.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Copy link
Member Author

maropu commented Apr 21, 2017

okay, I'll add soon.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move this to the end of the file. It can minimize the code changes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because you do Concat(exprs.map(expression)), isn't it Concat(UnresolvedAttribute("a") :: UnresolvedAttribute("b") :: UnresolvedAttribute("c") :: Nil)?

Copy link
Member

@viirya viirya Apr 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh. I see. But I think we may simplify nested Concats in visitConcat.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aha, I'll re-think a bit more, thanks!

Copy link
Member Author

@maropu maropu Apr 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@viirya How about the latest fix?

@SparkQA
Copy link

SparkQA commented Apr 21, 2017

Test build #76010 has finished for PR 17711 at commit 9cfaef6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 21, 2017

Test build #76016 has started for PR 17711 at commit 72d1ae1.

@SparkQA
Copy link

SparkQA commented Apr 21, 2017

Test build #76014 has finished for PR 17711 at commit 0701f87.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Copy link
Member Author

maropu commented Apr 21, 2017

Jenkins, retest this please.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you move this to the CatalystSqlParser?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we move this to the head of the primaryExpression rule? That seems easier.

I am also trying to figure how this works with other binary operators, for example: a + b || c.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do you suggested, we need to write a rule like primaryExpression (CONCAT_PIPE primaryExpression)+ to avoid left-recursive. But, IIUC this rule parses a || b || c into Concat(a, Concat(b, c))`. So, I fixed in the current way from the suggestion of @viirya.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, it seems we have the same behaviour with mysql;

mysql> select 1 + 2 || '3';
+--------------+
| 1 + 2 || '3' |
+--------------+
|           24 |
+--------------+
1 row in set (0.00 sec)

postgres=# select 1 + 2 || '3';
 ?column? 
----------
 33
(1 row)

scala> sql("""select 1 + 2 || '3'""").show
+------------------------------------------------------------------+
|(CAST(1 AS DOUBLE) + CAST(concat(CAST(2 AS STRING), 3) AS DOUBLE))|
+------------------------------------------------------------------+
|                                                              24.0|
+------------------------------------------------------------------+

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add the test cases to check whether we correctly follow the precedence like Oracle?

Ref: https://docs.oracle.com/cd/A87860_01/doc/server.817/a85397/operator.htm#1003584

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok. Sorry, but l'll update in a few days because I'm on vacation..

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is interesting...So seems mysql and postgres has different precedence for it.

@SparkQA
Copy link

SparkQA commented Apr 21, 2017

Test build #76020 has finished for PR 17711 at commit 72d1ae1.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 21, 2017

Test build #76030 has finished for PR 17711 at commit 83242d5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Copy link
Member Author

maropu commented Apr 22, 2017

ping

@maropu
Copy link
Member Author

maropu commented Apr 24, 2017

@hvanhovell ping

@SparkQA
Copy link

SparkQA commented May 8, 2017

Test build #76578 has finished for PR 17711 at commit f984c6b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 8, 2017

Test build #76580 has finished for PR 17711 at commit 96293f3.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Copy link
Member Author

maropu commented May 8, 2017

Jenkins, retest this please.

@SparkQA
Copy link

SparkQA commented May 9, 2017

Test build #76597 has finished for PR 17711 at commit 96293f3.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Copy link
Member Author

maropu commented May 9, 2017

This failure seems unrelated to this pr? (other prs also hit the same R test failure...).

@maropu
Copy link
Member Author

maropu commented May 9, 2017

Jenkins, retest this please.

@maropu
Copy link
Member Author

maropu commented May 9, 2017

The R test failure seemed to be fixed in 2abfee1

@SparkQA
Copy link

SparkQA commented May 9, 2017

Test build #76606 has finished for PR 17711 at commit 96293f3.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


/**
* Collapse nested [[Concat]] expressions.
*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move it to org.apache.spark.sql.catalyst.optimizer.expressions.scala

CollapseRepartition,
CollapseProject,
CollapseWindow,
CollapseConcat,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not part of Operator combine. Maybe move it to the spot around SimplifyCasts

})
}
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

plan transformAllExpressions ?

* Collapse nested [[Concat]] expressions.
*/
object CollapseConcat extends Rule[LogicalPlan] {

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tail recursion? or using queue/stack?

@gatorsmile
Copy link
Member

Please follow the other optimizer rules. We need to add a optimizer test suite. For example, SimplifyConditionalSuite

@SparkQA
Copy link

SparkQA commented May 11, 2017

Test build #76795 has finished for PR 17711 at commit 96db575.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

select 5 % 3;
select pmod(-7, 3);

-- check operator precedence
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add the precedence rules we follow in the comments?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

@gatorsmile
Copy link
Member

LGTM pending minor comment.

We need to add an extra optimizer rule to combine the adjacent concatenate expressions. Thanks!

@maropu
Copy link
Member Author

maropu commented May 12, 2017

I quickly brushed up the Optimizer code based on your advice:
Using Stack:
a17d933#diff-a1acb054bc8888376603ef510e6d0ee0R551

Using tailrec:
master...maropu:SPARK-19951-3#diff-a1acb054bc8888376603ef510e6d0ee0R552

I checked the spark style-guide and I probably think we'd better to use more readable one. So, tailrec is better? I'll submit the tailrec one after this merged.

@SparkQA
Copy link

SparkQA commented May 12, 2017

Test build #76840 has finished for PR 17711 at commit 089db30.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The link could be ineffective in the future. Could you also copy the table contents here? Thanks!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

@gatorsmile
Copy link
Member

@maropu The solution using tailrec looks more straightforward. Could you submit the PR based on that? Thanks!

@rxin
Copy link
Contributor

rxin commented May 12, 2017

I feel both are pretty complicated. Can we just do something similar to CombineUnion:

/**
 * Combines all adjacent [[Union]] operators into a single [[Union]].
 */
object CombineUnions extends Rule[LogicalPlan] {
  def apply(plan: LogicalPlan): LogicalPlan = plan transformDown {
    case u: Union => flattenUnion(u, false)
    case Distinct(u: Union) => Distinct(flattenUnion(u, true))
  }

  private def flattenUnion(union: Union, flattenDistinct: Boolean): Union = {
    val stack = mutable.Stack[LogicalPlan](union)
    val flattened = mutable.ArrayBuffer.empty[LogicalPlan]
    while (stack.nonEmpty) {
      stack.pop() match {
        case Distinct(Union(children)) if flattenDistinct =>
          stack.pushAll(children.reverse)
        case Union(children) =>
          stack.pushAll(children.reverse)
        case child =>
          flattened += child
      }
    }
    Union(flattened)
  }
}

It's going to be simpler because you don't need to handle distinct here.

@maropu
Copy link
Member Author

maropu commented May 12, 2017

@rxin ok, thank for the suggestion!

@SparkQA
Copy link

SparkQA commented May 12, 2017

Test build #76850 has started for PR 17711 at commit de89791.

@maropu
Copy link
Member Author

maropu commented May 12, 2017

Jenkins, retest this please.

@SparkQA
Copy link

SparkQA commented May 12, 2017

Test build #76856 has finished for PR 17711 at commit de89791.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member

Thanks! Merging to master.

@asfgit asfgit closed this in b526f70 May 12, 2017
ghost pushed a commit to dbtsai/spark that referenced this pull request May 15, 2017
## What changes were proposed in this pull request?
This pr added a new Optimizer rule to combine nested Concat. The master supports a pipeline operator '||' to concatenate strings in apache#17711 (This pr is follow-up). Since the parser currently generates nested Concat expressions, the optimizer needs to combine the nested expressions.

## How was this patch tested?
Added tests in `CombineConcatSuite` and `SQLQueryTestSuite`.

Author: Takeshi Yamamuro <[email protected]>

Closes apache#17970 from maropu/SPARK-20730.
robert3005 pushed a commit to palantir/spark that referenced this pull request May 19, 2017
## What changes were proposed in this pull request?
This pr added code to support `||` for string concatenation. This string operation is supported in PostgreSQL and MySQL.

## How was this patch tested?
Added tests in `SparkSqlParserSuite`

Author: Takeshi Yamamuro <[email protected]>

Closes apache#17711 from maropu/SPARK-19951.
robert3005 pushed a commit to palantir/spark that referenced this pull request May 19, 2017
## What changes were proposed in this pull request?
This pr added a new Optimizer rule to combine nested Concat. The master supports a pipeline operator '||' to concatenate strings in apache#17711 (This pr is follow-up). Since the parser currently generates nested Concat expressions, the optimizer needs to combine the nested expressions.

## How was this patch tested?
Added tests in `CombineConcatSuite` and `SQLQueryTestSuite`.

Author: Takeshi Yamamuro <[email protected]>

Closes apache#17970 from maropu/SPARK-20730.
liyichao pushed a commit to liyichao/spark that referenced this pull request May 24, 2017
## What changes were proposed in this pull request?
This pr added code to support `||` for string concatenation. This string operation is supported in PostgreSQL and MySQL.

## How was this patch tested?
Added tests in `SparkSqlParserSuite`

Author: Takeshi Yamamuro <[email protected]>

Closes apache#17711 from maropu/SPARK-19951.
liyichao pushed a commit to liyichao/spark that referenced this pull request May 24, 2017
## What changes were proposed in this pull request?
This pr added a new Optimizer rule to combine nested Concat. The master supports a pipeline operator '||' to concatenate strings in apache#17711 (This pr is follow-up). Since the parser currently generates nested Concat expressions, the optimizer needs to combine the nested expressions.

## How was this patch tested?
Added tests in `CombineConcatSuite` and `SQLQueryTestSuite`.

Author: Takeshi Yamamuro <[email protected]>

Closes apache#17970 from maropu/SPARK-20730.
hejian991 pushed a commit to growingio/spark that referenced this pull request Dec 6, 2018
This pr added code to support `||` for string concatenation. This string operation is supported in PostgreSQL and MySQL.

Added tests in `SparkSqlParserSuite`

Author: Takeshi Yamamuro <[email protected]>

Closes apache#17711 from maropu/SPARK-19951.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants