[SPARK-24495][SQL] EnsureRequirement returns wrong plan when reordering equal keys #21529
Conversation
```scala
    currentOrderOfKeys: Seq[Expression]): (Seq[Expression], Seq[Expression]) = {
  val leftKeysBuffer = ArrayBuffer[Expression]()
  val rightKeysBuffer = ArrayBuffer[Expression]()
  val alreadyUsedIndexes = mutable.Set[Int]()
```
nit: maybe pickedIndexes?
good catch! thanks!
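As a minimal, self-contained sketch of the reordering logic under review (plain Scala, not Spark's actual code): each expected key must be matched to a distinct position of the join keys, tracking picked positions so that a key appearing twice in the join condition consumes two different positions. The name `pickedIndexes` follows the naming suggested above; everything else is illustrative.

```scala
import scala.collection.mutable
import scala.collection.mutable.ArrayBuffer

// Reorder (leftKeys, rightKeys) so that leftKeys follows expectedOrderOfKeys,
// never matching the same position twice even for duplicated keys.
def reorderKeys[K](
    leftKeys: Seq[K],
    rightKeys: Seq[K],
    expectedOrderOfKeys: Seq[K]): (Seq[K], Seq[K]) = {
  val leftKeysBuffer = ArrayBuffer[K]()
  val rightKeysBuffer = ArrayBuffer[K]()
  val pickedIndexes = mutable.Set[Int]()
  expectedOrderOfKeys.foreach { expected =>
    // First matching position that has not been picked yet.
    val idx = leftKeys.indices
      .find(i => !pickedIndexes.contains(i) && leftKeys(i) == expected)
      .getOrElse(sys.error(s"key $expected not found"))
    pickedIndexes += idx
    leftKeysBuffer += leftKeys(idx)
    rightKeysBuffer += rightKeys(idx)
  }
  (leftKeysBuffer.toSeq, rightKeysBuffer.toSeq)
}
```

With a duplicated key, `reorderKeys(Seq("a", "a"), Seq("b", "c"), Seq("a", "a"))` keeps both right keys instead of matching position 0 twice.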
Test build #91666 has finished for PR 21529 at commit
```scala
withSQLConf(("spark.sql.shuffle.partitions", "1"),
  ("spark.sql.constraintPropagation.enabled", "false"),
  ("spark.sql.autoBroadcastJoinThreshold", "-1")) {
  val df1 = spark.range(100)
```
we should make sure range has more than one partition, otherwise we can't reproduce the bug.
I was surprised that this bug is present even if the table is not bucketed, and then I found out another problem in the code. This gives the bug a much larger impact: when we do a shuffle join, we add a shuffle exchange to both of the join sides, which then triggers the reordering. @mgaido91 let's also remove the transformUp. also cc @tejasapatil
one followup: we should improve our golden file SQL tests to run the same queries with different configs. This is pretty important for the join tests; otherwise we only test broadcast join.
thanks for your review @cloud-fan. Nice catch on the transformUp issue. As far as the followup is concerned, we should decide whether we want to support the possibility that different configs produce different results. In the second case we have more flexibility, but we also have to be much more careful not to introduce bugs when different configs should not produce different results. Moreover, we would also have many more golden files...
Test build #91705 has finished for PR 21529 at commit
retest this please
Test build #91710 has finished for PR 21529 at commit
```scala
test("SPARK-24495: EnsureRequirements can return wrong plan when reusing the same key in join") {
  withSQLConf(("spark.sql.shuffle.partitions", "1"),
```
let's not use hard-coded config names, use SQLConf.SHUFFLE_PARTITIONS instead.
a nit: we usually use `a -> b, c -> d, ...` to specify config pairs.
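As a hypothetical sketch (not Spark's actual implementation) of how a withSQLConf-style helper works with the `key -> value` pair convention suggested above: set the pairs, run the body, then restore the previous values, whether or not the body throws.

```scala
import scala.collection.mutable

// Stand-in for the session's SQL conf; illustrative only.
val conf = mutable.Map[String, String]()

def withConf[T](pairs: (String, String)*)(body: => T): T = {
  // Remember the previous value (if any) of every key we are about to set.
  val previous = pairs.map { case (k, _) => k -> conf.get(k) }
  pairs.foreach { case (k, v) => conf(k) = v }
  try body
  finally previous.foreach {
    case (k, Some(old)) => conf(k) = old
    case (k, None)      => conf.remove(k)
  }
}
```

Usage then reads `withConf("spark.sql.shuffle.partitions" -> "1", "spark.sql.autoBroadcastJoinThreshold" -> "-1") { ... }`, matching the arrow-pair style.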
```scala
withSQLConf(("spark.sql.shuffle.partitions", "1"),
  ("spark.sql.constraintPropagation.enabled", "false"),
  ("spark.sql.autoBroadcastJoinThreshold", "-1")) {
  val df1 = spark.range(100).repartition(2, $"id", $"id")
```
a simpler way: `spark.range(0, 100, 1, numPartitions = 2)`
this way would not be ok, as we would have a RangePartitioning while the issue appears only with HashPartitioning
no, the issue can happen with range partitioning (because of the double transformation issue); the code in the ticket can reproduce the bug and it has no hash partitioning.
yes, but if we remove the transformUp as you correctly suggested, then without HashPartitioning we do not test the proper behavior of the reorder method.
Anyway, I added another test to PlannerSuite which checks the behavior of the reorder method, so I will follow your suggestion here, thanks.
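For intuition on the distinction discussed above, here is a toy sketch (plain Scala, not Spark's Partitioning classes) of the two schemes: hash partitioning routes a row by the hash of its key(s), while range partitioning routes it by which ordered range the key falls into. The hashing formula and bound handling are illustrative assumptions, not Spark's.

```scala
// Toy hash partitioner: combine the key hashes, map into [0, numPartitions).
def hashPartition(keys: Seq[Long], numPartitions: Int): Int = {
  val h = keys.foldLeft(17)((acc, k) => 31 * acc + k.hashCode)
  ((h % numPartitions) + numPartitions) % numPartitions
}

// Toy range partitioner: upperBounds are sorted split points; the last
// partition is unbounded above.
def rangePartition(key: Long, upperBounds: Seq[Long]): Int = {
  val i = upperBounds.indexWhere(key <= _)
  if (i == -1) upperBounds.length else i
}
```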
```scala
  ("spark.sql.constraintPropagation.enabled", "false"),
  ("spark.sql.autoBroadcastJoinThreshold", "-1")) {
  val df1 = spark.range(100).repartition(2, $"id", $"id")
  val df2 = spark.range(100).select(($"id" * 2).as("b1"), (- $"id").as("b2"))
```
`($"id" * 2).as("b1")` -> `$"id".as("b1")`, to minimize the test.
I think we need to support both: testing different physical operators needs the same result, while testing something like type coercion mode needs different results. Anyway let's discuss it in the followup.
@mgaido91 Could you help improve the test coverage of joins?
ok @cloud-fan, I'll try and send a proposal in the next days.
Sure, @gatorsmile, I am happy to. Do you mean running the existing tests for every type of join, or do you have something different in mind? Thanks.
I think for this PR, apart from the end-to-end test for checking the result, we should also have a unit test in PlannerSuite.
```scala
outputPlan match {
  case SortMergeJoinExec(leftKeys, rightKeys, _, _, _, _) =>
    assert(leftKeys == Seq(exprA, exprA))
    assert(rightKeys.contains(exprB) && rightKeys.contains(exprC))
```
is it better to check `rightKeys == Seq(exprB, exprC)`?
```scala
  SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> "-1") {
  val df1 = spark.range(0, 100, 1, 2)
  val df2 = spark.range(100).select($"id".as("b1"), (- $"id").as("b2"))
  val res = df1.join(df2, $"id" === $"b1" && $"id" === $"b2")
```
one difference between this test and the code in the JIRA ticket is that the code in the JIRA ticket has a Project above the join, to trigger the double transformation issue. We should add a Project and make sure this test does fail without this patch.
```scala
test("SPARK-24495: EnsureRequirements can return wrong plan when reusing the same key in join") {
```
as an end-to-end test, maybe a better name is: `SPARK-24495: Join may return wrong result when having duplicated equal-join keys`
LGTM except some minor comments about the tests
Test build #91773 has finished for PR 21529 at commit
gatorsmile left a comment:
LGTM
@mgaido91 can you fix the conflicts? thanks!
Test build #91790 has finished for PR 21529 at commit
retest this please
Test build #91797 has finished for PR 21529 at commit
Test build #91826 has finished for PR 21529 at commit
retest this please
Test build #91843 has finished for PR 21529 at commit
Thanks! Merged to master/2.3
[SPARK-24495][SQL] EnsureRequirement returns wrong plan when reordering equal keys

`EnsureRequirement` in its `reorder` method currently assumes that the same key appears only once in the join condition. This of course might not be the case, and when it is not satisfied, it returns a wrong plan which produces a wrong result of the query.

added UT

Author: Marco Gaido <[email protected]>

Closes #21529 from mgaido91/SPARK-24495.

(cherry picked from commit fdadc4b)
Signed-off-by: Xiao Li <[email protected]>
Thanks @gatorsmile. Sorry, may I ask what you think about #21529 (comment)? Thanks.
Adding new queries to
What changes were proposed in this pull request?

`EnsureRequirement` in its `reorder` method currently assumes that the same key appears only once in the join condition. This of course might not be the case, and when it is not satisfied, it returns a wrong plan which produces a wrong result of the query.

How was this patch tested?
added UT
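To make the bug described above concrete, here is a hypothetical plain-Scala sketch (not Spark's code) of a reorder that assumes each key appears only once: matching by `indexOf` always returns the first occurrence, so with a duplicated left key the corresponding right keys are also duplicated and one equality condition is silently lost.

```scala
// Buggy sketch: assumes each join key appears only once.
def naiveReorder[K](
    leftKeys: Seq[K],
    rightKeys: Seq[K],
    expectedOrderOfKeys: Seq[K]): (Seq[K], Seq[K]) = {
  // indexOf always finds the FIRST occurrence, even for duplicated keys.
  val picked = expectedOrderOfKeys.map(leftKeys.indexOf(_))
  (picked.map(leftKeys), picked.map(rightKeys))
}

// With a join condition like id === b1 && id === b2, leftKeys = (id, id) and
// rightKeys = (b1, b2); the naive reorder matches position 0 twice, so the
// b2 condition disappears from the reordered keys.
val (l, r) = naiveReorder(Seq("id", "id"), Seq("b1", "b2"), Seq("id", "id"))
```

Here `l` is `Seq("id", "id")` but `r` collapses to `Seq("b1", "b1")`, which is exactly how the wrong plan produces wrong query results.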