[SPARK-20392][SQL] Set barrier to prevent re-entering a tree #19873

viirya · 2017-12-04T07:18:31Z

What changes were proposed in this pull request?

The SQL Analyzer goes through the whole query plan even when most of it has already been analyzed. This increases the time spent on query analysis, especially for long ML pipelines.

This patch adds a logical node called AnalysisBarrier that wraps an analyzed logical plan to prevent it from being analyzed again. The barrier is applied to the analyzed logical plan in Dataset. It does not change the output of the wrapped logical plan; it only acts as a wrapper that hides the plan from the analyzer. New operations on the Dataset are built on top of the barrier, so only the newly created nodes are analyzed.

The analysis barrier is removed at the end of the analysis stage.
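The sketch below is a toy model of the mechanism described above. It is not Catalyst code: `Plan`, `Relation`, `Filter`, `Barrier`, `analyze` and `eliminateBarriers` are hypothetical stand-ins. A barrier is modeled as a leaf that does not expose its wrapped plan as a child, so a post-order rewrite (modeled on `transformUp`) never re-enters the wrapped plan. A final pass then unwraps every barrier.

```scala
import scala.collection.mutable.ListBuffer

// Toy model of an analysis barrier; every name here is hypothetical, not a Catalyst class.
object AnalysisBarrierSketch {
  sealed trait Plan {
    def children: Seq[Plan]
    def withNewChildren(cs: Seq[Plan]): Plan
    // Post-order rewrite: children first, then this node (modeled on transformUp).
    def transformUp(rule: PartialFunction[Plan, Plan]): Plan = {
      val rewritten =
        if (children.isEmpty) this else withNewChildren(children.map(_.transformUp(rule)))
      rule.applyOrElse(rewritten, identity[Plan])
    }
  }
  case class Relation(name: String) extends Plan {
    def children: Seq[Plan] = Nil
    def withNewChildren(cs: Seq[Plan]): Plan = this
  }
  case class Filter(cond: String, child: Plan) extends Plan {
    def children: Seq[Plan] = Seq(child)
    def withNewChildren(cs: Seq[Plan]): Plan = copy(child = cs.head)
  }
  // Leaf that wraps an already-analyzed plan. The wrapped plan is not listed
  // in `children`, so traversals stop at the barrier.
  case class Barrier(child: Plan) extends Plan {
    def children: Seq[Plan] = Nil
    def withNewChildren(cs: Seq[Plan]): Plan = this
  }

  // Stand-in analyzer pass: leaves the plan unchanged and records every node it is offered.
  def analyze(plan: Plan): (Plan, List[Plan]) = {
    val visited = ListBuffer.empty[Plan]
    val result = plan.transformUp { case p => visited += p; p }
    (result, visited.toList)
  }

  // Runs at the end of analysis and unwraps every barrier, including nested ones.
  def eliminateBarriers(plan: Plan): Plan = plan match {
    case Barrier(child) => eliminateBarriers(child)
    case other          => other.withNewChildren(other.children.map(eliminateBarriers))
  }
}
```

Adding `Filter("c = 3", ...)` on top of the two-filter plan `Filter("a > 1", Filter("b < 2", Relation("t")))` gives two results. Without a barrier, the stand-in pass is offered all four nodes. With the barrier, it is offered only the new filter and the barrier itself. After `eliminateBarriers`, the tree is the same as the unwrapped plan.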

How was this patch tested?

Added tests.

viirya · 2017-12-04T07:22:17Z

cc @cloud-fan @hvanhovell Basically these are the same changes as in #17770.

SparkQA · 2017-12-04T08:05:02Z

Test build #84417 has finished for PR 19873 at commit 136fd30.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds the following public classes (experimental):
case class AnalysisBarrier(child: LogicalPlan) extends LeafNode

viirya · 2017-12-04T08:06:22Z

retest this please.

SparkQA · 2017-12-04T10:24:42Z

Test build #84420 has finished for PR 19873 at commit 136fd30.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
case class AnalysisBarrier(child: LogicalPlan) extends LeafNode

SparkQA · 2017-12-04T17:15:22Z

Test build #84430 has finished for PR 19873 at commit 9f5a0e4.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
case class AnalysisBarrier(child: LogicalPlan) extends LeafNode

cloud-fan · 2017-12-05T06:19:36Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala

-   *
-   * @param rule the function use to transform this nodes children
-   */
-  def resolveOperators(rule: PartialFunction[LogicalPlan, LogicalPlan]): LogicalPlan = {


Can we also remove the analyzed flag in this class?

cloud-fan · 2017-12-05T06:23:50Z

sql/core/src/test/scala/org/apache/spark/sql/execution/PlannerSuite.scala

    val doubleRepartitioned = testData.repartition(10).repartition(20).coalesce(5)
    def countRepartitions(plan: LogicalPlan): Int = plan.collect { case r: Repartition => r }.length
-    assert(countRepartitions(doubleRepartitioned.queryExecution.logical) === 3)
+    assert(countRepartitions(doubleRepartitioned.queryExecution.analyzed) === 3)


Is this change necessary?

Please see previous discussion: https://github.com/apache/spark/pull/17770/files#r118480364
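The assertion probably has to move from `logical` to `analyzed` for the following reason. `queryExecution.logical` now contains barriers, because each Dataset call builds on a barrier around the previous analyzed plan. A barrier is a leaf, so `collect` never reaches the Repartition nodes inside it. The toy model below assumes that plan shape. `Plan`, `Barrier` and `removeBarriers` are hypothetical stand-ins, and `collect` only follows `children`, in the manner of Catalyst's `TreeNode.collect`.

```scala
// Toy model (hypothetical names, not Catalyst) of why a barrier hides nodes from collect.
object CollectSketch {
  sealed trait Plan {
    def children: Seq[Plan]
    // Pre-order collect that only follows `children`.
    def collect[B](pf: PartialFunction[Plan, B]): Seq[B] =
      pf.lift(this).toList ++ children.flatMap(_.collect(pf))
  }
  case class Relation(name: String) extends Plan {
    def children: Seq[Plan] = Nil
  }
  case class Repartition(numPartitions: Int, shuffle: Boolean, child: Plan) extends Plan {
    def children: Seq[Plan] = Seq(child)
  }
  // Leaf wrapper: its child is invisible to collect.
  case class Barrier(child: Plan) extends Plan {
    def children: Seq[Plan] = Nil
  }

  def removeBarriers(plan: Plan): Plan = plan match {
    case Barrier(c)           => removeBarriers(c)
    case Repartition(n, s, c) => Repartition(n, s, removeBarriers(c))
    case r: Relation          => r
  }

  def countRepartitions(plan: Plan): Int = plan.collect { case r: Repartition => r }.length

  // Assumed shape of testData.repartition(10).repartition(20).coalesce(5) when each
  // Dataset operation wraps the previous analyzed plan in a barrier.
  val logical: Plan =
    Repartition(5, shuffle = false,
      Barrier(Repartition(20, shuffle = true,
        Barrier(Repartition(10, shuffle = true, Relation("testData"))))))

  // Stand-in for queryExecution.analyzed: the same tree with the barriers removed.
  val analyzed: Plan = removeBarriers(logical)
}
```

In this model, counting on `logical` finds only the outermost Repartition, while counting on `analyzed` finds all three.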

cloud-fan · 2017-12-05T06:24:08Z

LGTM, also cc @gatorsmile

gatorsmile · 2017-12-05T06:39:28Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala


-    def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
+    def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
      case p if p.analyzed => p


Sorry, what do you mean why?

In which cases should we still use the analyzed flag?
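The diff swaps `resolveOperators` for `transformUp` but keeps the `case p if p.analyzed => p` guard. `transformUp` rewrites children before it offers the parent to the rule. As a result, a guard like this only stops the rule from rewriting an analyzed node; it does not stop the traversal from descending into it. The toy contrast below assumes the removed `resolveOperators` returned an analyzed subtree unchanged without recursing into it. `Node`, `offered` and `example` are hypothetical names, not Catalyst code.

```scala
// Toy contrast (hypothetical names, not Catalyst) between flag-based skipping and a
// plain post-order transform that ignores the flag.
object AnalyzedFlagSketch {
  case class Node(name: String, children: List[Node], analyzed: Boolean) {
    // Post-order, like transformUp: descendants are always visited first.
    def transformUp(rule: PartialFunction[Node, Node]): Node = {
      val rewritten = copy(children = children.map(_.transformUp(rule)))
      rule.applyOrElse(rewritten, identity[Node])
    }
    // Assumed behavior of the removed resolveOperators: an analyzed subtree is
    // returned as-is, and nothing beneath it is visited.
    def resolveOperators(rule: PartialFunction[Node, Node]): Node =
      if (analyzed) this
      else {
        val rewritten = copy(children = children.map(_.resolveOperators(rule)))
        rule.applyOrElse(rewritten, identity[Node])
      }
  }

  // Counts how many nodes a no-op rule is offered under either traversal.
  def offered(plan: Node, useResolveOperators: Boolean): Int = {
    var count = 0
    val rule: PartialFunction[Node, Node] = { case p => count += 1; p }
    if (useResolveOperators) plan.resolveOperators(rule) else plan.transformUp(rule)
    count
  }

  // An already-analyzed three-node subtree under one new, unanalyzed node.
  val relation = Node("relation", Nil, analyzed = true)
  val project = Node("project", List(relation), analyzed = true)
  val filter = Node("filter", List(project), analyzed = true)
  val example = Node("newFilter", List(filter), analyzed = false)
}
```

In this model, `transformUp` offers all four nodes to the rule, while flag-based skipping offers only the new node. A barrier gives the same skipping without relying on a flag, which every rule would otherwise have to respect.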

gatorsmile · 2017-12-05T06:41:31Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

     * for all conflicting attributes.
     */
-    private def dedupRight (left: LogicalPlan, right: LogicalPlan): LogicalPlan = {
+    private def dedupRight (left: LogicalPlan, oriRight: LogicalPlan): LogicalPlan = {


What is oriRight?

Use originalRight

gatorsmile · 2017-12-05T06:43:53Z

sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala

    }
    // For partitioned relation r, r.schema's column ordering can be different from the column
-    // ordering of data.logicalPlan (partition columns are all moved after data column).  This
+    // ordering of data.logicalPlan (partition columns are all moved after data column). This


Get rid of the changes in this file.

gatorsmile · 2017-12-05T06:50:20Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

      case sa @ Sort(_, _, child: Aggregate) => sa

-      case s @ Sort(order, _, child) if !s.resolved && child.resolved =>
+      case s @ Sort(order, _, oriChild) if !s.resolved && oriChild.resolved =>


Use originalChild

gatorsmile · 2017-12-05T06:50:38Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

        }

-      case f @ Filter(cond, child) if !f.resolved && child.resolved =>
+      case f @ Filter(cond, oriChild) if !f.resolved && oriChild.resolved =>


Use originalChild
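The `oriChild`/`originalChild` renames suggest that these rules now need two things: the child as written, and a view of it with the barrier stripped. The source does not show the exact code. The following is a hypothetical sketch of the underlying issue, not the patch itself. A rule that pattern-matches on its child's type, for example a Filter over an Aggregate, stops matching when a barrier sits in between, unless it looks through the barrier first. All names are illustrative.

```scala
// Hypothetical illustration (not the patch's code): matching on a child's type
// must look through a barrier that wraps that child.
object UnwrapSketch {
  sealed trait Plan
  case class Relation(name: String) extends Plan
  case class Aggregate(groupBy: Seq[String], child: Plan) extends Plan
  case class Filter(cond: String, child: Plan) extends Plan
  case class Barrier(child: Plan) extends Plan

  // Strips any barriers sitting directly on top of a node.
  def unwrap(plan: Plan): Plan = plan match {
    case Barrier(c) => unwrap(c)
    case other      => other
  }

  // Naive check: misses an Aggregate hidden behind a barrier.
  def filterOnAggregateNaive(f: Filter): Boolean = f.child match {
    case _: Aggregate => true
    case _            => false
  }

  // Matches on the unwrapped child while f.child still holds the original.
  def filterOnAggregate(f: Filter): Boolean = unwrap(f.child) match {
    case _: Aggregate => true
    case _            => false
  }
}
```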

gatorsmile · 2017-12-05T07:09:48Z

From the PR description, I am unable to tell what changes this PR makes. We need a better description that explains the solution this PR proposes.

Also explain which cases need special handling, and why.

SparkQA · 2017-12-05T12:52:54Z

Test build #84475 has finished for PR 19873 at commit bae034d.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-12-05T13:05:23Z

Test build #84477 has finished for PR 19873 at commit 54182bf.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile · 2017-12-05T20:18:43Z

@viirya Could you resolve the conflicts?

gatorsmile · 2017-12-05T20:19:52Z

...alyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala

 }
+
+/** A logical plan for setting a barrier of analysis */
+case class AnalysisBarrier(child: LogicalPlan) extends LeafNode {


Put the PR description into the comment of this class?
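The PR description defines the shape of the class. It is a leaf node, so the wrapped plan is hidden from traversals, and its output is the wrapped plan's output. The toy analogue below shows that shape, with a class comment adapted from the description as suggested. `Plan`, `Relation`, `Project` and `AnalysisBarrierLike` are hypothetical stand-ins, and output is simplified to column names.

```scala
// Toy analogue (hypothetical names) of a barrier node: a leaf whose output is
// the wrapped plan's output.
object BarrierShapeSketch {
  sealed trait Plan {
    def children: Seq[Plan]
    def output: Seq[String]
  }
  case class Relation(name: String, output: Seq[String]) extends Plan {
    def children: Seq[Plan] = Nil
  }
  case class Project(columns: Seq[String], child: Plan) extends Plan {
    def children: Seq[Plan] = Seq(child)
    def output: Seq[String] = columns
  }

  /**
   * Wraps an already-analyzed plan so the analyzer does not analyze it again.
   * It exposes no children and does not change the wrapped plan's output.
   * Barriers are removed at the end of the analysis stage.
   */
  case class AnalysisBarrierLike(child: Plan) extends Plan {
    def children: Seq[Plan] = Nil
    def output: Seq[String] = child.output
  }
}
```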

gatorsmile · 2017-12-05T20:21:00Z

sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/plans/LogicalPlanSuite.scala

 /**
- * This suite is used to test [[LogicalPlan]]'s `resolveOperators` and make sure it can correctly
- * skips sub-trees that have already been marked as analyzed.
+ * This suite is used to test [[LogicalPlan]]'s `transformUp` plus analysis barrier and make sure


Since both transformUp and transformDown work, create a test case using transformDown. Also update the comments here.
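A barrier blocks both traversal orders because each one only recurses through `children`, and a barrier exposes none. The toy model below shows this with a rule applied via top-down and bottom-up rewrites. `Plan`, `Join`, `Barrier` and the `rename` rule are hypothetical names, loosely modeled on Catalyst's `transformDown` and `transformUp`.

```scala
// Toy model (hypothetical names): a barrier stops both top-down and bottom-up rewrites.
object TraversalSketch {
  sealed trait Plan {
    def children: Seq[Plan]
    def withNewChildren(cs: Seq[Plan]): Plan
    // Pre-order: apply the rule to this node, then recurse into the result's children.
    def transformDown(rule: PartialFunction[Plan, Plan]): Plan = {
      val afterRule = rule.applyOrElse(this, identity[Plan])
      if (afterRule.children.isEmpty) afterRule
      else afterRule.withNewChildren(afterRule.children.map(_.transformDown(rule)))
    }
    // Post-order: rewrite the children first, then apply the rule to this node.
    def transformUp(rule: PartialFunction[Plan, Plan]): Plan = {
      val rewritten =
        if (children.isEmpty) this else withNewChildren(children.map(_.transformUp(rule)))
      rule.applyOrElse(rewritten, identity[Plan])
    }
  }
  case class Relation(name: String) extends Plan {
    def children: Seq[Plan] = Nil
    def withNewChildren(cs: Seq[Plan]): Plan = this
  }
  case class Join(left: Plan, right: Plan) extends Plan {
    def children: Seq[Plan] = Seq(left, right)
    def withNewChildren(cs: Seq[Plan]): Plan = Join(cs(0), cs(1))
  }
  case class Barrier(child: Plan) extends Plan {
    def children: Seq[Plan] = Nil
    def withNewChildren(cs: Seq[Plan]): Plan = this
  }

  // Example rule: rename every relation the traversal can reach.
  val rename: PartialFunction[Plan, Plan] = { case Relation(n) => Relation(n + "_v2") }

  val plan: Plan = Join(Relation("t1"), Barrier(Relation("t2")))
  // Only the relation outside the barrier is renamed.
  val expected: Plan = Join(Relation("t1_v2"), Barrier(Relation("t2")))
}
```

Under these assumptions, both traversals produce the same result: the relation outside the barrier is renamed, and the one inside it is left untouched.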

viirya · 2017-12-06T02:13:38Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala

-      case Kurtosis(e @ StringType()) => Kurtosis(Cast(e, DoubleType))
-    }
+    override protected def coerceTypes(plan: LogicalPlan): LogicalPlan =
+      plan transformAllExpressions {


For indentation...
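The diff moves the rule body into `override protected def coerceTypes`. The build report below also lists a new `trait TypeCoercionRule`. Together these suggest a template-method layout: each coercion rule supplies only the rewrite, and a shared trait owns the entry point. The toy analogue below is a sketch of that pattern, not Spark's trait. `Expr`, `CoercionRule` and `KurtosisArgumentConversion` are hypothetical, and returning the original instance when nothing changed is an assumption made for this sketch.

```scala
// Toy analogue (hypothetical names, not Spark's TypeCoercionRule) of a shared
// coercion trait whose concrete rules override only coerceTypes.
object CoercionSketch {
  sealed trait Expr { def dataType: String }
  case class Attr(name: String, dataType: String) extends Expr
  case class Cast(child: Expr, dataType: String) extends Expr
  case class Kurtosis(child: Expr) extends Expr { def dataType: String = "double" }

  trait CoercionRule {
    // Each concrete rule supplies only the rewrite.
    protected def coerceTypes(e: Expr): Expr
    // Shared entry point; returns the original instance when nothing changed.
    final def apply(e: Expr): Expr = {
      val coerced = coerceTypes(e)
      if (coerced == e) e else coerced
    }
  }

  // Mirrors the Kurtosis case in the diff: cast a string argument to double.
  object KurtosisArgumentConversion extends CoercionRule {
    override protected def coerceTypes(e: Expr): Expr = e match {
      case Kurtosis(c) if c.dataType == "string" => Kurtosis(Cast(c, "double"))
      case other                                 => other
    }
  }
}
```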

SparkQA · 2017-12-06T05:17:59Z

Test build #84518 has finished for PR 19873 at commit 4775a02.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
case class ReqAndHandler(req: Request, handler: MemberHandler)
trait TypeCoercionRule extends Rule[LogicalPlan] with Logging

SparkQA · 2017-12-06T05:26:44Z

Test build #84520 has finished for PR 19873 at commit d2375e0.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile · 2017-12-06T05:43:55Z

LGTM

gatorsmile · 2017-12-06T05:44:01Z

Thanks! Merged to master.

viirya · 2017-12-06T06:30:13Z

Thanks! @gatorsmile @cloud-fan

viirya force-pushed the SPARK-20392-reopen branch from 136fd30 to 9f5a0e4 on December 4, 2017 14:16

Add analysis barrier around analyzed plans.

9f5a0e4

cloud-fan reviewed Dec 5, 2017

gatorsmile reviewed Dec 5, 2017

viirya force-pushed the SPARK-20392-reopen branch from bae034d to 54182bf on December 5, 2017 10:02

Remove analyzed stuff.

54182bf

gatorsmile reviewed Dec 5, 2017

Modify comment and test cases.

b7747c4

viirya commented Dec 6, 2017

viirya added 2 commits December 6, 2017 02:15

Merge remote-tracking branch 'upstream/master' into SPARK-20392-reopen

4775a02

Less change for indentation.

d2375e0

asfgit closed this in 00d176d Dec 6, 2017

HyukjinKwon mentioned this pull request Mar 22, 2023

[SPARK-42896][SQL][PYTHON] Make mapInPandas / mapInArrow support barrier mode execution #40520

Closed

viirya deleted the SPARK-20392-reopen branch December 27, 2023 18:35
