[SPARK-21743][SQL][follow-up] top-most limit should not cause memory leak #18993

cloud-fan · 2017-08-18T11:55:11Z

What changes were proposed in this pull request?

This is a follow-up of #18955, to fix a bug where we break whole-stage codegen for Limit.

How was this patch tested?

existing tests.

cloud-fan · 2017-08-18T11:56:40Z

sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala

-  // Do not enable whole stage codegen for a single limit.
-  override def supportCodegen: Boolean = child match {
-    case plan: CodegenSupport => plan.supportCodegen
-    case _ => false


This is wrong: we may have more operators above Limit, in which case it's not a single Limit.

cloud-fan · 2017-08-18T11:58:02Z

sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala

    override def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
-      case logical.ReturnAnswer(rootPlan) => rootPlan match {
-        case logical.Limit(IntegerLiteral(limit), logical.Sort(order, true, child)) =>
-          execution.TakeOrderedAndProjectExec(limit, order, child.output, planLater(child)) :: Nil


Kinda unrelated: remove these `logical` and `execution` prefixes to shorten the code.

cloud-fan · 2017-08-18T11:58:13Z

sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala

+          // query plan are consumed. It's possible that `CollectLimitExec` only consumes a little
+          // data from child plan and finishes the query without releasing resources. Here we wrap
+          // the child plan with `LocalLimitExec`, to stop the processing of whole stage codegen and
+          // trigger the resource releasing work, after we consume `limit` rows.


comments updated.
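The updated comment can be illustrated with a toy model. This is only a sketch and not Spark code: `ToySource`, `LimitLeakDemo`, and the `released` flag are hypothetical names, and the model assumes a consumer that asks `hasNext` before checking its own row count. The source frees its resources only when its producing loop is allowed to finish, so a limit placed inside the producer lets that happen after `limit` rows.

```scala
// Toy model, not Spark code: a row source that frees its resources only
// when its producing loop is allowed to finish.
final class ToySource(rows: Seq[Int]) {
  var released = false

  // `localLimit` plays the role of a limit that lives inside the producing
  // stage; None means there is no such limit.
  def iterator(localLimit: Option[Int]): Iterator[Int] = new Iterator[Int] {
    private val underlying = rows.iterator
    private var produced = 0

    def hasNext: Boolean = {
      val more = underlying.hasNext && localLimit.forall(produced < _)
      if (!more) released = true // the stage is done, so clean up
      more
    }

    def next(): Int = {
      produced += 1
      underlying.next()
    }
  }
}

object LimitLeakDemo {
  // A top-level collector that stops pulling once it holds `limit` rows.
  // Assumption of this sketch: it checks hasNext before its own count.
  def collect(it: Iterator[Int], limit: Int): Seq[Int] = {
    val buf = scala.collection.mutable.ArrayBuffer.empty[Int]
    while (it.hasNext && buf.size < limit) buf += it.next()
    buf.toSeq
  }
}
```

Without the inner limit the collector walks away while the source still has rows, so `released` stays false. With `Some(limit)` the source itself reports exhaustion after `limit` rows and the cleanup runs.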

cloud-fan · 2017-08-18T11:59:08Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala

      projection.initialize(0)
      LocalRelation(projectList.map(_.toAttribute), data.map(projection))
+
+    case Limit(IntegerLiteral(limit), LocalRelation(output, data)) =>


This is to fix the SQLQuerySuite test "SPARK-19650: An action on a Command should not trigger a Spark job": a limit over a local relation should not trigger a Spark job.

This kinda violates the idea that we shouldn't rely on optimization for correctness, but I suppose this is ok.

Technically this is not about correctness; "An action on a Command should not trigger a Spark job" is also a kind of optimization.

Yeah, you are right about that.
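The body of the new `Limit` over `LocalRelation` case is not shown in the excerpt above. As a rough sketch of the idea only, the rewrite can be modelled with toy plan nodes: `ToyPlan`, `ToyLocalRelation`, `ToyScan`, `ToyLimit`, and `ToyOptimizer` are hypothetical stand-ins, not the Catalyst classes. Folding the limit into the in-memory rows leaves nothing that would need a job to execute.

```scala
// Toy stand-ins for the plan nodes; hypothetical, not the Spark classes.
sealed trait ToyPlan
final case class ToyLocalRelation(output: Seq[String], data: Seq[Seq[Any]]) extends ToyPlan
final case class ToyScan(table: String) extends ToyPlan
final case class ToyLimit(limit: Int, child: ToyPlan) extends ToyPlan

object ToyOptimizer {
  // Fold a limit over in-memory rows into a smaller in-memory relation.
  // Any other plan shape is returned unchanged.
  def convertToLocal(plan: ToyPlan): ToyPlan = plan match {
    case ToyLimit(limit, ToyLocalRelation(output, data)) =>
      ToyLocalRelation(output, data.take(limit))
    case other => other
  }
}
```

A limit over anything that is not already local, such as `ToyLimit(2, ToyScan("t"))`, is left alone and would still have to be executed.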

cloud-fan · 2017-08-18T11:59:57Z

cc @hvanhovell @gatorsmile

hvanhovell · 2017-08-18T14:29:58Z

LGTM pending jenkins.

SparkQA · 2017-08-18T14:44:15Z

Test build #80843 has finished for PR 18993 at commit b6d51de.

This patch fails SparkR unit tests.
This patch merges cleanly.
This patch adds no public classes.

viirya · 2017-08-18T15:02:24Z

retest this please.

viirya · 2017-08-18T15:02:55Z

LGTM

SparkQA · 2017-08-18T17:40:00Z

Test build #80848 has finished for PR 18993 at commit b6d51de.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile · 2017-08-18T18:19:58Z

LGTM

Thanks! Merging to master.

do not break whole stage codegen for limit

b6d51de


asfgit closed this in 7880909 Aug 18, 2017
