Conversation

@cloud-fan (Contributor)

What changes were proposed in this pull request?

This is a follow-up of #18955, to fix a bug where we broke whole-stage codegen for Limit.

How was this patch tested?

Existing tests.

// Do not enable whole stage codegen for a single limit.
override def supportCodegen: Boolean = child match {
  case plan: CodegenSupport => plan.supportCodegen
  case _ => false
}

@cloud-fan (Contributor, Author)

This is wrong: we may have more operators above the Limit, so it's not a single Limit. See the sketch below.
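
To make the concern concrete, here is a minimal, self-contained sketch (not taken from the PR; the object name and the query are made up for illustration) in which a filter and a projection sit above a limit. If the limit operator reports supportCodegen = false, those parent operators also fall out of the fused whole-stage-codegen pipeline, so the check above affects more than just a single limit:

import org.apache.spark.sql.SparkSession

object LimitCodegenSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("limit-codegen-sketch")
      .getOrCreate()
    import spark.implicits._

    // A filter and a project sit above the limit, so the limit is not "single".
    val df = spark.range(1000).toDF("id")
      .limit(100)
      .filter($"id" > 10)
      .select(($"id" * 2).as("doubled"))

    // The physical plan shows whether these operators end up fused into a
    // single WholeStageCodegen node.
    df.explain()

    spark.stop()
  }
}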

override def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
  case logical.ReturnAnswer(rootPlan) => rootPlan match {
    case logical.Limit(IntegerLiteral(limit), logical.Sort(order, true, child)) =>
      execution.TakeOrderedAndProjectExec(limit, order, child.output, planLater(child)) :: Nil

@cloud-fan (Contributor, Author)

Somewhat unrelated: removed the logical and execution prefixes to shorten the code; the shortened form is sketched below.
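
For reference, a sketch of the shortened form, assuming ReturnAnswer, Limit, Sort, IntegerLiteral and TakeOrderedAndProjectExec are imported directly and eliding the strategy's remaining cases:

// Sketch only: same logic as the quoted snippet, with the prefixes dropped.
override def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
  case ReturnAnswer(rootPlan) => rootPlan match {
    case Limit(IntegerLiteral(limit), Sort(order, true, child)) =>
      TakeOrderedAndProjectExec(limit, order, child.output, planLater(child)) :: Nil
    // ... other limit patterns elided ...
    case other => planLater(other) :: Nil
  }
  case _ => Nil
}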

// query plan are consumed. It's possible that `CollectLimitExec` only consumes a little
// data from child plan and finishes the query without releasing resources. Here we wrap
// the child plan with `LocalLimitExec`, to stop the processing of whole stage codegen and
// trigger the resource releasing work, after we consume `limit` rows.

@cloud-fan (Contributor, Author)

Comments updated. A sketch of the wrapping they describe follows below.
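
For readers following along, a hedged sketch of the planning rule that the updated comment describes (the shape is inferred from the comment, not copied from the diff): the top-most limit is planned as a CollectLimitExec whose child is wrapped in a LocalLimitExec, so code generation stops producing rows, and resources are released, once limit rows have been consumed.

// Sketch only: wrap the child plan with LocalLimitExec under the top-most limit.
case Limit(IntegerLiteral(limit), child) =>
  CollectLimitExec(limit, LocalLimitExec(limit, planLater(child))) :: Nil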

projection.initialize(0)
LocalRelation(projectList.map(_.toAttribute), data.map(projection))

case Limit(IntegerLiteral(limit), LocalRelation(output, data)) =>

@cloud-fan (Contributor, Author)

This is to fix the SQLQuerySuite test "SPARK-19650: An action on a Command should not trigger a Spark job": a limit over a local relation should not trigger a Spark job. A sketch of the new optimizer case is below.
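
A hedged sketch of what the new optimizer case presumably does (the body is an assumption inferred from the quoted pattern, not copied from the diff): the limit is folded into the LocalRelation itself, so collecting the result never has to launch a Spark job.

// Sketch only: fold the limit into the local relation's rows.
case Limit(IntegerLiteral(limit), LocalRelation(output, data)) =>
  LocalRelation(output, data.take(limit))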

Contributor

This kind of violates the idea that we shouldn't rely on optimization for correctness, but I suppose this is OK.

@cloud-fan (Contributor, Author)

Technically this is not about correctness; "An action on a Command should not trigger a Spark job" is also a kind of optimization.

Contributor

Yeah, you are right about that.

@cloud-fan (Contributor, Author)

cc @hvanhovell @gatorsmile

@hvanhovell (Contributor)

LGTM pending Jenkins.

@SparkQA commented Aug 18, 2017

Test build #80843 has finished for PR 18993 at commit b6d51de.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya (Member) commented Aug 18, 2017

retest this please.

@viirya (Member) commented Aug 18, 2017

LGTM

@SparkQA commented Aug 18, 2017

Test build #80848 has finished for PR 18993 at commit b6d51de.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile (Member)

LGTM

Thanks! Merging to master.

@asfgit closed this pull request in 7880909 on Aug 18, 2017.