-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-21743][SQL][follow-up] top-most limit should not cause memory leak #18993
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| // Do not enable whole stage codegen for a single limit. | ||
| override def supportCodegen: Boolean = child match { | ||
| case plan: CodegenSupport => plan.supportCodegen | ||
| case _ => false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is wrong, we may have more operators above Limit, so it's not a single Limit.
| override def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match { | ||
| case logical.ReturnAnswer(rootPlan) => rootPlan match { | ||
| case logical.Limit(IntegerLiteral(limit), logical.Sort(order, true, child)) => | ||
| execution.TakeOrderedAndProjectExec(limit, order, child.output, planLater(child)) :: Nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
kinda unrelated, remove these logical and execution prefix to shorten the code.
| // query plan are consumed. It's possible that `CollectLimitExec` only consumes a little | ||
| // data from child plan and finishes the query without releasing resources. Here we wrap | ||
| // the child plan with `LocalLimitExec`, to stop the processing of whole stage codegen and | ||
| // trigger the resource releasing work, after we consume `limit` rows. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comments updated.
| projection.initialize(0) | ||
| LocalRelation(projectList.map(_.toAttribute), data.map(projection)) | ||
|
|
||
| case Limit(IntegerLiteral(limit), LocalRelation(output, data)) => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is to fix SQLQuerySuite.SPARK-19650: An action on a Command should not trigger a Spark job, limit over local relation should not trigger a spark job.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This kinda violates the idea that we shouldn't rely on optimization for correctness, but I suppose this is ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
technically this is not about correctness, An action on a Command should not trigger a Spark job is also kind of optimization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, you are right about that.
|
LGTM pending jenkins. |
|
Test build #80843 has finished for PR 18993 at commit
|
|
retest this please. |
|
LGTM |
|
Test build #80848 has finished for PR 18993 at commit
|
|
LGTM Thanks! Merging to master. |
What changes were proposed in this pull request?
This is a follow-up of #18955 , to fix a bug that we break whole stage codegen for
Limit.How was this patch tested?
existing tests.