[SPARK-14554][SQL] disable whole stage codegen if there are too many input columns #12322
Conversation
LGTM, could you update the description?

ok updated.

Test build #55584 has finished for PR 12322 at commit

Test build #55586 has finished for PR 12322 at commit

Thanks. Merging to master.
test("SPARK-14554: Dataset.map may generate wrong java code for wide table") {
  val wideDF = sqlContext.range(10).select(Seq.tabulate(1000) {i => ('id + i).as(s"c$i")} : _*)
  // Make sure the generated code for this plan can compile and execute.
  wideDF.map(_.getLong(0)).collect()
We should still use checkAnswer here because it provides extra debugging info when there is an exception.
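A minimal sketch of the suggested change, assuming the test lives in a suite that mixes in Spark's QueryTest (which provides checkAnswer and prints the plan plus a row diff on failure) and that the suite's SQL implicits are imported as in the original test:

```scala
import org.apache.spark.sql.Row

test("SPARK-14554: Dataset.map may generate wrong java code for wide table") {
  val wideDF = sqlContext.range(10).select(Seq.tabulate(1000) { i => ('id + i).as(s"c$i") } : _*)
  // c0 is just `id`, so mapping out the first column of each row should give 0L..9L;
  // checkAnswer will report the plan and a row-level diff if compilation or execution fails.
  checkAnswer(wideDF.map(_.getLong(0)).toDF(), (0 until 10).map(i => Row(i.toLong)))
}
```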
Do you know why this test case is super slow? It took more than 5 minutes to finish it. Is this expected?
- SPARK-14554: Dataset.map may generate wrong java code for wide table (5 minutes, 20 seconds)
See the link: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59079/consoleFull
It's fixed in #13273.
## What changes were proposed in this pull request?

Address this comment: #12322 (comment)

## How was this patch tested?

N/A

Author: Wenchen Fan <[email protected]>

Closes #12346 from cloud-fan/tmp.
What changes were proposed in this pull request?
In https://github.com/apache/spark/pull/12047/files#diff-94a1f59bcc9b6758c4ca874652437634R529, we may split the field expression code in `CreateExternalRow` to support wide tables. However, the whole stage codegen framework doesn't support this, because the input for expressions is not always the input row but can be `CodeGenContext.currentVars`, which doesn't work well with `CodeGenContext.splitExpressions` (see the toy sketch below). We do have a check to guard against this case, but it's incomplete: it only checks output fields.
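To make the scoping problem concrete, here is a small toy model in Scala; it does not use Spark's codegen API at all. Expression code that reads only from the input row can be split into helper methods freely, while code that reads caller-local variables, which is roughly what `CodeGenContext.currentVars` amounts to in the generated Java, cannot be moved into a helper without passing every local along:

```scala
// Toy model of the scoping problem, not Spark's actual generated code.
object SplitScopingDemo {
  final case class InputRow(fields: Array[Long])

  // Row-based evaluation: each helper only needs the row, so evaluating a wide
  // schema can be split into many small methods (the splitExpressions style).
  def evalSplit(row: InputRow): Long = part1(row) + part2(row)
  private def part1(row: InputRow): Long = row.fields(0) + 1
  private def part2(row: InputRow): Long = row.fields(1) * 2

  // Variable-based evaluation: the inputs are locals produced by the parent
  // operator (the "currentVars" case). A split-out helper could not see c0 and
  // c1, so this code has to stay inline, and for very wide tables the single
  // generated method can grow past the JVM's per-method bytecode limit.
  def evalInline(): Long = {
    val c0 = 10L
    val c1 = 20L
    (c0 + 1) + (c1 * 2)
  }

  def main(args: Array[String]): Unit = {
    println(evalSplit(InputRow(Array(10L, 20L)))) // 51
    println(evalInline())                         // 51
  }
}
```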
This PR improves the whole stage codegen support check to disable whole stage codegen when there are too many input fields, so that we avoid splitting the field expression code in `CreateExternalRow` under whole stage codegen.

TODO: would it be a better solution to make `CodeGenContext.currentVars` work well with `CodeGenContext.splitExpressions`?

How was this patch tested?
A new test in `DatasetSuite`.
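A minimal sketch of the strengthened check described above, assuming a hypothetical helper name and a hard-coded threshold; the real rule lives in Spark's whole stage codegen planning, reads its limit from a SQL config, and also accounts for nested fields:

```scala
import org.apache.spark.sql.execution.SparkPlan

object CodegenSupportCheck {
  // Hypothetical threshold for illustration; the real limit is configurable.
  private val maxFields = 200

  // Before this PR only the plan's own output width was checked; a plan with a
  // narrow output but a very wide child output (e.g. mapping the first column of
  // a 1000-column table) could still be whole-stage compiled and hit the
  // splitExpressions problem sketched above.
  def supportsWholeStageCodegen(plan: SparkPlan): Boolean = {
    val outputNotTooWide = plan.schema.size <= maxFields
    val inputsNotTooWide = plan.children.forall(_.schema.size <= maxFields)
    outputNotTooWide && inputsNotTooWide
  }
}
```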