Skip to content

Conversation

@cloud-fan
Copy link
Contributor

What changes were proposed in this pull request?

In https://github.com/apache/spark/pull/12047/files#diff-94a1f59bcc9b6758c4ca874652437634R529, we may split field expressions codes in CreateExternalRow to support wide table. However, the whole stage codegen framework doesn't support it, because the input for expressions is not always the input row, but can be CodeGenContext.currentVars, which doesn't work well with CodeGenContext.splitExpressions.

Actually we do have a check to guard against this cases, but it's incomplete, it only checks output fields.

This PR improves the whole stage codegen support check, to disable it if there are too many input fields, so that we can avoid splitting field expressions codes in CreateExternalRow for whole stage codegen.

TODO: Is it a better solution if we can make CodeGenContext.currentVars work well with CodeGenContext.splitExpressions?

How was this patch tested?

new test in DatasetSuite.

@cloud-fan
Copy link
Contributor Author

cc @davies @marmbrus @yhuai

@cloud-fan cloud-fan changed the title [SPARK-14554][SQL] Dataset.map may generate wrong java code for wide table [SPARK-14554][SQL] disable whole stage codegen if there are too many input columns Apr 12, 2016
@davies
Copy link
Contributor

davies commented Apr 12, 2016

LGTM, could you update the description?

@cloud-fan
Copy link
Contributor Author

ok updated.

@SparkQA
Copy link

SparkQA commented Apr 12, 2016

Test build #55584 has finished for PR 12322 at commit 0ad1194.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 12, 2016

Test build #55586 has finished for PR 12322 at commit 4d63e15.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yhuai
Copy link
Contributor

yhuai commented Apr 12, 2016

Thanks. Merging to master.

@asfgit asfgit closed this in 52a8011 Apr 12, 2016
test("SPARK-14554: Dataset.map may generate wrong java code for wide table") {
val wideDF = sqlContext.range(10).select(Seq.tabulate(1000) {i => ('id + i).as(s"c$i")} : _*)
// Make sure the generated code for this plan can compile and execute.
wideDF.map(_.getLong(0)).collect()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should still use checkAnswer here because it provides extra debugging info when there is an exception.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know why this test case is super slow? It took more than 5 minutes to finish it. Is this expected?

- SPARK-14554: Dataset.map may generate wrong java code for wide table (5 minutes, 20 seconds)

See the link: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59079/consoleFull

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's fixed in #13273

asfgit pushed a commit that referenced this pull request Apr 13, 2016
## What changes were proposed in this pull request?

address this comment: #12322 (comment)

## How was this patch tested?

N/A

Author: Wenchen Fan <[email protected]>

Closes #12346 from cloud-fan/tmp.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants