[SPARK-17702][SQL] Code generation including too many mutable states exceeds JVM size limit. #15275

ueshin · 2016-09-28T04:30:41Z

What changes were proposed in this pull request?

When generated code includes too many mutable states, the constructor, which extracts values from `references` into fields, exceeds the JVM's method size limit (64 KB of bytecode per method).
We should split the generated extractions in the constructor into smaller functions.
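
For illustration only, here is a minimal sketch of the splitting idea described above. It is not the code in this patch: `SplitSketch`, `chunk`, `split`, and the `maxChars` threshold are hypothetical names and parameters chosen for the example. It groups the generated initialization statements into blocks by source length, emits one small Java method per block, and returns the calls that would replace the original statements in the constructor.

```scala
// Hypothetical sketch, not Spark's CodeGenerator: all names and the
// threshold parameter are assumptions made for illustration.
object SplitSketch {

  /** Groups statements into blocks whose combined source length stays within `maxChars`
    * (a single statement longer than `maxChars` gets a block of its own). */
  def chunk(statements: Seq[String], maxChars: Int): Seq[Seq[String]] = {
    val blocks = Seq.newBuilder[Seq[String]]
    var current = Vector.empty[String]
    var size = 0
    for (stmt <- statements) {
      // Start a new block once adding this statement would pass the threshold.
      if (current.nonEmpty && size + stmt.length > maxChars) {
        blocks += current
        current = Vector.empty
        size = 0
      }
      current = current :+ stmt
      size += stmt.length
    }
    if (current.nonEmpty) blocks += current
    blocks.result()
  }

  /** Returns (helper method definitions, calls to place in the constructor). */
  def split(statements: Seq[String], funcName: String, maxChars: Int): (String, String) = {
    val blocks = chunk(statements, maxChars)
    val names = blocks.indices.map(i => s"${funcName}_$i")
    val defs = blocks.zip(names).map { case (body, name) =>
      val indented = body.map(s => "  " + s).mkString("\n")
      s"private void $name() {\n$indented\n}"
    }.mkString("\n\n")
    val calls = names.map(n => s"$n();").mkString("\n")
    (defs, calls)
  }
}
```

Splitting on source length is only a rough proxy for bytecode size, so a real threshold would need to be conservative.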

How was this patch tested?

I added tests that check whether the generated code for the expressions exceeds the limit.

SparkQA · 2016-09-28T05:16:22Z

Test build #66020 has finished for PR 15275 at commit 858a3ec.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-09-28T08:36:06Z

Test build #66025 has finished for PR 15275 at commit 80b9435.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

rxin · 2016-09-29T06:32:03Z

Can you add more inline comments explaining what's going on?

SparkQA · 2016-09-29T09:53:43Z

Test build #66090 has finished for PR 15275 at commit 3c4b765.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

rxin · 2016-09-30T00:34:27Z

LGTM. cc @davies to take a look too.

kiszk · 2016-10-01T01:46:27Z

...atalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala

+  }
+
+  private def splitExpressions(
+      expressions: Seq[String], funcName: String, arguments: Seq[(String, String)]): String = {


How about exposing this API as public? It looks flexible and reusable. Is there any reason to declare it as private?
(I replaced "non-public" with "public" in the first statement.)

ueshin · 2016-10-01

I didn't have a special reason.
Should we make it public? @rxin, @davies

kiszk · 2016-10-02

As far as I know, there are two possible use cases for this API:
https://issues.apache.org/jira/browse/SPARK-16223
https://github.com/apache/spark/pull/15219/files#diff-8bcc5aea39c73d4bf38aef6f6951d42cL587 in #15219

SparkQA · 2016-10-03T20:22:46Z

Test build #66267 has finished for PR 15275 at commit 3c4b765.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

rxin · 2016-10-04T04:48:26Z

Merging in master. Thanks.

…exceeds JVM size limit.

## What changes were proposed in this pull request?

Code generation including too many mutable states exceeds JVM size limit to extract values from `references` into fields in the constructor. We should split the generated extractions in the constructor into smaller functions.

## How was this patch tested?

I added some tests to check if the generated codes for the expressions exceed or not.

Author: Takuya UESHIN

Closes apache#15275 from ueshin/issues/SPARK-17702.

viirya · 2016-12-01T02:12:54Z

This affects many places in code generation, so I think it is likely a common issue for users of previous versions. Should we backport it to branch-2.0 and branch-1.6? cc @rxin @ueshin @kiszk

viirya · 2016-12-01T02:48:29Z

One important reason to prepare a backport: the settings that turn off unsafe and codegen have been deprecated since 1.6, so users who hit the JVM size limit can't work around it by disabling them.

ueshin · 2016-12-01T03:24:48Z

It seems we can cherry-pick this into branch-2.0 without conflicts, but not into branch-1.6.
Please let me know if we need to backport to branch-1.6.

rezasafi · 2016-12-08T20:28:56Z

Sorry to bother you. It seems that the backport to 1.6 is not clean or easy. Is there any update on whether this fix will be backported to 1.6? Thanks.

xs2rajni · 2017-03-10T19:02:00Z

When will this fix be backported to Spark 2.0?

ueshin added 2 commits September 28, 2016 13:19

Add a test to reproduce the issue.

4c78ca9

Split wide constructor into blocks due to JVM code size limit.

858a3ec

ueshin added 3 commits September 28, 2016 14:58

Modify added function to use field to initialize mutable states.

a83df5c

Add \n to each initialization code for readability of generated code.

0241df7

Revert some modifications and move declareAddedFunctions after initMutableStates.

80b9435

Add inline comment.

3c4b765

kiszk reviewed Oct 1, 2016

asfgit closed this in b1b4727 Oct 4, 2016
