[SPARK-13135][SQL] Don't print expressions recursively in generated code #13192
cc @sameeragarwal / @davies
Test build #58855 has finished for PR 13192 at commit
We can't remove the empty lines here, or LINENO of the compiled code will be different than the formatted code.
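The line-number concern can be illustrated with a small sketch. The object and method names below are hypothetical and are not taken from Spark's `CodeFormatter`; the sketch only assumes that the compiler sees the unstripped source while the reader sees a listing with empty lines removed.

```scala
object LineNumberSketch {
  // Prefix every line with a zero-padded number, in the style of the
  // "/* 001 */" listing shown in the PR description.
  def withLineNumbers(code: String): String =
    code.split("\n", -1).zipWithIndex
      .map { case (l, i) => f"/* ${i + 1}%03d */ $l" }
      .mkString("\n")

  // Drop lines that are empty or contain only whitespace.
  def stripEmptyLines(code: String): String =
    code.split("\n", -1).filter(_.trim.nonEmpty).mkString("\n")

  // 1-based number of the first line containing `needle`, or 0 if absent.
  def lineOf(code: String, needle: String): Int =
    code.split("\n", -1).indexWhere(_.contains(needle)) + 1
}
```

If the compiler reports a problem at the line number taken from the unstripped source, that number points at a different statement in the stripped listing, which is why the stripping has to happen before compilation as well.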
Thank you for the review, @davies.
Oh, I thought `CodeFormatter.format` was called before Janino and the Guava loading cache, too.
I'll make that consistent this afternoon. Once that is done, it should be okay.
As @rxin said, what was really needed was removing
Test build #58960 has finished for PR 13192 at commit
Retest this please.
Test build #58971 has finished for PR 13192 at commit
The PySpark failure has been fixed by a HOTFIX.
Test build #59005 has finished for PR 13192 at commit
Hi, @davies.
Now that #12979 has been merged, this may no longer work.
@dongjoon-hyun Maybe we could have a method `Expression.genCodeWithComment()` that is used by generated projections and operators, while `Expression.genCode()`, called by other Expressions, would not include a comment. This requires changes in more places, so I'm not sure whether it's a good idea.
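A toy sketch of that suggestion, for illustration only. The `Expr` hierarchy below is a hypothetical stand-in and not Spark's `Expression` API; it assumes the comment is simply the expression's string form.

```scala
object GenCodeSketch {
  sealed trait Expr {
    // Code for this expression only, without any comment.
    // Parent expressions call this for their children.
    def genCode(): String

    // Entry point for projections and operators: emits one comment
    // describing the whole tree, then the uncommented code.
    def genCodeWithComment(): String = s"// ${this.toString}\n${genCode()}"
  }

  final case class Input(ordinal: Int) extends Expr {
    def genCode(): String = s"in$ordinal"
    override def toString: String = s"input[$ordinal]"
  }

  final case class Lit(value: Long) extends Expr {
    def genCode(): String = s"${value}L"
    override def toString: String = value.toString
  }

  final case class Add(left: Expr, right: Expr) extends Expr {
    def genCode(): String = s"(${left.genCode()} + ${right.genCode()})"
    override def toString: String = s"($left + $right)"
  }
}
```

Because children are reached only through `genCode()`, the nested sub-expressions never print their own comments, so the recursion is avoided at the source rather than filtered out afterwards.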
Yes, there were huge changes. I saw that PR before, but I didn't take it into account in this PR.
Test build #59039 has finished for PR 13192 at commit
    val line = l.trim()
    val skip = lastLine.startsWith("/*") && lastLine.endsWith("*/") &&
      line.startsWith("/*") && line.endsWith("*/") &&
      map(lastLine).substring(3).contains(map(line).substring(3))
Have you checked that this actually works? I think we have placeholders here, so it will not find any duplicated comments to skip.
Oh, it should work; I missed the map. Will it cause a performance issue?
I think the performance is okay.
- This function is called once per `CodeAndComment` creation.
- It scans `codeAndComment.body` once.
- The map lookup occurs at most twice per line, and it is cheap in this case.
Also, the skip condition checks only consecutive comment lines.
If there is anything more to do, please let me know, @davies.
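The skip condition discussed in this thread can be sketched as a self-contained function. This is a minimal sketch built around the quoted diff lines, not the merged implementation: the `CodeAndComment` case class here is a hypothetical stand-in, placeholders are assumed to occupy a whole line, and comment text is assumed to start with a three-character prefix such as `// `.

```scala
object CommentDedupSketch {
  // Hypothetical stand-in: `body` holds code in which each comment is
  // replaced by a placeholder line; `comment` maps placeholder -> text.
  final case class CodeAndComment(body: String, comment: Map[String, String])

  def stripOverlappingComments(cc: CodeAndComment): CodeAndComment = {
    // A line counts as a placeholder only if it looks like /*...*/ and
    // is a known key, so the map lookups below cannot fail.
    def isPlaceholder(s: String): Boolean =
      s.startsWith("/*") && s.endsWith("*/") && cc.comment.contains(s)

    val kept = new StringBuilder
    var lastLine = "dummy"
    cc.body.split('\n').foreach { l =>
      val line = l.trim
      // Skip a comment when the comment on the previous line already
      // contains it; substring(3) drops the leading "// ".
      val skip = isPlaceholder(lastLine) && isPlaceholder(line) &&
        cc.comment(lastLine).substring(3).contains(cc.comment(line).substring(3))
      if (!skip) kept.append(line).append('\n')
      lastLine = line
    }
    CodeAndComment(kept.result().trim, cc.comment)
  }
}
```

The body is scanned once and each line triggers at most two map lookups, which matches the cost argument above. Only directly consecutive placeholder lines are compared, so a comment separated from its parent by a code line is kept.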
SGTM
LGTM,
…code

## What changes were proposed in this pull request?

This PR is an up-to-date and slightly improved version of #11019 by rxin for

- (1) preventing recursive printing of expressions in generated code.

Since the major function of this PR is indeed the above, he should be credited for the work he did. In addition to #11019, this PR improves the following in code generation.

- (2) Improve multiline comment indentation.
- (3) Reduce the number of empty lines (mainly consecutive empty lines).
- (4) Remove all space characters on empty lines.

**Example**

```scala
spark.range(1, 1000).select('id+1+2+3, 'id+4+5+6)
```

**Before**

```
Generated code:
/* 001 */ public Object generate(Object[] references) {
...
/* 005 */ /**
/* 006 */ * Codegend pipeline for
/* 007 */ * Project [(((id#0L + 1) + 2) + 3) AS (((id + 1) + 2) + 3)#3L,(((id#0L + 4) + 5) + 6) AS (((id + 4) + 5) + 6)#4L]
/* 008 */ * +- Range 1, 1, 8, 999, [id#0L]
/* 009 */ */
...
/* 075 */ // PRODUCE: Project [(((id#0L + 1) + 2) + 3) AS (((id + 1) + 2) + 3)#3L,(((id#0L + 4) + 5) + 6) AS (((id + 4) + 5) + 6)#4L]
/* 076 */
/* 077 */ // PRODUCE: Range 1, 1, 8, 999, [id#0L]
/* 078 */
/* 079 */ // initialize Range
...
/* 092 */ // CONSUME: Project [(((id#0L + 1) + 2) + 3) AS (((id + 1) + 2) + 3)#3L,(((id#0L + 4) + 5) + 6) AS (((id + 4) + 5) + 6)#4L]
/* 093 */
/* 094 */ // CONSUME: WholeStageCodegen
/* 095 */
/* 096 */ // (((input[0, bigint, false] + 1) + 2) + 3)
/* 097 */ // ((input[0, bigint, false] + 1) + 2)
/* 098 */ // (input[0, bigint, false] + 1)
...
/* 107 */ // (((input[0, bigint, false] + 4) + 5) + 6)
/* 108 */ // ((input[0, bigint, false] + 4) + 5)
/* 109 */ // (input[0, bigint, false] + 4)
...
/* 126 */ }
```

**After**

```
Generated code:
/* 001 */ public Object generate(Object[] references) {
...
/* 005 */ /**
/* 006 */ * Codegend pipeline for
/* 007 */ * Project [(((id#0L + 1) + 2) + 3) AS (((id + 1) + 2) + 3)#3L,(((id#0L + 4) + 5) + 6) AS (((id + 4) + 5) + 6)#4L]
/* 008 */ * +- Range 1, 1, 8, 999, [id#0L]
/* 009 */ */
...
/* 075 */ // PRODUCE: Project [(((id#0L + 1) + 2) + 3) AS (((id + 1) + 2) + 3)#3L,(((id#0L + 4) + 5) + 6) AS (((id + 4) + 5) + 6)#4L]
/* 076 */ // PRODUCE: Range 1, 1, 8, 999, [id#0L]
/* 077 */ // initialize Range
...
/* 090 */ // CONSUME: Project [(((id#0L + 1) + 2) + 3) AS (((id + 1) + 2) + 3)#3L,(((id#0L + 4) + 5) + 6) AS (((id + 4) + 5) + 6)#4L]
/* 091 */ // CONSUME: WholeStageCodegen
/* 092 */ // (((input[0, bigint, false] + 1) + 2) + 3)
...
/* 101 */ // (((input[0, bigint, false] + 4) + 5) + 6)
...
/* 118 */ }
```

## How was this patch tested?

Pass the Jenkins tests and see the result of the following command manually.

```scala
scala> spark.range(1, 1000).select('id+1+2+3, 'id+4+5+6).queryExecution.debug.codegen()
```

Author: Dongjoon Hyun <dongjoonapache.org>
Author: Reynold Xin <rxindatabricks.com>
Author: Dongjoon Hyun <[email protected]>

Closes #13192 from dongjoon-hyun/SPARK-13135.
Thank you, @davies!
    var lastLine: String = "dummy"
    codeAndComment.body.split('\n').foreach { l =>
      val line = l.trim()
      val skip = lastLine.startsWith("/*") && lastLine.endsWith("*/") &&
Are we assuming the comment placeholder will always take an entire line?