[SPARK-13135][SQL] Don't print expressions recursively in generated code #11019
Query: `sqlContext.range(1, 1000).select('id + 1 + 1).show()`

Old generated code:

/* 068 */ while (!Range_overflow3 && Range_number2 < Range_partitionEnd1) {
/* 069 */ long Range_value4 = Range_number2;
/* 070 */ Range_number2 += 1L;
/* 071 */ if (Range_number2 < Range_value4 ^ 1L < 0) {
/* 072 */ Range_overflow3 = true;
/* 073 */ }
/* 074 */
/* 075 */ /* ((input[0, bigint] + 1) + 1) */
/* 076 */ /* (input[0, bigint] + 1) */
/* 077 */ /* input[0, bigint] */
/* 078 */
/* 079 */ /* 1 */
/* 080 */
/* 081 */ long Project_value8 = -1L;
/* 082 */ Project_value8 = Range_value4 + 1L;
/* 083 */ /* 1 */
/* 084 */
/* 085 */ long Project_value6 = -1L;
/* 086 */ Project_value6 = Project_value8 + 1L;
/* 087 */
/* 088 */
/* 089 */ /* input[0, bigint] */
/* 090 */
/* 091 */ Project_rowWriter19.write(0, Project_value6);
/* 092 */ currentRow = Project_result17;
/* 093 */ return;
/* 094 */
/* 095 */
/* 096 */ }

New generated code:

/* 068 */ while (!Range_overflow3 && Range_number2 < Range_partitionEnd1) {
/* 069 */ long Range_value4 = Range_number2;
/* 070 */ Range_number2 += 1L;
/* 071 */ if (Range_number2 < Range_value4 ^ 1L < 0) {
/* 072 */ Range_overflow3 = true;
/* 073 */ }
/* 074 */
/* 075 */ // project list: [((input[0, bigint] + 1) + 1)]
/* 076 */
/* 077 */
/* 078 */
/* 079 */
/* 080 */ long Project_value8 = -1L;
/* 081 */ Project_value8 = Range_value4 + 1L;
/* 082 */
/* 083 */
/* 084 */ long Project_value6 = -1L;
/* 085 */ Project_value6 = Project_value8 + 1L;
/* 086 */
/* 087 */
/* 088 */
/* 089 */ // project list: [input[0, bigint]]
/* 090 */
/* 091 */
/* 092 */ Project_rowWriter19.write(0, Project_value6);
/* 093 */ currentRow = Project_result17;
/* 094 */ return;
/* 095 */
/* 096 */
/* 097 */ }
Let me know if you think this is clearer. Alternatively, we could print the expression comment only for top-level expressions.
Test build #50551 has finished for PR 11019 at commit
Note that the value could be a constant too.
retest this please
LGTM. It would be good if we could also remove the consecutive blank lines.
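The blank-line cleanup suggested here can be sketched as a post-processing pass over the generated source string. This is a minimal sketch under assumptions: `BlankLineSketch` and `collapseBlankLines` are hypothetical names, not a Spark API, and the pass runs on the finished code text rather than during code generation.

```scala
// Hypothetical helper, not a Spark API: normalizes generated source text.
object BlankLineSketch {
  // Turns whitespace-only lines into truly empty lines and keeps only the
  // first blank line of any run of consecutive blank lines.
  def collapseBlankLines(code: String): String = {
    val out = scala.collection.mutable.ArrayBuffer.empty[String]
    var previousBlank = false
    for (line <- code.split("\n", -1)) {
      val isBlank = line.trim.isEmpty
      if (isBlank) {
        // Emit a single empty line (no trailing spaces) per run.
        if (!previousBlank) out += ""
      } else {
        out += line
      }
      previousBlank = isBlank
    }
    out.mkString("\n")
  }

  def main(args: Array[String]): Unit = {
    val generated = "long a = 1L;\n   \n\n\nlong b = a + 1L;"
    println(collapseBlankLines(generated))
  }
}
```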
Test build #2489 has finished for PR 11019 at commit
Test build #50554 has finished for PR 11019 at commit
retest this please
Test build #50570 has finished for PR 11019 at commit
I like this output much better.
Test build #2492 has finished for PR 11019 at commit
|
We should use `/* ... */` in case the comments span multiple lines.
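A minimal sketch of this suggestion, assuming a hypothetical `toBlockComment` helper (not a Spark API): single-line text becomes `/* text */`, multi-line text becomes a block with a ` * ` prefix on each line, and any `*/` inside the text is broken up so it cannot close the comment early. The sketch does not handle Java `\u` Unicode escapes, which the Java compiler translates even inside comments.

```scala
// Hypothetical helper, not a Spark API.
object BlockCommentSketch {
  // Wraps arbitrary (possibly multi-line) text in a Java block comment.
  def toBlockComment(text: String): String = {
    // Break up "*/" so the text cannot terminate the comment early.
    val safe = text.replace("*/", "*\\/")
    val lines = safe.split("\n", -1)
    if (lines.length == 1) s"/* ${lines(0)} */"
    else lines.mkString("/*\n * ", "\n * ", "\n */")
  }

  def main(args: Array[String]): Unit = {
    println(toBlockComment("(input[0, bigint] + 1)"))
    println(toBlockComment("Project [...]\n+- Range ..."))
  }
}
```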
Test build #2494 has finished for PR 11019 at commit
…code

## What changes were proposed in this pull request?

This PR is an up-to-date and slightly improved version of apache#11019 by rxin for

- (1) preventing recursive printing of expressions in generated code.

Since the major function of this PR is indeed the above, he should be credited for the work he did. In addition to apache#11019, this PR makes the following improvements to code generation:

- (2) Improve multiline comment indentation.
- (3) Reduce the number of empty lines (mainly consecutive empty lines).
- (4) Remove all space characters on empty lines.

**Example**

```scala
spark.range(1, 1000).select('id+1+2+3, 'id+4+5+6)
```

**Before**

```
Generated code:
/* 001 */ public Object generate(Object[] references) {
...
/* 005 */ /**
/* 006 */ * Codegend pipeline for
/* 007 */ * Project [(((id#0L + 1) + 2) + 3) AS (((id + 1) + 2) + 3)#3L,(((id#0L + 4) + 5) + 6) AS (((id + 4) + 5) + 6)#4L]
/* 008 */ * +- Range 1, 1, 8, 999, [id#0L]
/* 009 */ */
...
/* 075 */ // PRODUCE: Project [(((id#0L + 1) + 2) + 3) AS (((id + 1) + 2) + 3)#3L,(((id#0L + 4) + 5) + 6) AS (((id + 4) + 5) + 6)#4L]
/* 076 */
/* 077 */ // PRODUCE: Range 1, 1, 8, 999, [id#0L]
/* 078 */
/* 079 */ // initialize Range
...
/* 092 */ // CONSUME: Project [(((id#0L + 1) + 2) + 3) AS (((id + 1) + 2) + 3)#3L,(((id#0L + 4) + 5) + 6) AS (((id + 4) + 5) + 6)#4L]
/* 093 */
/* 094 */ // CONSUME: WholeStageCodegen
/* 095 */
/* 096 */ // (((input[0, bigint, false] + 1) + 2) + 3)
/* 097 */ // ((input[0, bigint, false] + 1) + 2)
/* 098 */ // (input[0, bigint, false] + 1)
...
/* 107 */ // (((input[0, bigint, false] + 4) + 5) + 6)
/* 108 */ // ((input[0, bigint, false] + 4) + 5)
/* 109 */ // (input[0, bigint, false] + 4)
...
/* 126 */ }
```

**After**

```
Generated code:
/* 001 */ public Object generate(Object[] references) {
...
/* 005 */ /**
/* 006 */ * Codegend pipeline for
/* 007 */ * Project [(((id#0L + 1) + 2) + 3) AS (((id + 1) + 2) + 3)#3L,(((id#0L + 4) + 5) + 6) AS (((id + 4) + 5) + 6)#4L]
/* 008 */ * +- Range 1, 1, 8, 999, [id#0L]
/* 009 */ */
...
/* 075 */ // PRODUCE: Project [(((id#0L + 1) + 2) + 3) AS (((id + 1) + 2) + 3)#3L,(((id#0L + 4) + 5) + 6) AS (((id + 4) + 5) + 6)#4L]
/* 076 */ // PRODUCE: Range 1, 1, 8, 999, [id#0L]
/* 077 */ // initialize Range
...
/* 090 */ // CONSUME: Project [(((id#0L + 1) + 2) + 3) AS (((id + 1) + 2) + 3)#3L,(((id#0L + 4) + 5) + 6) AS (((id + 4) + 5) + 6)#4L]
/* 091 */ // CONSUME: WholeStageCodegen
/* 092 */ // (((input[0, bigint, false] + 1) + 2) + 3)
...
/* 101 */ // (((input[0, bigint, false] + 4) + 5) + 6)
...
/* 118 */ }
```

## How was this patch tested?

Passes the Jenkins tests; the output of the following command was also checked manually.

```scala
scala> spark.range(1, 1000).select('id+1+2+3, 'id+4+5+6).queryExecution.debug.codegen()
```

Author: Dongjoon Hyun <dongjoon@apache.org>
Author: Reynold Xin <rxin@databricks.com>
Author: Dongjoon Hyun <[email protected]>

Closes apache#13192 from dongjoon-hyun/SPARK-13135.
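Improvement (2), indenting multiline comments, could look roughly like the following. This is a hypothetical helper (`indentComment` is not a Spark API) that assumes the caller already knows the indentation of the line where the comment is placed. Lines starting with `*` get one extra space so the stars line up under `/**`, and whitespace-only lines are left empty, in the spirit of improvement (4).

```scala
// Hypothetical helper, not a Spark API.
object IndentSketch {
  // Re-indents every line of a (possibly multi-line) comment to `indent`.
  def indentComment(comment: String, indent: String): String =
    comment.split("\n", -1).map { line =>
      val trimmed = line.trim
      if (trimmed.isEmpty) ""                             // no spaces on empty lines
      else if (trimmed.startsWith("*")) indent + " " + trimmed // align under "/**"
      else indent + trimmed
    }.mkString("\n")

  def main(args: Array[String]): Unit = {
    val comment = "/**\n * Codegend pipeline for\n * +- Range ...\n */"
    println(indentComment(comment, "    "))
  }
}
```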
Our code generation currently prints expressions recursively. For example, for the expression `((1 + 1) + 1)`, we would print the following:
"((1 + 1) + 1)"
"(1 + 1)"
"1"
"1"
This pull request changes codegen to print each expression only once.
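The difference between the two behaviours can be sketched with a toy expression tree. The `Expr`, `Lit`, and `Add` classes and both helper functions are hypothetical stand-ins for Catalyst expressions and their code generation, meant only to show why emitting a comment at every node repeats each subexpression, while commenting only the top-level list emits a single line.

```scala
// Hypothetical toy model; these classes are not Spark's Catalyst expressions.
object ExprCommentSketch {
  sealed trait Expr { def children: Seq[Expr] }
  final case class Lit(v: Int) extends Expr {
    def children: Seq[Expr] = Nil
    override def toString: String = v.toString
  }
  final case class Add(l: Expr, r: Expr) extends Expr {
    def children: Seq[Expr] = Seq(l, r)
    override def toString: String = s"($l + $r)"
  }

  // Old behaviour (sketch): every node emits a comment and then recurses
  // into its children, so each subexpression is printed again.
  def recursiveComments(e: Expr): Seq[String] =
    s"/* $e */" +: e.children.flatMap(recursiveComments)

  // New behaviour (sketch): only the top-level expression list is printed.
  def topLevelComment(exprs: Seq[Expr]): Seq[String] =
    Seq(s"// project list: ${exprs.mkString("[", ", ", "]")}")

  def main(args: Array[String]): Unit = {
    val e = Add(Add(Lit(1), Lit(1)), Lit(1))
    recursiveComments(e).foreach(println)
    topLevelComment(Seq(e)).foreach(println)
  }
}
```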