[SPARK-13135][SQL] Don't print expressions recursively in generated code #11019

rxin · 2016-02-02T07:09:49Z

Our code generation currently prints expressions recursively. For example, for expression "(1 + 1) + 1)", we would print the following:
"(1 + 1) + 1)"
"(1 + 1)"
"1"
"1"

This pull request changes codegen to print this only once.

rxin · 2016-02-02T07:13:10Z

query:

sqlContext.range(1, 1000).select('id + 1 + 1).show()

old generated code:

/* 068 */     while (!Range_overflow3 && Range_number2 < Range_partitionEnd1) {
/* 069 */       long Range_value4 = Range_number2;
/* 070 */       Range_number2 += 1L;
/* 071 */       if (Range_number2 < Range_value4 ^ 1L < 0) {
/* 072 */         Range_overflow3 = true;
/* 073 */       }
/* 074 */       
/* 075 */       /* ((input[0, bigint] + 1) + 1) */
/* 076 */       /* (input[0, bigint] + 1) */
/* 077 */       /* input[0, bigint] */
/* 078 */       
/* 079 */       /* 1 */
/* 080 */       
/* 081 */       long Project_value8 = -1L;
/* 082 */       Project_value8 = Range_value4 + 1L;
/* 083 */       /* 1 */
/* 084 */       
/* 085 */       long Project_value6 = -1L;
/* 086 */       Project_value6 = Project_value8 + 1L;
/* 087 */       
/* 088 */       
/* 089 */       /* input[0, bigint] */
/* 090 */       
/* 091 */       Project_rowWriter19.write(0, Project_value6);
/* 092 */       currentRow = Project_result17;
/* 093 */       return;
/* 094 */       
/* 095 */       
/* 096 */     }

new generated code

/* 068 */     while (!Range_overflow3 && Range_number2 < Range_partitionEnd1) {
/* 069 */       long Range_value4 = Range_number2;
/* 070 */       Range_number2 += 1L;
/* 071 */       if (Range_number2 < Range_value4 ^ 1L < 0) {
/* 072 */         Range_overflow3 = true;
/* 073 */       }
/* 074 */       
/* 075 */       // project list: [((input[0, bigint] + 1) + 1)]
/* 076 */       
/* 077 */       
/* 078 */       
/* 079 */       
/* 080 */       long Project_value8 = -1L;
/* 081 */       Project_value8 = Range_value4 + 1L;
/* 082 */       
/* 083 */       
/* 084 */       long Project_value6 = -1L;
/* 085 */       Project_value6 = Project_value8 + 1L;
/* 086 */       
/* 087 */       
/* 088 */       
/* 089 */       // project list: [input[0, bigint]]
/* 090 */       
/* 091 */       
/* 092 */       Project_rowWriter19.write(0, Project_value6);
/* 093 */       currentRow = Project_result17;
/* 094 */       return;
/* 095 */       
/* 096 */       
/* 097 */     }

rxin · 2016-02-02T07:14:32Z

Let me know if you guys think this is more clear. Alternatively, we can also print the expression comment only for top level expressions.

SparkQA · 2016-02-02T07:40:55Z

Test build #50551 has finished for PR 11019 at commit 2c2b007.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2016-02-02T07:44:25Z

...atalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala

note that value could be a constant too

cloud-fan · 2016-02-02T07:46:07Z

retest this please

cloud-fan · 2016-02-02T07:46:21Z

LGTM, it will be good if we can also remove the continuous blank lines...

SparkQA · 2016-02-02T09:03:00Z

Test build #2489 has finished for PR 11019 at commit 2c2b007.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-02-02T09:07:39Z

Test build #50554 has finished for PR 11019 at commit 2c2b007.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2016-02-02T17:42:15Z

retest this please

SparkQA · 2016-02-02T19:13:36Z

Test build #50570 has finished for PR 11019 at commit 2c2b007.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

nongli · 2016-02-02T21:27:03Z

I like this output much better.

SparkQA · 2016-02-02T21:57:55Z

Test build #2492 has finished for PR 11019 at commit 2c2b007.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2016-02-02T22:31:21Z

...main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateMutableProjection.scala

we should use /* ...... */ in case the comments cross multiple lines

SparkQA · 2016-02-02T23:59:26Z

Test build #2494 has finished for PR 11019 at commit 2c2b007.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

…code ## What changes were proposed in this pull request? This PR is an up-to-date and a little bit improved version of apache#11019 of rxin for - (1) preventing recursive printing of expressions in generated code. Since the major function of this PR is indeed the above, he should be credited for the work he did. In addition to apache#11019, this PR improves the followings in code generation. - (2) Improve multiline comment indentation. - (3) Reduce the number of empty lines (mainly consecutive empty lines). - (4) Remove all space characters on empty lines. **Example** ```scala spark.range(1, 1000).select('id+1+2+3, 'id+4+5+6) ``` **Before** ``` Generated code: /* 001 */ public Object generate(Object[] references) { ... /* 005 */ /** /* 006 */ * Codegend pipeline for /* 007 */ * Project [(((id#0L + 1) + 2) + 3) AS (((id + 1) + 2) + 3)#3L,(((id#0L + 4) + 5) + 6) AS (((id + 4) + 5) + 6)#4L] /* 008 */ * +- Range 1, 1, 8, 999, [id#0L] /* 009 */ */ ... /* 075 */ // PRODUCE: Project [(((id#0L + 1) + 2) + 3) AS (((id + 1) + 2) + 3)#3L,(((id#0L + 4) + 5) + 6) AS (((id + 4) + 5) + 6)#4L] /* 076 */ /* 077 */ // PRODUCE: Range 1, 1, 8, 999, [id#0L] /* 078 */ /* 079 */ // initialize Range ... /* 092 */ // CONSUME: Project [(((id#0L + 1) + 2) + 3) AS (((id + 1) + 2) + 3)#3L,(((id#0L + 4) + 5) + 6) AS (((id + 4) + 5) + 6)#4L] /* 093 */ /* 094 */ // CONSUME: WholeStageCodegen /* 095 */ /* 096 */ // (((input[0, bigint, false] + 1) + 2) + 3) /* 097 */ // ((input[0, bigint, false] + 1) + 2) /* 098 */ // (input[0, bigint, false] + 1) ... /* 107 */ // (((input[0, bigint, false] + 4) + 5) + 6) /* 108 */ // ((input[0, bigint, false] + 4) + 5) /* 109 */ // (input[0, bigint, false] + 4) ... /* 126 */ } ``` **After** ``` Generated code: /* 001 */ public Object generate(Object[] references) { ... /* 005 */ /** /* 006 */ * Codegend pipeline for /* 007 */ * Project [(((id#0L + 1) + 2) + 3) AS (((id + 1) + 2) + 3)#3L,(((id#0L + 4) + 5) + 6) AS (((id + 4) + 5) + 6)#4L] /* 008 */ * +- Range 1, 1, 8, 999, [id#0L] /* 009 */ */ ... /* 075 */ // PRODUCE: Project [(((id#0L + 1) + 2) + 3) AS (((id + 1) + 2) + 3)#3L,(((id#0L + 4) + 5) + 6) AS (((id + 4) + 5) + 6)#4L] /* 076 */ // PRODUCE: Range 1, 1, 8, 999, [id#0L] /* 077 */ // initialize Range ... /* 090 */ // CONSUME: Project [(((id#0L + 1) + 2) + 3) AS (((id + 1) + 2) + 3)#3L,(((id#0L + 4) + 5) + 6) AS (((id + 4) + 5) + 6)#4L] /* 091 */ // CONSUME: WholeStageCodegen /* 092 */ // (((input[0, bigint, false] + 1) + 2) + 3) ... /* 101 */ // (((input[0, bigint, false] + 4) + 5) + 6) ... /* 118 */ } ``` ## How was this patch tested? Pass the Jenkins tests and see the result of the following command manually. ```scala scala> spark.range(1, 1000).select('id+1+2+3, 'id+4+5+6).queryExecution.debug.codegen() ``` Author: Dongjoon Hyun <dongjoonapache.org> Author: Reynold Xin <rxindatabricks.com> Author: Dongjoon Hyun <[email protected]> Closes apache#13192 from dongjoon-hyun/SPARK-13135.

…code ## What changes were proposed in this pull request? This PR is an up-to-date and a little bit improved version of #11019 of rxin for - (1) preventing recursive printing of expressions in generated code. Since the major function of this PR is indeed the above, he should be credited for the work he did. In addition to #11019, this PR improves the followings in code generation. - (2) Improve multiline comment indentation. - (3) Reduce the number of empty lines (mainly consecutive empty lines). - (4) Remove all space characters on empty lines. **Example** ```scala spark.range(1, 1000).select('id+1+2+3, 'id+4+5+6) ``` **Before** ``` Generated code: /* 001 */ public Object generate(Object[] references) { ... /* 005 */ /** /* 006 */ * Codegend pipeline for /* 007 */ * Project [(((id#0L + 1) + 2) + 3) AS (((id + 1) + 2) + 3)#3L,(((id#0L + 4) + 5) + 6) AS (((id + 4) + 5) + 6)#4L] /* 008 */ * +- Range 1, 1, 8, 999, [id#0L] /* 009 */ */ ... /* 075 */ // PRODUCE: Project [(((id#0L + 1) + 2) + 3) AS (((id + 1) + 2) + 3)#3L,(((id#0L + 4) + 5) + 6) AS (((id + 4) + 5) + 6)#4L] /* 076 */ /* 077 */ // PRODUCE: Range 1, 1, 8, 999, [id#0L] /* 078 */ /* 079 */ // initialize Range ... /* 092 */ // CONSUME: Project [(((id#0L + 1) + 2) + 3) AS (((id + 1) + 2) + 3)#3L,(((id#0L + 4) + 5) + 6) AS (((id + 4) + 5) + 6)#4L] /* 093 */ /* 094 */ // CONSUME: WholeStageCodegen /* 095 */ /* 096 */ // (((input[0, bigint, false] + 1) + 2) + 3) /* 097 */ // ((input[0, bigint, false] + 1) + 2) /* 098 */ // (input[0, bigint, false] + 1) ... /* 107 */ // (((input[0, bigint, false] + 4) + 5) + 6) /* 108 */ // ((input[0, bigint, false] + 4) + 5) /* 109 */ // (input[0, bigint, false] + 4) ... /* 126 */ } ``` **After** ``` Generated code: /* 001 */ public Object generate(Object[] references) { ... /* 005 */ /** /* 006 */ * Codegend pipeline for /* 007 */ * Project [(((id#0L + 1) + 2) + 3) AS (((id + 1) + 2) + 3)#3L,(((id#0L + 4) + 5) + 6) AS (((id + 4) + 5) + 6)#4L] /* 008 */ * +- Range 1, 1, 8, 999, [id#0L] /* 009 */ */ ... /* 075 */ // PRODUCE: Project [(((id#0L + 1) + 2) + 3) AS (((id + 1) + 2) + 3)#3L,(((id#0L + 4) + 5) + 6) AS (((id + 4) + 5) + 6)#4L] /* 076 */ // PRODUCE: Range 1, 1, 8, 999, [id#0L] /* 077 */ // initialize Range ... /* 090 */ // CONSUME: Project [(((id#0L + 1) + 2) + 3) AS (((id + 1) + 2) + 3)#3L,(((id#0L + 4) + 5) + 6) AS (((id + 4) + 5) + 6)#4L] /* 091 */ // CONSUME: WholeStageCodegen /* 092 */ // (((input[0, bigint, false] + 1) + 2) + 3) ... /* 101 */ // (((input[0, bigint, false] + 4) + 5) + 6) ... /* 118 */ } ``` ## How was this patch tested? Pass the Jenkins tests and see the result of the following command manually. ```scala scala> spark.range(1, 1000).select('id+1+2+3, 'id+4+5+6).queryExecution.debug.codegen() ``` Author: Dongjoon Hyun <dongjoonapache.org> Author: Reynold Xin <rxindatabricks.com> Author: Dongjoon Hyun <[email protected]> Closes #13192 from dongjoon-hyun/SPARK-13135.

[SPARK-13135][SQL] Don't print expressions recursively in generated code

2c2b007

cloud-fan reviewed Feb 2, 2016
View reviewed changes

rxin closed this Feb 18, 2016

dongjoon-hyun mentioned this pull request May 19, 2016

[SPARK-13135][SQL] Don't print expressions recursively in generated code #13192

Closed

[SPARK-13135][SQL] Don't print expressions recursively in generated code #11019

[SPARK-13135][SQL] Don't print expressions recursively in generated code #11019

Uh oh!

Conversation

rxin commented Feb 2, 2016

Uh oh!

rxin commented Feb 2, 2016

Uh oh!

rxin commented Feb 2, 2016

Uh oh!

SparkQA commented Feb 2, 2016

Uh oh!

cloud-fan Feb 2, 2016

Choose a reason for hiding this comment

Uh oh!

cloud-fan commented Feb 2, 2016

Uh oh!

cloud-fan commented Feb 2, 2016

Uh oh!

SparkQA commented Feb 2, 2016

Uh oh!

SparkQA commented Feb 2, 2016

Uh oh!

cloud-fan commented Feb 2, 2016

Uh oh!

SparkQA commented Feb 2, 2016

Uh oh!

nongli commented Feb 2, 2016

Uh oh!

SparkQA commented Feb 2, 2016

Uh oh!

cloud-fan Feb 2, 2016

Choose a reason for hiding this comment

Uh oh!

SparkQA commented Feb 2, 2016

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants