Skip to content

Conversation

@rxin
Copy link
Contributor

@rxin rxin commented Feb 2, 2016

Our code generation currently prints expressions recursively. For example, for expression "(1 + 1) + 1)", we would print the following:
"(1 + 1) + 1)"
"(1 + 1)"
"1"
"1"

This pull request changes codegen to print this only once.

@rxin
Copy link
Contributor Author

rxin commented Feb 2, 2016

query:

sqlContext.range(1, 1000).select('id + 1 + 1).show()

old generated code:

/* 068 */     while (!Range_overflow3 && Range_number2 < Range_partitionEnd1) {
/* 069 */       long Range_value4 = Range_number2;
/* 070 */       Range_number2 += 1L;
/* 071 */       if (Range_number2 < Range_value4 ^ 1L < 0) {
/* 072 */         Range_overflow3 = true;
/* 073 */       }
/* 074 */       
/* 075 */       /* ((input[0, bigint] + 1) + 1) */
/* 076 */       /* (input[0, bigint] + 1) */
/* 077 */       /* input[0, bigint] */
/* 078 */       
/* 079 */       /* 1 */
/* 080 */       
/* 081 */       long Project_value8 = -1L;
/* 082 */       Project_value8 = Range_value4 + 1L;
/* 083 */       /* 1 */
/* 084 */       
/* 085 */       long Project_value6 = -1L;
/* 086 */       Project_value6 = Project_value8 + 1L;
/* 087 */       
/* 088 */       
/* 089 */       /* input[0, bigint] */
/* 090 */       
/* 091 */       Project_rowWriter19.write(0, Project_value6);
/* 092 */       currentRow = Project_result17;
/* 093 */       return;
/* 094 */       
/* 095 */       
/* 096 */     }

new generated code

/* 068 */     while (!Range_overflow3 && Range_number2 < Range_partitionEnd1) {
/* 069 */       long Range_value4 = Range_number2;
/* 070 */       Range_number2 += 1L;
/* 071 */       if (Range_number2 < Range_value4 ^ 1L < 0) {
/* 072 */         Range_overflow3 = true;
/* 073 */       }
/* 074 */       
/* 075 */       // project list: [((input[0, bigint] + 1) + 1)]
/* 076 */       
/* 077 */       
/* 078 */       
/* 079 */       
/* 080 */       long Project_value8 = -1L;
/* 081 */       Project_value8 = Range_value4 + 1L;
/* 082 */       
/* 083 */       
/* 084 */       long Project_value6 = -1L;
/* 085 */       Project_value6 = Project_value8 + 1L;
/* 086 */       
/* 087 */       
/* 088 */       
/* 089 */       // project list: [input[0, bigint]]
/* 090 */       
/* 091 */       
/* 092 */       Project_rowWriter19.write(0, Project_value6);
/* 093 */       currentRow = Project_result17;
/* 094 */       return;
/* 095 */       
/* 096 */       
/* 097 */     }

@rxin
Copy link
Contributor Author

rxin commented Feb 2, 2016

Let me know if you guys think this is more clear. Alternatively, we can also print the expression comment only for top level expressions.

@SparkQA
Copy link

SparkQA commented Feb 2, 2016

Test build #50551 has finished for PR 11019 at commit 2c2b007.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note that value could be a constant too

@cloud-fan
Copy link
Contributor

retest this please

@cloud-fan
Copy link
Contributor

LGTM, it will be good if we can also remove the continuous blank lines...

@SparkQA
Copy link

SparkQA commented Feb 2, 2016

Test build #2489 has finished for PR 11019 at commit 2c2b007.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Feb 2, 2016

Test build #50554 has finished for PR 11019 at commit 2c2b007.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

retest this please

@SparkQA
Copy link

SparkQA commented Feb 2, 2016

Test build #50570 has finished for PR 11019 at commit 2c2b007.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@nongli
Copy link
Contributor

nongli commented Feb 2, 2016

I like this output much better.

@SparkQA
Copy link

SparkQA commented Feb 2, 2016

Test build #2492 has finished for PR 11019 at commit 2c2b007.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should use /* ...... */ in case the comments cross multiple lines

@SparkQA
Copy link

SparkQA commented Feb 2, 2016

Test build #2494 has finished for PR 11019 at commit 2c2b007.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin rxin closed this Feb 18, 2016
ghost pushed a commit to dbtsai/spark that referenced this pull request May 24, 2016
…code

## What changes were proposed in this pull request?

This PR is an up-to-date and a little bit improved version of apache#11019 of rxin for
- (1) preventing recursive printing of expressions in generated code.

Since the major function of this PR is indeed the above,  he should be credited for the work he did. In addition to apache#11019, this PR improves the followings in code generation.
- (2) Improve multiline comment indentation.
- (3) Reduce the number of empty lines (mainly consecutive empty lines).
- (4) Remove all space characters on empty lines.

**Example**
```scala
spark.range(1, 1000).select('id+1+2+3, 'id+4+5+6)
```

**Before**
```
Generated code:
/* 001 */ public Object generate(Object[] references) {
...
/* 005 */ /**
/* 006 */ * Codegend pipeline for
/* 007 */ * Project [(((id#0L + 1) + 2) + 3) AS (((id + 1) + 2) + 3)#3L,(((id#0L + 4) + 5) + 6) AS (((id + 4) + 5) + 6)#4L]
/* 008 */ * +- Range 1, 1, 8, 999, [id#0L]
/* 009 */ */
...
/* 075 */     // PRODUCE: Project [(((id#0L + 1) + 2) + 3) AS (((id + 1) + 2) + 3)#3L,(((id#0L + 4) + 5) + 6) AS (((id + 4) + 5) + 6)#4L]
/* 076 */
/* 077 */     // PRODUCE: Range 1, 1, 8, 999, [id#0L]
/* 078 */
/* 079 */     // initialize Range
...
/* 092 */       // CONSUME: Project [(((id#0L + 1) + 2) + 3) AS (((id + 1) + 2) + 3)#3L,(((id#0L + 4) + 5) + 6) AS (((id + 4) + 5) + 6)#4L]
/* 093 */
/* 094 */       // CONSUME: WholeStageCodegen
/* 095 */
/* 096 */       // (((input[0, bigint, false] + 1) + 2) + 3)
/* 097 */       // ((input[0, bigint, false] + 1) + 2)
/* 098 */       // (input[0, bigint, false] + 1)
...
/* 107 */       // (((input[0, bigint, false] + 4) + 5) + 6)
/* 108 */       // ((input[0, bigint, false] + 4) + 5)
/* 109 */       // (input[0, bigint, false] + 4)
...
/* 126 */ }
```

**After**
```
Generated code:
/* 001 */ public Object generate(Object[] references) {
...
/* 005 */ /**
/* 006 */  * Codegend pipeline for
/* 007 */  * Project [(((id#0L + 1) + 2) + 3) AS (((id + 1) + 2) + 3)#3L,(((id#0L + 4) + 5) + 6) AS (((id + 4) + 5) + 6)#4L]
/* 008 */  * +- Range 1, 1, 8, 999, [id#0L]
/* 009 */  */
...
/* 075 */     // PRODUCE: Project [(((id#0L + 1) + 2) + 3) AS (((id + 1) + 2) + 3)#3L,(((id#0L + 4) + 5) + 6) AS (((id + 4) + 5) + 6)#4L]
/* 076 */     // PRODUCE: Range 1, 1, 8, 999, [id#0L]
/* 077 */     // initialize Range
...
/* 090 */       // CONSUME: Project [(((id#0L + 1) + 2) + 3) AS (((id + 1) + 2) + 3)#3L,(((id#0L + 4) + 5) + 6) AS (((id + 4) + 5) + 6)#4L]
/* 091 */       // CONSUME: WholeStageCodegen
/* 092 */       // (((input[0, bigint, false] + 1) + 2) + 3)
...
/* 101 */       // (((input[0, bigint, false] + 4) + 5) + 6)
...
/* 118 */ }
```

## How was this patch tested?

Pass the Jenkins tests and see the result of the following command manually.
```scala
scala> spark.range(1, 1000).select('id+1+2+3, 'id+4+5+6).queryExecution.debug.codegen()
```

Author: Dongjoon Hyun <dongjoonapache.org>
Author: Reynold Xin <rxindatabricks.com>

Author: Dongjoon Hyun <[email protected]>

Closes apache#13192 from dongjoon-hyun/SPARK-13135.
asfgit pushed a commit that referenced this pull request May 24, 2016
…code

## What changes were proposed in this pull request?

This PR is an up-to-date and a little bit improved version of #11019 of rxin for
- (1) preventing recursive printing of expressions in generated code.

Since the major function of this PR is indeed the above,  he should be credited for the work he did. In addition to #11019, this PR improves the followings in code generation.
- (2) Improve multiline comment indentation.
- (3) Reduce the number of empty lines (mainly consecutive empty lines).
- (4) Remove all space characters on empty lines.

**Example**
```scala
spark.range(1, 1000).select('id+1+2+3, 'id+4+5+6)
```

**Before**
```
Generated code:
/* 001 */ public Object generate(Object[] references) {
...
/* 005 */ /**
/* 006 */ * Codegend pipeline for
/* 007 */ * Project [(((id#0L + 1) + 2) + 3) AS (((id + 1) + 2) + 3)#3L,(((id#0L + 4) + 5) + 6) AS (((id + 4) + 5) + 6)#4L]
/* 008 */ * +- Range 1, 1, 8, 999, [id#0L]
/* 009 */ */
...
/* 075 */     // PRODUCE: Project [(((id#0L + 1) + 2) + 3) AS (((id + 1) + 2) + 3)#3L,(((id#0L + 4) + 5) + 6) AS (((id + 4) + 5) + 6)#4L]
/* 076 */
/* 077 */     // PRODUCE: Range 1, 1, 8, 999, [id#0L]
/* 078 */
/* 079 */     // initialize Range
...
/* 092 */       // CONSUME: Project [(((id#0L + 1) + 2) + 3) AS (((id + 1) + 2) + 3)#3L,(((id#0L + 4) + 5) + 6) AS (((id + 4) + 5) + 6)#4L]
/* 093 */
/* 094 */       // CONSUME: WholeStageCodegen
/* 095 */
/* 096 */       // (((input[0, bigint, false] + 1) + 2) + 3)
/* 097 */       // ((input[0, bigint, false] + 1) + 2)
/* 098 */       // (input[0, bigint, false] + 1)
...
/* 107 */       // (((input[0, bigint, false] + 4) + 5) + 6)
/* 108 */       // ((input[0, bigint, false] + 4) + 5)
/* 109 */       // (input[0, bigint, false] + 4)
...
/* 126 */ }
```

**After**
```
Generated code:
/* 001 */ public Object generate(Object[] references) {
...
/* 005 */ /**
/* 006 */  * Codegend pipeline for
/* 007 */  * Project [(((id#0L + 1) + 2) + 3) AS (((id + 1) + 2) + 3)#3L,(((id#0L + 4) + 5) + 6) AS (((id + 4) + 5) + 6)#4L]
/* 008 */  * +- Range 1, 1, 8, 999, [id#0L]
/* 009 */  */
...
/* 075 */     // PRODUCE: Project [(((id#0L + 1) + 2) + 3) AS (((id + 1) + 2) + 3)#3L,(((id#0L + 4) + 5) + 6) AS (((id + 4) + 5) + 6)#4L]
/* 076 */     // PRODUCE: Range 1, 1, 8, 999, [id#0L]
/* 077 */     // initialize Range
...
/* 090 */       // CONSUME: Project [(((id#0L + 1) + 2) + 3) AS (((id + 1) + 2) + 3)#3L,(((id#0L + 4) + 5) + 6) AS (((id + 4) + 5) + 6)#4L]
/* 091 */       // CONSUME: WholeStageCodegen
/* 092 */       // (((input[0, bigint, false] + 1) + 2) + 3)
...
/* 101 */       // (((input[0, bigint, false] + 4) + 5) + 6)
...
/* 118 */ }
```

## How was this patch tested?

Pass the Jenkins tests and see the result of the following command manually.
```scala
scala> spark.range(1, 1000).select('id+1+2+3, 'id+4+5+6).queryExecution.debug.codegen()
```

Author: Dongjoon Hyun <dongjoonapache.org>
Author: Reynold Xin <rxindatabricks.com>

Author: Dongjoon Hyun <[email protected]>

Closes #13192 from dongjoon-hyun/SPARK-13135.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants