Skip to content

Conversation

@dongjoon-hyun
Copy link
Member

@dongjoon-hyun dongjoon-hyun commented May 19, 2016

What changes were proposed in this pull request?

This PR is an up-to-date and a little bit improved version of #11019 of @rxin for

  • (1) preventing recursive printing of expressions in generated code.

Since the major function of this PR is indeed the above, he should be credited for the work he did. In addition to #11019, this PR improves the followings in code generation.

  • (2) Improve multiline comment indentation.
  • (3) Reduce the number of empty lines (mainly consecutive empty lines).
  • (4) Remove all space characters on empty lines.

Example

spark.range(1, 1000).select('id+1+2+3, 'id+4+5+6)

Before

Generated code:
/* 001 */ public Object generate(Object[] references) {
...
/* 005 */ /**
/* 006 */ * Codegend pipeline for
/* 007 */ * Project [(((id#0L + 1) + 2) + 3) AS (((id + 1) + 2) + 3)#3L,(((id#0L + 4) + 5) + 6) AS (((id + 4) + 5) + 6)#4L]
/* 008 */ * +- Range 1, 1, 8, 999, [id#0L]
/* 009 */ */
...
/* 075 */     // PRODUCE: Project [(((id#0L + 1) + 2) + 3) AS (((id + 1) + 2) + 3)#3L,(((id#0L + 4) + 5) + 6) AS (((id + 4) + 5) + 6)#4L]
/* 076 */     
/* 077 */     // PRODUCE: Range 1, 1, 8, 999, [id#0L]
/* 078 */     
/* 079 */     // initialize Range
...
/* 092 */       // CONSUME: Project [(((id#0L + 1) + 2) + 3) AS (((id + 1) + 2) + 3)#3L,(((id#0L + 4) + 5) + 6) AS (((id + 4) + 5) + 6)#4L]
/* 093 */       
/* 094 */       // CONSUME: WholeStageCodegen
/* 095 */       
/* 096 */       // (((input[0, bigint, false] + 1) + 2) + 3)
/* 097 */       // ((input[0, bigint, false] + 1) + 2)
/* 098 */       // (input[0, bigint, false] + 1)
...
/* 107 */       // (((input[0, bigint, false] + 4) + 5) + 6)
/* 108 */       // ((input[0, bigint, false] + 4) + 5)
/* 109 */       // (input[0, bigint, false] + 4)
...
/* 126 */ }

After

Generated code:
/* 001 */ public Object generate(Object[] references) {
...
/* 005 */ /**
/* 006 */  * Codegend pipeline for
/* 007 */  * Project [(((id#0L + 1) + 2) + 3) AS (((id + 1) + 2) + 3)#3L,(((id#0L + 4) + 5) + 6) AS (((id + 4) + 5) + 6)#4L]
/* 008 */  * +- Range 1, 1, 8, 999, [id#0L]
/* 009 */  */
...
/* 075 */     // PRODUCE: Project [(((id#0L + 1) + 2) + 3) AS (((id + 1) + 2) + 3)#3L,(((id#0L + 4) + 5) + 6) AS (((id + 4) + 5) + 6)#4L]
/* 076 */     // PRODUCE: Range 1, 1, 8, 999, [id#0L]
/* 077 */     // initialize Range
...
/* 090 */       // CONSUME: Project [(((id#0L + 1) + 2) + 3) AS (((id + 1) + 2) + 3)#3L,(((id#0L + 4) + 5) + 6) AS (((id + 4) + 5) + 6)#4L]
/* 091 */       // CONSUME: WholeStageCodegen
/* 092 */       // (((input[0, bigint, false] + 1) + 2) + 3)
...
/* 101 */       // (((input[0, bigint, false] + 4) + 5) + 6)
...
/* 118 */ }

How was this patch tested?

Pass the Jenkins tests and see the result of the following command manually.

scala> spark.range(1, 1000).select('id+1+2+3, 'id+4+5+6).queryExecution.debug.codegen()

Author: Dongjoon Hyun [email protected]
Author: Reynold Xin [email protected]

@dongjoon-hyun
Copy link
Member Author

Hi, @rxin .
This is the first attempt according to your request.
I removed some obsolete code in #11019 in order to pass the tests.
Please let me know if there is something I missed mistakenly.

cc @cloud-fan @nongli

@rxin
Copy link
Contributor

rxin commented May 19, 2016

cc @sameeragarwal / @davies

@SparkQA
Copy link

SparkQA commented May 19, 2016

Test build #58855 has finished for PR 13192 at commit ea7de3b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't remove the empty lines here, or LINENO of the compiled code will be different than the formatted code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for review, @davies .
Oh, I thought CodeFormatter.format is called before Janino and Guava loading cache, too.
I'll make that consistent in this afternoon. If then, it'll be okay.

@dongjoon-hyun
Copy link
Member Author

dongjoon-hyun commented May 20, 2016

As @rxin told, what was really needed is removing overlapping comments.
So, I rethink about that and revert the change on Expression.gen which removes the code field.
code fields have their own meaning and are still valuable.
Instead, I can achieve that goal simply by adding CodeFormatter.stripOverlappingComments.
Also, I updated the description of this PR, too.

@SparkQA
Copy link

SparkQA commented May 20, 2016

Test build #58960 has finished for PR 13192 at commit 5ffb249.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member Author

Retest this please

@SparkQA
Copy link

SparkQA commented May 20, 2016

Test build #58971 has finished for PR 13192 at commit 5ffb249.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member Author

The PySpark failure is fixed as a HOTFIX.

@SparkQA
Copy link

SparkQA commented May 20, 2016

Test build #59005 has finished for PR 13192 at commit 2018d9f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member Author

Hi, @davies .
It's ready for review, again!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After #12979 is merged, this may not work now.

@davies
Copy link
Contributor

davies commented May 20, 2016

@dongjoon-hyun Maybe we could have a method Expression.genCodeWithComment() that is used by generated projections and operators, Expression.genCode() called by other Expressions will not have comment in it. This requires change more places, not sure it's a good idea or not.

@dongjoon-hyun
Copy link
Member Author

dongjoon-hyun commented May 20, 2016

Ya. There were huge changes. I've saw the PR before, but I didn't consider that in this PR.
My bad. Let me think how to solve the original goal with new master branch.
Thank you, @davies .

@dongjoon-hyun
Copy link
Member Author

Hi, @davies and @rxin .
I updated the code and description again according to the current master.

@SparkQA
Copy link

SparkQA commented May 21, 2016

Test build #59039 has finished for PR 13192 at commit 8257e78.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member Author

Hi, @davies and @rxin .
Could you review this PR again when you have some time?

val line = l.trim()
val skip = lastLine.startsWith("/*") && lastLine.endsWith("*/") &&
line.startsWith("/*") && line.endsWith("*/") &&
map(lastLine).substring(3).contains(map(line).substring(3))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you check that this actually work? I think we have placeholders here so will not find any duplicated comments to skip.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, it should work, I missed the map. Will it have performance issue?

Copy link
Member Author

@dongjoon-hyun dongjoon-hyun May 24, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's okay for the performance.

  • This function is used for at every CodeAndComment creation once.
  • It scans codeAndComment.body once.
  • Map lookup occurs on each line at most twice. Also, it does not cost much in this case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, the skip condition is checking only consecutive comments lines.
If there is something to do more, please let me know, @davies .

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM

@davies
Copy link
Contributor

davies commented May 24, 2016

LGTM,
Merging this into master and 2.0, thanks!

asfgit pushed a commit that referenced this pull request May 24, 2016
…code

## What changes were proposed in this pull request?

This PR is an up-to-date and a little bit improved version of #11019 of rxin for
- (1) preventing recursive printing of expressions in generated code.

Since the major function of this PR is indeed the above,  he should be credited for the work he did. In addition to #11019, this PR improves the followings in code generation.
- (2) Improve multiline comment indentation.
- (3) Reduce the number of empty lines (mainly consecutive empty lines).
- (4) Remove all space characters on empty lines.

**Example**
```scala
spark.range(1, 1000).select('id+1+2+3, 'id+4+5+6)
```

**Before**
```
Generated code:
/* 001 */ public Object generate(Object[] references) {
...
/* 005 */ /**
/* 006 */ * Codegend pipeline for
/* 007 */ * Project [(((id#0L + 1) + 2) + 3) AS (((id + 1) + 2) + 3)#3L,(((id#0L + 4) + 5) + 6) AS (((id + 4) + 5) + 6)#4L]
/* 008 */ * +- Range 1, 1, 8, 999, [id#0L]
/* 009 */ */
...
/* 075 */     // PRODUCE: Project [(((id#0L + 1) + 2) + 3) AS (((id + 1) + 2) + 3)#3L,(((id#0L + 4) + 5) + 6) AS (((id + 4) + 5) + 6)#4L]
/* 076 */
/* 077 */     // PRODUCE: Range 1, 1, 8, 999, [id#0L]
/* 078 */
/* 079 */     // initialize Range
...
/* 092 */       // CONSUME: Project [(((id#0L + 1) + 2) + 3) AS (((id + 1) + 2) + 3)#3L,(((id#0L + 4) + 5) + 6) AS (((id + 4) + 5) + 6)#4L]
/* 093 */
/* 094 */       // CONSUME: WholeStageCodegen
/* 095 */
/* 096 */       // (((input[0, bigint, false] + 1) + 2) + 3)
/* 097 */       // ((input[0, bigint, false] + 1) + 2)
/* 098 */       // (input[0, bigint, false] + 1)
...
/* 107 */       // (((input[0, bigint, false] + 4) + 5) + 6)
/* 108 */       // ((input[0, bigint, false] + 4) + 5)
/* 109 */       // (input[0, bigint, false] + 4)
...
/* 126 */ }
```

**After**
```
Generated code:
/* 001 */ public Object generate(Object[] references) {
...
/* 005 */ /**
/* 006 */  * Codegend pipeline for
/* 007 */  * Project [(((id#0L + 1) + 2) + 3) AS (((id + 1) + 2) + 3)#3L,(((id#0L + 4) + 5) + 6) AS (((id + 4) + 5) + 6)#4L]
/* 008 */  * +- Range 1, 1, 8, 999, [id#0L]
/* 009 */  */
...
/* 075 */     // PRODUCE: Project [(((id#0L + 1) + 2) + 3) AS (((id + 1) + 2) + 3)#3L,(((id#0L + 4) + 5) + 6) AS (((id + 4) + 5) + 6)#4L]
/* 076 */     // PRODUCE: Range 1, 1, 8, 999, [id#0L]
/* 077 */     // initialize Range
...
/* 090 */       // CONSUME: Project [(((id#0L + 1) + 2) + 3) AS (((id + 1) + 2) + 3)#3L,(((id#0L + 4) + 5) + 6) AS (((id + 4) + 5) + 6)#4L]
/* 091 */       // CONSUME: WholeStageCodegen
/* 092 */       // (((input[0, bigint, false] + 1) + 2) + 3)
...
/* 101 */       // (((input[0, bigint, false] + 4) + 5) + 6)
...
/* 118 */ }
```

## How was this patch tested?

Pass the Jenkins tests and see the result of the following command manually.
```scala
scala> spark.range(1, 1000).select('id+1+2+3, 'id+4+5+6).queryExecution.debug.codegen()
```

Author: Dongjoon Hyun <dongjoonapache.org>
Author: Reynold Xin <rxindatabricks.com>

Author: Dongjoon Hyun <[email protected]>

Closes #13192 from dongjoon-hyun/SPARK-13135.
@asfgit asfgit closed this in f8763b8 May 24, 2016
@dongjoon-hyun
Copy link
Member Author

Thank you, @davies !

var lastLine: String = "dummy"
codeAndComment.body.split('\n').foreach { l =>
val line = l.trim()
val skip = lastLine.startsWith("/*") && lastLine.endsWith("*/") &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we assuming the comment holder will always take an entire line?

@dongjoon-hyun dongjoon-hyun deleted the SPARK-13135 branch July 20, 2016 07:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants