Skip to content

Conversation

@brkyvz
Copy link
Contributor

@brkyvz brkyvz commented Dec 20, 2016

What changes were proposed in this pull request?

If I use the function regexp_extract, and then in my regex string, use \, i.e. escape character, this fails codegen, because the \ character is not properly escaped when codegen'd.

Example stack trace:

/* 059 */     private int maxSteps = 2;
/* 060 */     private int numRows = 0;
/* 061 */     private org.apache.spark.sql.types.StructType keySchema = new org.apache.spark.sql.types.StructType().add("date_format(window#325.start, yyyy-MM-dd HH:mm)", org.apache.spark.sql.types.DataTypes.StringType)
/* 062 */     .add("regexp_extract(source#310.description, ([a-zA-Z]+)\[.*, 1)", org.apache.spark.sql.types.DataTypes.StringType);
/* 063 */     private org.apache.spark.sql.types.StructType valueSchema = new org.apache.spark.sql.types.StructType().add("sum", org.apache.spark.sql.types.DataTypes.LongType);
/* 064 */     private Object emptyVBase;

...

org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 62, Column 58: Invalid escape sequence
	at org.codehaus.janino.Scanner.scanLiteralCharacter(Scanner.java:918)
	at org.codehaus.janino.Scanner.produce(Scanner.java:604)
	at org.codehaus.janino.Parser.peekRead(Parser.java:3239)
	at org.codehaus.janino.Parser.parseArguments(Parser.java:3055)
	at org.codehaus.janino.Parser.parseSelector(Parser.java:2914)
	at org.codehaus.janino.Parser.parseUnaryExpression(Parser.java:2617)
	at org.codehaus.janino.Parser.parseMultiplicativeExpression(Parser.java:2573)
	at org.codehaus.janino.Parser.parseAdditiveExpression(Parser.java:2552)

In the codegend expression, the literal should use \\ instead of \

A similar problem was solved here: #15156.

How was this patch tested?

Regression test in DataFrameAggregationSuite

@brkyvz
Copy link
Contributor Author

brkyvz commented Dec 20, 2016

cc @JoshRosen

val generatedSchema: String =
s"new org.apache.spark.sql.types.StructType()" +
(groupingKeySchema ++ bufferSchema).map { key =>
val keyName = ctx.addReferenceObj("keyName", key.name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use the unnamed / anonymous version of addReferenceObj here:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done. Thanks for the suggestion!

val generatedKeySchema: String =
s"new org.apache.spark.sql.types.StructType()" +
groupingKeySchema.map { key =>
val keyName = ctx.addReferenceObj("keyName", key.name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here; I'd use the unnamed overload.

@SparkQA
Copy link

SparkQA commented Dec 21, 2016

Test build #70432 has finished for PR 16361 at commit b858204.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Dec 21, 2016

Test build #70433 has finished for PR 16361 at commit c2de5ee.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

cloud-fan commented Dec 23, 2016

it seems to that the grouping key alias is only used for execution(logical Aggregate node doesn't need grouping expression to be named), can we just alias them with k1,k2, ... to avoid this problem? i.e. in https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala#L221

@JoshRosen
Copy link
Contributor

@cloud-fan, while I agree that the field names aren't really used here, I don't think patterns.scala is the right place to fix this because it seems a little dodgy to be picking a safe name way over there to fix an unsafe interpolation over here. I'd be supportive of a change to set dummy names here in the *HashMapGenerator classes or to make a builder for StructTypes with anonymous / safe field names, but that's a larger change that I think should be done separately (it could also be accompanied by a change to remove the need to pass in full StructTypes when we only need the field count and types).

Given this, I'm going to retest and merge this as-is.

@JoshRosen
Copy link
Contributor

Jenkins, retest this please

@SparkQA
Copy link

SparkQA commented Jan 3, 2017

Test build #70838 has finished for PR 16361 at commit c2de5ee.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

val generatedKeySchema: String =
s"new org.apache.spark.sql.types.StructType()" +
groupingKeySchema.map { key =>
val keyName = ctx.addReferenceObj(key.name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like you now need to use addReferenceMinorObj after a conflicting change was merged. We'll need to use the current version of the patch when backporting into branch-2.1, though.

@JoshRosen
Copy link
Contributor

LGTM pending Jenkins. Thanks!

@SparkQA
Copy link

SparkQA commented Jan 9, 2017

Test build #71088 has finished for PR 16361 at commit df3a852.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@asfgit asfgit closed this in faabe69 Jan 9, 2017
@JoshRosen
Copy link
Contributor

Merged to master.

asfgit pushed a commit that referenced this pull request Jan 9, 2017
… for aggregations

## What changes were proposed in this pull request?

Backport for #16361 to 2.1 branch.

## How was this patch tested?

Unit tests

Author: Burak Yavuz <[email protected]>

Closes #16518 from brkyvz/reg-break-2.1.
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
…gations

## What changes were proposed in this pull request?

If I use the function regexp_extract, and then in my regex string, use `\`, i.e. escape character, this fails codegen, because the `\` character is not properly escaped when codegen'd.

Example stack trace:
```
/* 059 */     private int maxSteps = 2;
/* 060 */     private int numRows = 0;
/* 061 */     private org.apache.spark.sql.types.StructType keySchema = new org.apache.spark.sql.types.StructType().add("date_format(window#325.start, yyyy-MM-dd HH:mm)", org.apache.spark.sql.types.DataTypes.StringType)
/* 062 */     .add("regexp_extract(source#310.description, ([a-zA-Z]+)\[.*, 1)", org.apache.spark.sql.types.DataTypes.StringType);
/* 063 */     private org.apache.spark.sql.types.StructType valueSchema = new org.apache.spark.sql.types.StructType().add("sum", org.apache.spark.sql.types.DataTypes.LongType);
/* 064 */     private Object emptyVBase;

...

org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 62, Column 58: Invalid escape sequence
	at org.codehaus.janino.Scanner.scanLiteralCharacter(Scanner.java:918)
	at org.codehaus.janino.Scanner.produce(Scanner.java:604)
	at org.codehaus.janino.Parser.peekRead(Parser.java:3239)
	at org.codehaus.janino.Parser.parseArguments(Parser.java:3055)
	at org.codehaus.janino.Parser.parseSelector(Parser.java:2914)
	at org.codehaus.janino.Parser.parseUnaryExpression(Parser.java:2617)
	at org.codehaus.janino.Parser.parseMultiplicativeExpression(Parser.java:2573)
	at org.codehaus.janino.Parser.parseAdditiveExpression(Parser.java:2552)
```

In the codegend expression, the literal should use `\\` instead of `\`

A similar problem was solved here: apache#15156.

## How was this patch tested?

Regression test in `DataFrameAggregationSuite`

Author: Burak Yavuz <[email protected]>

Closes apache#16361 from brkyvz/reg-break.
cmonkey pushed a commit to cmonkey/spark that referenced this pull request Feb 15, 2017
…gations

## What changes were proposed in this pull request?

If I use the function regexp_extract, and then in my regex string, use `\`, i.e. escape character, this fails codegen, because the `\` character is not properly escaped when codegen'd.

Example stack trace:
```
/* 059 */     private int maxSteps = 2;
/* 060 */     private int numRows = 0;
/* 061 */     private org.apache.spark.sql.types.StructType keySchema = new org.apache.spark.sql.types.StructType().add("date_format(window#325.start, yyyy-MM-dd HH:mm)", org.apache.spark.sql.types.DataTypes.StringType)
/* 062 */     .add("regexp_extract(source#310.description, ([a-zA-Z]+)\[.*, 1)", org.apache.spark.sql.types.DataTypes.StringType);
/* 063 */     private org.apache.spark.sql.types.StructType valueSchema = new org.apache.spark.sql.types.StructType().add("sum", org.apache.spark.sql.types.DataTypes.LongType);
/* 064 */     private Object emptyVBase;

...

org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 62, Column 58: Invalid escape sequence
	at org.codehaus.janino.Scanner.scanLiteralCharacter(Scanner.java:918)
	at org.codehaus.janino.Scanner.produce(Scanner.java:604)
	at org.codehaus.janino.Parser.peekRead(Parser.java:3239)
	at org.codehaus.janino.Parser.parseArguments(Parser.java:3055)
	at org.codehaus.janino.Parser.parseSelector(Parser.java:2914)
	at org.codehaus.janino.Parser.parseUnaryExpression(Parser.java:2617)
	at org.codehaus.janino.Parser.parseMultiplicativeExpression(Parser.java:2573)
	at org.codehaus.janino.Parser.parseAdditiveExpression(Parser.java:2552)
```

In the codegend expression, the literal should use `\\` instead of `\`

A similar problem was solved here: apache#15156.

## How was this patch tested?

Regression test in `DataFrameAggregationSuite`

Author: Burak Yavuz <[email protected]>

Closes apache#16361 from brkyvz/reg-break.
ghost pushed a commit to dbtsai/spark that referenced this pull request Oct 14, 2017
…ors.

## What changes were proposed in this pull request?

When fixing schema field names using escape characters with `addReferenceMinorObj()` at [SPARK-18952](https://issues.apache.org/jira/browse/SPARK-18952) (apache#16361), double-quotes around the names were remained and the names become something like `"((java.lang.String) references[1])"`.

```java
/* 055 */     private int maxSteps = 2;
/* 056 */     private int numRows = 0;
/* 057 */     private org.apache.spark.sql.types.StructType keySchema = new org.apache.spark.sql.types.StructType().add("((java.lang.String) references[1])", org.apache.spark.sql.types.DataTypes.StringType);
/* 058 */     private org.apache.spark.sql.types.StructType valueSchema = new org.apache.spark.sql.types.StructType().add("((java.lang.String) references[2])", org.apache.spark.sql.types.DataTypes.LongType);
/* 059 */     private Object emptyVBase;
```

We should remove the double-quotes to refer the values in `references` properly:

```java
/* 055 */     private int maxSteps = 2;
/* 056 */     private int numRows = 0;
/* 057 */     private org.apache.spark.sql.types.StructType keySchema = new org.apache.spark.sql.types.StructType().add(((java.lang.String) references[1]), org.apache.spark.sql.types.DataTypes.StringType);
/* 058 */     private org.apache.spark.sql.types.StructType valueSchema = new org.apache.spark.sql.types.StructType().add(((java.lang.String) references[2]), org.apache.spark.sql.types.DataTypes.LongType);
/* 059 */     private Object emptyVBase;
```

## How was this patch tested?

Existing tests.

Author: Takuya UESHIN <[email protected]>

Closes apache#19491 from ueshin/issues/SPARK-22273.
asfgit pushed a commit that referenced this pull request Oct 14, 2017
…ors.

## What changes were proposed in this pull request?

When fixing schema field names using escape characters with `addReferenceMinorObj()` at [SPARK-18952](https://issues.apache.org/jira/browse/SPARK-18952) (#16361), double-quotes around the names were remained and the names become something like `"((java.lang.String) references[1])"`.

```java
/* 055 */     private int maxSteps = 2;
/* 056 */     private int numRows = 0;
/* 057 */     private org.apache.spark.sql.types.StructType keySchema = new org.apache.spark.sql.types.StructType().add("((java.lang.String) references[1])", org.apache.spark.sql.types.DataTypes.StringType);
/* 058 */     private org.apache.spark.sql.types.StructType valueSchema = new org.apache.spark.sql.types.StructType().add("((java.lang.String) references[2])", org.apache.spark.sql.types.DataTypes.LongType);
/* 059 */     private Object emptyVBase;
```

We should remove the double-quotes to refer the values in `references` properly:

```java
/* 055 */     private int maxSteps = 2;
/* 056 */     private int numRows = 0;
/* 057 */     private org.apache.spark.sql.types.StructType keySchema = new org.apache.spark.sql.types.StructType().add(((java.lang.String) references[1]), org.apache.spark.sql.types.DataTypes.StringType);
/* 058 */     private org.apache.spark.sql.types.StructType valueSchema = new org.apache.spark.sql.types.StructType().add(((java.lang.String) references[2]), org.apache.spark.sql.types.DataTypes.LongType);
/* 059 */     private Object emptyVBase;
```

## How was this patch tested?

Existing tests.

Author: Takuya UESHIN <[email protected]>

Closes #19491 from ueshin/issues/SPARK-22273.

(cherry picked from commit e0503a7)
Signed-off-by: gatorsmile <[email protected]>
asfgit pushed a commit that referenced this pull request Oct 14, 2017
…ors.

## What changes were proposed in this pull request?

When fixing schema field names using escape characters with `addReferenceMinorObj()` at [SPARK-18952](https://issues.apache.org/jira/browse/SPARK-18952) (#16361), double-quotes around the names were remained and the names become something like `"((java.lang.String) references[1])"`.

```java
/* 055 */     private int maxSteps = 2;
/* 056 */     private int numRows = 0;
/* 057 */     private org.apache.spark.sql.types.StructType keySchema = new org.apache.spark.sql.types.StructType().add("((java.lang.String) references[1])", org.apache.spark.sql.types.DataTypes.StringType);
/* 058 */     private org.apache.spark.sql.types.StructType valueSchema = new org.apache.spark.sql.types.StructType().add("((java.lang.String) references[2])", org.apache.spark.sql.types.DataTypes.LongType);
/* 059 */     private Object emptyVBase;
```

We should remove the double-quotes to refer the values in `references` properly:

```java
/* 055 */     private int maxSteps = 2;
/* 056 */     private int numRows = 0;
/* 057 */     private org.apache.spark.sql.types.StructType keySchema = new org.apache.spark.sql.types.StructType().add(((java.lang.String) references[1]), org.apache.spark.sql.types.DataTypes.StringType);
/* 058 */     private org.apache.spark.sql.types.StructType valueSchema = new org.apache.spark.sql.types.StructType().add(((java.lang.String) references[2]), org.apache.spark.sql.types.DataTypes.LongType);
/* 059 */     private Object emptyVBase;
```

## How was this patch tested?

Existing tests.

Author: Takuya UESHIN <[email protected]>

Closes #19491 from ueshin/issues/SPARK-22273.

(cherry picked from commit e0503a7)
Signed-off-by: gatorsmile <[email protected]>
MatthewRBruce pushed a commit to Shopify/spark that referenced this pull request Jul 31, 2018
…ors.

## What changes were proposed in this pull request?

When fixing schema field names using escape characters with `addReferenceMinorObj()` at [SPARK-18952](https://issues.apache.org/jira/browse/SPARK-18952) (apache#16361), double-quotes around the names were remained and the names become something like `"((java.lang.String) references[1])"`.

```java
/* 055 */     private int maxSteps = 2;
/* 056 */     private int numRows = 0;
/* 057 */     private org.apache.spark.sql.types.StructType keySchema = new org.apache.spark.sql.types.StructType().add("((java.lang.String) references[1])", org.apache.spark.sql.types.DataTypes.StringType);
/* 058 */     private org.apache.spark.sql.types.StructType valueSchema = new org.apache.spark.sql.types.StructType().add("((java.lang.String) references[2])", org.apache.spark.sql.types.DataTypes.LongType);
/* 059 */     private Object emptyVBase;
```

We should remove the double-quotes to refer the values in `references` properly:

```java
/* 055 */     private int maxSteps = 2;
/* 056 */     private int numRows = 0;
/* 057 */     private org.apache.spark.sql.types.StructType keySchema = new org.apache.spark.sql.types.StructType().add(((java.lang.String) references[1]), org.apache.spark.sql.types.DataTypes.StringType);
/* 058 */     private org.apache.spark.sql.types.StructType valueSchema = new org.apache.spark.sql.types.StructType().add(((java.lang.String) references[2]), org.apache.spark.sql.types.DataTypes.LongType);
/* 059 */     private Object emptyVBase;
```

## How was this patch tested?

Existing tests.

Author: Takuya UESHIN <[email protected]>

Closes apache#19491 from ueshin/issues/SPARK-22273.

(cherry picked from commit e0503a7)
Signed-off-by: gatorsmile <[email protected]>
@brkyvz brkyvz deleted the reg-break branch February 3, 2019 20:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants