Skip to content

Conversation

@HyukjinKwon
Copy link
Member

@HyukjinKwon HyukjinKwon commented Mar 13, 2020

What changes were proposed in this pull request?

This PR is kind of a followup of #26808. It leverages the helper method for aliasing in built-in SQL expressions to use the alias as its output column name where it's applicable.

  • Expression, UnaryMathExpression and BinaryMathExpression search the alias in the tags by default.
  • When the naming is different in its implementation, it has to be overwritten for the expression specifically. E.g., CallMethodViaReflection, Remainder, CurrentTimestamp,
    FormatString and XPathDouble.

This PR fixes the aliases of the functions below:

class alias
Rand random
Ceil ceiling
Remainder mod
Pow pow
Signum sign
Chr char
Length char_length
Length character_length
FormatString printf
Substring substr
Upper ucase
XPathDouble xpath_number
DayOfMonth day
CurrentTimestamp now
Size cardinality
Sha1 sha
CallMethodViaReflection java_method

Note: EqualTo, = and == aliases were excluded because it's unable to leverage this helper method. It should fix the parser.

Note: this PR also excludes some instances such as ToDegrees, ToRadians, UnaryMinus and UnaryPositive that needs an explicit name overwritten to make the scope of this PR smaller.

Why are the changes needed?

To respect expression name.

Does this PR introduce any user-facing change?

Yes, it will change the output column name.

How was this patch tested?

Manually tested, and unittests were added.

@SparkQA

This comment has been minimized.

@HyukjinKwon HyukjinKwon changed the title [SPARK-31146][SQL] Leverage the helper method for aliasing in all SQL expressions [SPARK-31146][SQL] Leverage the helper method for aliasing in built-in SQL expressions Mar 13, 2020
@SparkQA

This comment has been minimized.

@SparkQA

This comment has been minimized.

@dongjoon-hyun
Copy link
Member

Retest this please.

select ln(1.2345678e-28)
-- !query schema
struct<LOG(1.2345678E-28):double>
struct<log(1.2345678E-28):double>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have ln instead of log?

Copy link
Member Author

@HyukjinKwon HyukjinKwon Mar 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually realised that there are some more instances such as ToDegrees, ToRadians, UnaryMinus and UnaryPositive.. let me exclude these to make the scope my PR addresses smaller

@SparkQA

This comment has been minimized.

@dongjoon-hyun
Copy link
Member

Oh, R failure is relevant one.

test_sparkSQL.R:1785: failure: column binary mathfunctions
collect(select(df, atan2(df$a, df$b)))[1, "ATAN2(a, b)"] not equal to atan2(1, 5).
target is NULL, current is numeric

@maropu
Copy link
Member

maropu commented Mar 14, 2020

Looks nice. For easy trackability in commit logs, could you add a list of the functions (that this PR adds alias flags for) in the PR description?

@HyukjinKwon
Copy link
Member Author

cc @cloud-fan too - I realised that you touched lots of codes related to this during commit history search.

@SparkQA

This comment has been minimized.

@HyukjinKwon
Copy link
Member Author

retest this please

@SparkQA
Copy link

SparkQA commented Mar 15, 2020

Test build #119813 has finished for PR 27901 at commit b3f40a1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@viirya viirya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If users code refers the output column name, will such change break user code? Should we update migration guide too?

@HyukjinKwon
Copy link
Member Author

I don't think we have made guarantee on the output column name. Also, I would think this is as a rather bug fix. We have already made such changes a lot in the history.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM. Thank you, @HyukjinKwon and all.
Merged to master/3.0.

dongjoon-hyun pushed a commit that referenced this pull request Mar 16, 2020
…n SQL expressions

### What changes were proposed in this pull request?

This PR is kind of a followup of #26808. It leverages the helper method for aliasing in built-in SQL expressions to use the alias as its output column name where it's applicable.

- `Expression`, `UnaryMathExpression` and `BinaryMathExpression` search the alias in the tags by default.
- When the naming is different in its implementation, it has to be overwritten for the expression specifically. E.g., `CallMethodViaReflection`, `Remainder`, `CurrentTimestamp`,
`FormatString` and `XPathDouble`.

This PR fixes the aliases of the functions below:

| class                    | alias            |
|--------------------------|------------------|
|`Rand`                    |`random`          |
|`Ceil`                    |`ceiling`         |
|`Remainder`               |`mod`             |
|`Pow`                     |`pow`             |
|`Signum`                  |`sign`            |
|`Chr`                     |`char`            |
|`Length`                  |`char_length`     |
|`Length`                  |`character_length`|
|`FormatString`            |`printf`          |
|`Substring`               |`substr`          |
|`Upper`                   |`ucase`           |
|`XPathDouble`             |`xpath_number`    |
|`DayOfMonth`              |`day`             |
|`CurrentTimestamp`        |`now`             |
|`Size`                    |`cardinality`     |
|`Sha1`                    |`sha`             |
|`CallMethodViaReflection` |`java_method`     |

Note: `EqualTo`, `=` and `==` aliases were excluded because it's unable to leverage this helper method. It should fix the parser.

Note: this PR also excludes some instances such as `ToDegrees`, `ToRadians`, `UnaryMinus` and `UnaryPositive` that needs an explicit name overwritten to make the scope of this PR smaller.

### Why are the changes needed?

To respect expression name.

### Does this PR introduce any user-facing change?

Yes, it will change the output column name.

### How was this patch tested?

Manually tested, and unittests were added.

Closes #27901 from HyukjinKwon/31146.

Authored-by: HyukjinKwon <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 6704103)
Signed-off-by: Dongjoon Hyun <[email protected]>
@HyukjinKwon
Copy link
Member Author

Thank you guys!

@gatorsmile
Copy link
Member

We need to update the migration guide. This impacts the output schema of SQL queries.

@HyukjinKwon
Copy link
Member Author

HyukjinKwon commented Apr 1, 2020

Sure. I am sorry, I rushed to read. I think we haven't documented such output column names in the migration so far, e.g., #26808. Let me leave it out for now.

@HyukjinKwon
Copy link
Member Author

And I think we don't also guarantee on output columns names (#27901 (comment)).

sjincho pushed a commit to sjincho/spark that referenced this pull request Apr 15, 2020
…n SQL expressions

### What changes were proposed in this pull request?

This PR is kind of a followup of apache#26808. It leverages the helper method for aliasing in built-in SQL expressions to use the alias as its output column name where it's applicable.

- `Expression`, `UnaryMathExpression` and `BinaryMathExpression` search the alias in the tags by default.
- When the naming is different in its implementation, it has to be overwritten for the expression specifically. E.g., `CallMethodViaReflection`, `Remainder`, `CurrentTimestamp`,
`FormatString` and `XPathDouble`.

This PR fixes the aliases of the functions below:

| class                    | alias            |
|--------------------------|------------------|
|`Rand`                    |`random`          |
|`Ceil`                    |`ceiling`         |
|`Remainder`               |`mod`             |
|`Pow`                     |`pow`             |
|`Signum`                  |`sign`            |
|`Chr`                     |`char`            |
|`Length`                  |`char_length`     |
|`Length`                  |`character_length`|
|`FormatString`            |`printf`          |
|`Substring`               |`substr`          |
|`Upper`                   |`ucase`           |
|`XPathDouble`             |`xpath_number`    |
|`DayOfMonth`              |`day`             |
|`CurrentTimestamp`        |`now`             |
|`Size`                    |`cardinality`     |
|`Sha1`                    |`sha`             |
|`CallMethodViaReflection` |`java_method`     |

Note: `EqualTo`, `=` and `==` aliases were excluded because it's unable to leverage this helper method. It should fix the parser.

Note: this PR also excludes some instances such as `ToDegrees`, `ToRadians`, `UnaryMinus` and `UnaryPositive` that needs an explicit name overwritten to make the scope of this PR smaller.

### Why are the changes needed?

To respect expression name.

### Does this PR introduce any user-facing change?

Yes, it will change the output column name.

### How was this patch tested?

Manually tested, and unittests were added.

Closes apache#27901 from HyukjinKwon/31146.

Authored-by: HyukjinKwon <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
@HyukjinKwon HyukjinKwon deleted the 31146 branch July 27, 2020 07:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants