Skip to content

Conversation

@amanomer
Copy link
Contributor

@amanomer amanomer commented Dec 9, 2019

What changes were proposed in this pull request?

This PR is to use expressionWithAlias for remaining functions for which alias name can be used. Remaining functions are:
Average, First, Last, ApproximatePercentile, StddevSamp, VarianceSamp

PR #26712 introduced expressionWithAlias

Why are the changes needed?

Error message is wrong when alias name is used for above mentioned functions.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Manually

@amanomer
Copy link
Contributor Author

amanomer commented Dec 9, 2019

@dongjoon-hyun
Copy link
Member

ok to test

@SparkQA
Copy link

SparkQA commented Dec 9, 2019

Test build #115054 has finished for PR 26808 at commit 7d5f4be.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val result = AggregateExpression(
aggregate.First(evalWithinGroup(regularGroupId, operator.toAttribute), Literal(true)),
aggregate.First("first",
evalWithinGroup(regularGroupId, operator.toAttribute), Literal(true)),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a bit verbose to always set "first" for First, so how about defining auxiliary constructor?

  def this(child: Expression, ignoreNullsExpr: Expression) =
    this("first", child, Literal.create(false, BooleanType))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Member

@maropu maropu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the tests pass, LGTM except for one minor comment

@maropu
Copy link
Member

maropu commented Dec 10, 2019

btw, why did you include fixes for Average and ApproximatePercentile in this pr?

*/
def first(e: Column, ignoreNulls: Boolean): Column = withAggregateFunction {
new First(e.expr, Literal(ignoreNulls))
new First("first", e.expr, Literal(ignoreNulls))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can add a constructor to provide a default name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@amanomer
Copy link
Contributor Author

btw, why did you include fixes for Average and ApproximatePercentile in this pr?

Average and ApproximatePercentile can be called with alias name as well.

expression[ApproximatePercentile]("percentile_approx"),
expression[ApproximatePercentile]("approx_percentile"),

@SparkQA
Copy link

SparkQA commented Dec 10, 2019

Test build #115081 has finished for PR 26808 at commit 759262d.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Dec 10, 2019

Test build #115082 has finished for PR 26808 at commit 5ec101f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

// Select the result of the first aggregate in the last aggregate.
val result = AggregateExpression(
aggregate.First(evalWithinGroup(regularGroupId, operator.toAttribute), Literal(true)),
new aggregate.First(evalWithinGroup(regularGroupId, operator.toAttribute), Literal(true)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so adding more constructors doesn't help us to reduce diff, how about

case class First(funcName: String, child: Expression, ignoreNullsExpr: Expression) {
  def this(funcName: String, child: Expression)
}
object First {
  def apply(child: Expression, ignoreNullsExpr: Expression) ...
  def apply(child: Expression) ...
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@SparkQA
Copy link

SparkQA commented Dec 10, 2019

Test build #115106 has finished for PR 26808 at commit f210bb9.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Dec 10, 2019

Test build #115109 has finished for PR 26808 at commit cc234b9.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

// 1. Maven can't get correct resource directory when resources in other jars.
// 2. We test subclasses in the hive-thriftserver module.
val sparkHome = {
sys.props.put("spark.test.home", "/home/root1/spark")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

opps...I'll revert this

Copy link
Member

@maropu maropu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except for one comment.

@SparkQA
Copy link

SparkQA commented Dec 11, 2019

Test build #115142 has finished for PR 26808 at commit 92e381b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Dec 11, 2019

Test build #115141 has finished for PR 26808 at commit 7617ec3.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val evaluator = DeclarativeAggregateEvaluator(Last(input, Literal(false)), Seq(input))
val evaluatorIgnoreNulls = DeclarativeAggregateEvaluator(Last(input, Literal(true)), Seq(input))
val evaluatorIgnoreNulls = DeclarativeAggregateEvaluator(
Last(input, Literal(true)), Seq(input))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: unnecessary change

@SparkQA
Copy link

SparkQA commented Dec 18, 2019

Test build #115524 has finished for PR 26808 at commit 43ce929.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Dec 18, 2019

Test build #115526 has finished for PR 26808 at commit e25271e.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Dec 18, 2019

Test build #115528 has finished for PR 26808 at commit 5d057c8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


/** See usage above. */
private def expression[T <: Expression](name: String)
private def expression[T <: Expression](name: String, isAliasName: Boolean = false)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: setAlias: Boolean = false

throw new UnsupportedOperationException(s"$nodeName does not implement simpleStringWithNodeId")
}

val FUNC_ALIAS = TreeNodeTag[String]("functionAliasName")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can define it in object FunctionRegistry. This is kind of a map key, and we don't need to create a different instance for each expression.

expression[HyperLogLogPlusPlus]("approx_count_distinct"),
expression[Average]("avg"),
expressionWithAlias[Average]("avg"),
expressionWithAlias[Average]("mean"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems we don't need expressionWithAlias at all, just call expression[Average]("mean", setAlias = true).

And expression[Average]("avg") can remain unchanged, as avg is already the default name.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And we don't need to reorder them now.

}

override def prettyName: String = "percentile_approx"
override def nodeName: String = getTagValue(FUNC_ALIAS).getOrElse("percentile_approx")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's override prettyName as before.

override lazy val evaluateExpression: AttributeReference = first

override def toString: String = s"first($child)${if (ignoreNulls) " ignore nulls"}"
override def nodeName: String = getTagValue(FUNC_ALIAS).getOrElse("first")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto, override prettyName seems better.

@cloud-fan
Copy link
Contributor

in general looks good, thanks for updating!

@amanomer
Copy link
Contributor Author

cc @cloud-fan

@SparkQA
Copy link

SparkQA commented Dec 19, 2019

Test build #115559 has finished for PR 26808 at commit 700a84d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@amanomer
Copy link
Contributor Author

amanomer commented Dec 19, 2019

Tests have passed. cc @cloud-fan @maropu

@cloud-fan cloud-fan closed this in 726f6d3 Dec 20, 2019
@cloud-fan
Copy link
Contributor

thanks, merging to master!

dongjoon-hyun pushed a commit that referenced this pull request Mar 16, 2020
…n SQL expressions

### What changes were proposed in this pull request?

This PR is kind of a followup of #26808. It leverages the helper method for aliasing in built-in SQL expressions to use the alias as its output column name where it's applicable.

- `Expression`, `UnaryMathExpression` and `BinaryMathExpression` search the alias in the tags by default.
- When the naming is different in its implementation, it has to be overwritten for the expression specifically. E.g., `CallMethodViaReflection`, `Remainder`, `CurrentTimestamp`,
`FormatString` and `XPathDouble`.

This PR fixes the aliases of the functions below:

| class                    | alias            |
|--------------------------|------------------|
|`Rand`                    |`random`          |
|`Ceil`                    |`ceiling`         |
|`Remainder`               |`mod`             |
|`Pow`                     |`pow`             |
|`Signum`                  |`sign`            |
|`Chr`                     |`char`            |
|`Length`                  |`char_length`     |
|`Length`                  |`character_length`|
|`FormatString`            |`printf`          |
|`Substring`               |`substr`          |
|`Upper`                   |`ucase`           |
|`XPathDouble`             |`xpath_number`    |
|`DayOfMonth`              |`day`             |
|`CurrentTimestamp`        |`now`             |
|`Size`                    |`cardinality`     |
|`Sha1`                    |`sha`             |
|`CallMethodViaReflection` |`java_method`     |

Note: `EqualTo`, `=` and `==` aliases were excluded because it's unable to leverage this helper method. It should fix the parser.

Note: this PR also excludes some instances such as `ToDegrees`, `ToRadians`, `UnaryMinus` and `UnaryPositive` that needs an explicit name overwritten to make the scope of this PR smaller.

### Why are the changes needed?

To respect expression name.

### Does this PR introduce any user-facing change?

Yes, it will change the output column name.

### How was this patch tested?

Manually tested, and unittests were added.

Closes #27901 from HyukjinKwon/31146.

Authored-by: HyukjinKwon <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
dongjoon-hyun pushed a commit that referenced this pull request Mar 16, 2020
…n SQL expressions

### What changes were proposed in this pull request?

This PR is kind of a followup of #26808. It leverages the helper method for aliasing in built-in SQL expressions to use the alias as its output column name where it's applicable.

- `Expression`, `UnaryMathExpression` and `BinaryMathExpression` search the alias in the tags by default.
- When the naming is different in its implementation, it has to be overwritten for the expression specifically. E.g., `CallMethodViaReflection`, `Remainder`, `CurrentTimestamp`,
`FormatString` and `XPathDouble`.

This PR fixes the aliases of the functions below:

| class                    | alias            |
|--------------------------|------------------|
|`Rand`                    |`random`          |
|`Ceil`                    |`ceiling`         |
|`Remainder`               |`mod`             |
|`Pow`                     |`pow`             |
|`Signum`                  |`sign`            |
|`Chr`                     |`char`            |
|`Length`                  |`char_length`     |
|`Length`                  |`character_length`|
|`FormatString`            |`printf`          |
|`Substring`               |`substr`          |
|`Upper`                   |`ucase`           |
|`XPathDouble`             |`xpath_number`    |
|`DayOfMonth`              |`day`             |
|`CurrentTimestamp`        |`now`             |
|`Size`                    |`cardinality`     |
|`Sha1`                    |`sha`             |
|`CallMethodViaReflection` |`java_method`     |

Note: `EqualTo`, `=` and `==` aliases were excluded because it's unable to leverage this helper method. It should fix the parser.

Note: this PR also excludes some instances such as `ToDegrees`, `ToRadians`, `UnaryMinus` and `UnaryPositive` that needs an explicit name overwritten to make the scope of this PR smaller.

### Why are the changes needed?

To respect expression name.

### Does this PR introduce any user-facing change?

Yes, it will change the output column name.

### How was this patch tested?

Manually tested, and unittests were added.

Closes #27901 from HyukjinKwon/31146.

Authored-by: HyukjinKwon <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 6704103)
Signed-off-by: Dongjoon Hyun <[email protected]>
sjincho pushed a commit to sjincho/spark that referenced this pull request Apr 15, 2020
…n SQL expressions

### What changes were proposed in this pull request?

This PR is kind of a followup of apache#26808. It leverages the helper method for aliasing in built-in SQL expressions to use the alias as its output column name where it's applicable.

- `Expression`, `UnaryMathExpression` and `BinaryMathExpression` search the alias in the tags by default.
- When the naming is different in its implementation, it has to be overwritten for the expression specifically. E.g., `CallMethodViaReflection`, `Remainder`, `CurrentTimestamp`,
`FormatString` and `XPathDouble`.

This PR fixes the aliases of the functions below:

| class                    | alias            |
|--------------------------|------------------|
|`Rand`                    |`random`          |
|`Ceil`                    |`ceiling`         |
|`Remainder`               |`mod`             |
|`Pow`                     |`pow`             |
|`Signum`                  |`sign`            |
|`Chr`                     |`char`            |
|`Length`                  |`char_length`     |
|`Length`                  |`character_length`|
|`FormatString`            |`printf`          |
|`Substring`               |`substr`          |
|`Upper`                   |`ucase`           |
|`XPathDouble`             |`xpath_number`    |
|`DayOfMonth`              |`day`             |
|`CurrentTimestamp`        |`now`             |
|`Size`                    |`cardinality`     |
|`Sha1`                    |`sha`             |
|`CallMethodViaReflection` |`java_method`     |

Note: `EqualTo`, `=` and `==` aliases were excluded because it's unable to leverage this helper method. It should fix the parser.

Note: this PR also excludes some instances such as `ToDegrees`, `ToRadians`, `UnaryMinus` and `UnaryPositive` that needs an explicit name overwritten to make the scope of this PR smaller.

### Why are the changes needed?

To respect expression name.

### Does this PR introduce any user-facing change?

Yes, it will change the output column name.

### How was this patch tested?

Manually tested, and unittests were added.

Closes apache#27901 from HyukjinKwon/31146.

Authored-by: HyukjinKwon <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants