@amanomer
Contributor

amanomer commented Nov 29, 2019

What changes were proposed in this pull request?

This PR introduces a method expressionWithAlias in FunctionRegistry that registers a function's constructor and passes the registered name to it. For now, expressionWithAlias is used only to register BoolAnd and BoolOr.

Why are the changes needed?

The error message reports the wrong function name when BoolAnd or BoolOr is called through an alias.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Tested manually.

For the query:
select every('true');

Output before this PR:

Error in query: cannot resolve 'bool_and('true')' due to data type mismatch: Input to function 'bool_and' should have been boolean, but it's [string].; line 1 pos 7;

Output after this PR:

Error in query: cannot resolve 'every('true')' due to data type mismatch: Input to function 'every' should have been boolean, but it's [string].; line 1 pos 7;

@amanomer
Contributor Author

@gatorsmile I have now handled this for BoolAnd and BoolOr. Kindly review the changes.

@amanomer
Contributor Author

cc @srowen @cloud-fan

Member

Hm, do functions not already have names somewhere to use, which can already be set differently per alias? It looks like that's what nodeName is for, and it's already overridden in the aliases, so I'm missing why this is different.

Contributor Author

do functions not already have names somewhere to use, that can already be set differently per alias?

I don't think so. Currently, when we use an alias of bool_and (i.e., every), it is resolved to the BoolAnd constructor through FunctionRegistry#expressions, and BoolAnd sets 'bool_and' as its nodeName.

case class BoolAnd(arg: Expression) extends UnevaluableBooleanAggBase(arg) {
  override def nodeName: String = "bool_and"
}
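
To make the point concrete, here is a minimal, self-contained sketch. It does not use Spark's classes: Expr, Lit, the registry map, and resolve are hypothetical stand-ins. It only illustrates that when two SQL names map to the same constructor and the node hard-codes its name, the alias used in the query is lost.

```scala
// Toy model, not Spark's real classes. All names here are illustrative.
object AliasLostSketch {
  trait Expr { def nodeName: String }
  case class Lit(value: String) extends Expr { def nodeName: String = "literal" }

  // The node hard-codes its own name, as BoolAnd does above.
  case class BoolAnd(arg: Expr) extends Expr {
    override def nodeName: String = "bool_and"
  }

  // Both SQL names build the same node; the name used in the query is dropped.
  val registry: Map[String, Expr => Expr] = Map(
    "every" -> ((e: Expr) => BoolAnd(e)),
    "bool_and" -> ((e: Expr) => BoolAnd(e))
  )

  def resolve(name: String, arg: Expr): Expr = registry(name)(arg)
}
```

Resolving "every" in this sketch yields a node whose nodeName is still "bool_and", which is the same mismatch that shows up in the error message.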

Member

This may not make sense, as I don't know this code well, but doesn't "Exists" need to customize its nodeName, for example? Or does it never exist as such a node?

Contributor Author

every does not have a node of its own here. It is resolved to a BoolAnd.

expression[BoolAnd]("every"),
expression[BoolAnd]("bool_and"),
expression[BoolOr]("any"),
expression[BoolOr]("some"),
expression[BoolOr]("bool_or"),

Member

I see. I don't know enough to evaluate the effect of changing nodeName for all implementations, which seems like a broader change than required, but it does make some sense. Maybe @liancheng or @yhuai has an opinion.

@amanomer
Contributor Author

amanomer commented Dec 2, 2019

@gatorsmile Kindly review this PR.

Contributor

It's bad practice to make the Expression mutable for trivial things like this. How about using RuntimeReplaceable?

Contributor Author

Argument data types are checked in the analysis phase of planning, and it is the optimizer's task to replace RuntimeReplaceable expressions (correct me if I'm wrong).
So the optimization rule (ReplaceExpressions) will never be applied to queries like select every('true'), because they already fail during analysis.

Contributor

how about

case class BoolAnd(functionName: String, arg: Expression) ... with MultiNamesFunction {
  def nodeName = functionName
}

We can update FunctionRegistry.expression to detect MultiNamesFunction and pass name to the constructor.

Contributor Author

Updated PR. cc @cloud-fan

@amanomer amanomer requested review from cloud-fan and srowen December 2, 2019 16:20
@cloud-fan
Contributor

ok to test

  val params = Seq.fill(expressions.size)(classOf[Expression])
- val f = constructors.find(_.getParameterTypes.toSeq == params).getOrElse {
+ val f = constructors.find(e => e.getParameterTypes.toSeq == params
+   || e.getParameterTypes.head == classOf[String]).getOrElse {

Contributor

@cloud-fan cloud-fan Dec 3, 2019

Seems like it's less hacky to create a new expressionWithAlias method, with only the necessary logic

def expressionWithAlias ... = {
  val constructors = tag.runtimeClass.getConstructors
    .filter(c => c.getParameterTypes.head == classOf[String])
  assert(constructors.length == 1)
  try {
    constructors.head.newInstance(name, expressions : _*).asInstanceOf[Expression]
  } ...
}

Contributor

Then we don't even need the MultiNamesFunction trait. We just need to register bool_and and bool_or with expressionWithAlias.
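
The suggestion can be sketched outside Spark as follows. This is a simplified illustration under stated assumptions, not the code that was merged: Expr, Lit, and the tuple return type are hypothetical stand-ins for Spark's Expression and registry entry types. The helper looks up the single constructor whose first parameter is a String and passes the registered name to it.

```scala
import scala.reflect.ClassTag

// Toy model, not Spark's real classes. All names here are illustrative.
object AliasRegistrySketch {
  trait Expr { def nodeName: String }
  case class Lit(value: String) extends Expr { def nodeName: String = "literal" }

  // The SQL name used in the query is the first constructor argument.
  case class BoolAnd(funcName: String, arg: Expr) extends Expr {
    override def nodeName: String = funcName
  }

  // Hypothetical stand-in for the proposed helper.
  def expressionWithAlias[T <: Expr](name: String)(
      implicit tag: ClassTag[T]): (String, Seq[Expr] => Expr) = {
    // Keep only constructors whose first parameter is a String.
    val constructors = tag.runtimeClass.getConstructors
      .filter(c => c.getParameterTypes.headOption.exists(_ == classOf[String]))
    assert(constructors.length == 1,
      s"expected exactly one String-first constructor for $name")
    val builder = (args: Seq[Expr]) => {
      // Prepend the registered name to the reflective constructor arguments.
      val ctorArgs: Seq[AnyRef] = name +: args
      constructors.head.newInstance(ctorArgs: _*).asInstanceOf[Expr]
    }
    (name, builder)
  }
}
```

With this shape, registering the same class under "every" and "bool_and" produces two builders, and each resulting node reports the name it was registered with.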

Contributor Author

@cloud-fan updated as per your suggestions.

@SparkQA

SparkQA commented Dec 3, 2019

Test build #114761 has finished for PR 26712 at commit 3387eef.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

  .filter(_.getParameterTypes.head == classOf[String])
assert(constructors.length == 1)
val builder = (expressions: Seq[Expression]) => {
  Try(constructors.head.newInstance(name.toString, expressions.head).asInstanceOf[Expression])

Contributor

Seems better to use a normal try/catch?

try {
  constructors.head.newInstance(name.toString, expressions.head).asInstanceOf[Expression]
} catch {
  // the original comment ...
  case e: Exception => throw new AnalysisException(e.getCause.getMessage)
}

We can update def expression as well.
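
As background for why the cause is unwrapped: a constructor invoked through Java reflection reports its own failures wrapped in an InvocationTargetException, so the useful message sits on getCause. The sketch below shows this with a hypothetical Strict class and a build helper that returns an Either instead of throwing Spark's AnalysisException. Both are illustrative assumptions, not part of the PR.

```scala
import java.lang.reflect.InvocationTargetException

// Toy model. Strict and build are illustrative names, not Spark code.
object UnwrapCauseSketch {
  class Strict(value: String) {
    // require throws IllegalArgumentException("requirement failed: ...").
    require(value.nonEmpty, "value must not be empty")
  }

  def build(value: String): Either[String, Strict] = {
    val ctor = classOf[Strict].getConstructor(classOf[String])
    try {
      Right(ctor.newInstance(value))
    } catch {
      // Reflection wraps the constructor's exception; the message we want
      // belongs to the cause, not to the wrapper.
      case e: InvocationTargetException => Left(e.getCause.getMessage)
    }
  }
}
```

Calling build("") in this sketch returns the message of the IllegalArgumentException raised inside the constructor rather than the message of the reflective wrapper.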

Contributor Author

Updated in latest commit.

@SparkQA

SparkQA commented Dec 3, 2019

Test build #114764 has finished for PR 26712 at commit 4476e7e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
      • case class BoolAnd(funcName: String, arg: Expression) extends UnevaluableBooleanAggBase(arg)
      • case class BoolOr(funcName: String, arg: Expression) extends UnevaluableBooleanAggBase(arg)

@SparkQA

SparkQA commented Dec 3, 2019

Test build #114786 has finished for PR 26712 at commit 77ad53f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 3, 2019

Test build #114787 has finished for PR 26712 at commit 404d829.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 3, 2019

Test build #114797 has finished for PR 26712 at commit 27f1a7f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@HyukjinKwon HyukjinKwon left a comment

Looks good to me too if tests pass

@SparkQA

SparkQA commented Dec 4, 2019

Test build #114828 has finished for PR 26712 at commit 2952898.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

retest this please

@SparkQA

SparkQA commented Dec 4, 2019

Test build #114842 has finished for PR 26712 at commit 2952898.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 4, 2019

Test build #114875 has finished for PR 26712 at commit bb665e4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

assert(constructors.length == 1)
val builder = (expressions: Seq[Expression]) => {
  try {
    constructors.head.newInstance(name.toString, expressions.head).asInstanceOf[Expression]

Contributor Author

Since we are not validating the number of arguments, queries like
SELECT EVERY(true, false); will return true instead of failing.

Contributor Author

Can we validate the arguments with an assert, or in the same way as in expression?
cc @cloud-fan @HyukjinKwon

Contributor

how is it done in def expression?

Contributor Author

val params = Seq.fill(expressions.size)(classOf[Expression])
val f = constructors.find(_.getParameterTypes.toSeq == params).getOrElse {
  val validParametersCount = constructors
    .filter(_.getParameterTypes.forall(_ == classOf[Expression]))
    .map(_.getParameterCount).distinct.sorted
  val invalidArgumentsMsg = if (validParametersCount.length == 0) {
    s"Invalid arguments for function $name"
  } else {
    val expectedNumberOfParameters = if (validParametersCount.length == 1) {
      validParametersCount.head.toString
    } else {
      validParametersCount.init.mkString("one of ", ", ", " and ") +
        validParametersCount.last
    }
    s"Invalid number of arguments for function $name. " +
      s"Expected: $expectedNumberOfParameters; Found: ${params.length}"
  }
  throw new AnalysisException(invalidArgumentsMsg)
}
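
The message-building part of the snippet above can be isolated as a pure function, which makes the wording for one versus several accepted argument counts easier to see. This is a standalone sketch: the function name and its parameters are hypothetical, and the accepted counts are passed in directly instead of being derived from constructors via reflection.

```scala
// Standalone sketch of the message logic; names are illustrative.
object ArityMessageSketch {
  // validCounts: the distinct, sorted argument counts the function accepts.
  def invalidArgumentsMsg(name: String, validCounts: Seq[Int], found: Int): String =
    if (validCounts.isEmpty) {
      s"Invalid arguments for function $name"
    } else {
      val expected =
        if (validCounts.length == 1) validCounts.head.toString
        // mkString(start, sep, end) renders e.g. "one of 1, 2 and " before the last count.
        else validCounts.init.mkString("one of ", ", ", " and ") + validCounts.last
      s"Invalid number of arguments for function $name. " +
        s"Expected: $expected; Found: $found"
    }
}
```

For a function that accepts 1, 2, or 3 arguments, the expected part reads "one of 1, 2 and 3".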

Contributor Author

Since expressionWithAlias always passes expressions.head to the function's constructor, we can use an assert statement:

...
val builder = (expressions: Seq[Expression]) => {
  assert(expressions.size == 1,
    s"Invalid number of arguments for function $name. " +
      s"Expected: 1; Found: ${expressions.size}")
  // No separate type check is needed: every element of Seq[Expression]
  // is already an Expression.
  try {
    constructors.head.newInstance(name.toString, expressions.head).asInstanceOf[Expression]
  }
...
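
The concern about extra arguments can be reproduced in isolation. In the sketch below, lenient and strict are hypothetical builders over plain Booleans rather than Spark expressions: the first uses only args.head and silently ignores the rest, and the second puts an arity check in front. It uses require rather than assert, which is an assumption made here for illustration.

```scala
// Toy builders over Booleans; names are illustrative, not Spark code.
object HeadOnlyBuilderSketch {
  // Mirrors the concern: only the first argument is used, extras are ignored.
  def lenient(args: Seq[Boolean]): Boolean = args.head

  // Same builder with an explicit arity check in front.
  def strict(name: String)(args: Seq[Boolean]): Boolean = {
    require(args.size == 1,
      s"Invalid number of arguments for function $name. " +
        s"Expected: 1; Found: ${args.size}")
    args.head
  }
}
```

Here lenient(Seq(true, false)) quietly returns true, while strict("every")(Seq(true, false)) throws before the builder runs.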

Contributor

SGTM

Contributor

BTW is it possible to do newInstance(name.toString, expressions: _*)? Then it can work for other expressions that take more than 1 parameter.

Contributor Author

is it possible to do newInstance(name.toString, expressions: _*)?

No, compilation error.

Contributor

how about newInstance((name.toString +: expressions): _*)?

Contributor Author

@amanomer amanomer Dec 5, 2019

It works. Updated in latest commit. cc @cloud-fan
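
For readers wondering why the first form fails to compile: Scala only accepts a sequence argument (seq: _*) when it is the only argument passed to the repeated parameter, so a leading value has to be prepended to the sequence first. The sketch below shows the working form with a hypothetical describe method standing in for the reflective newInstance call.

```scala
// Toy varargs method; describe and the sample values are illustrative.
object VarargsSplatSketch {
  def describe(parts: AnyRef*): String = parts.mkString("(", ", ", ")")

  val name: String = "every"
  val args: Seq[AnyRef] = Seq("a", "b")

  // describe(name, args: _*) is rejected by the compiler, because the
  // sequence argument must be the only argument for the repeated parameter.
  // Prepending the name builds one sequence that can be passed as a whole.
  val all: Seq[AnyRef] = name +: args
  val rendered: String = describe(all: _*)
}
```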

@amanomer
Contributor Author

amanomer commented Dec 5, 2019

There are other functions with alias names, for example VarianceSamp and StddevSamp. Should we change them in this PR?

@cloud-fan
Contributor

Should we change them in this PR?

I'm fine either way.

@amanomer
Contributor Author

amanomer commented Dec 5, 2019

OK, I will handle them in a different PR.

@SparkQA

SparkQA commented Dec 5, 2019

Test build #114899 has finished for PR 26712 at commit 3c87222.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 5, 2019

Test build #114903 has finished for PR 26712 at commit 17c91f4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@amanomer
Contributor Author

amanomer commented Dec 6, 2019

cc @cloud-fan

@amanomer
Contributor Author

amanomer commented Dec 6, 2019

cc @cloud-fan

@SparkQA

SparkQA commented Dec 6, 2019

Test build #114940 has finished for PR 26712 at commit d07d261.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@amanomer amanomer changed the title from "[SPARK-29883][SQL] Improve error messages when function name is an alias" to "[SPARK-29883][SQL] Improve error messages when bool_and() and bool_or() is called using alias" Dec 7, 2019
@srowen
Member

srowen commented Dec 8, 2019

I'm OK with it if @cloud-fan is.

case Failure(e) =>
  // the exception is an invocation exception. To get a meaningful message, we need the
  // cause.
  throw new AnalysisException(e.getCause.getMessage)

Member

Hi, @amanomer . I'm wondering if this change is required in this PR.

Contributor Author

This reformatting of the try-catch block can be raised in a different PR.

Member

FWIW I think this is fine and cleaner, so think it's OK to change here.

Contributor Author

I think there are similarly formatted try-catch blocks in other files too, which could be reformatted like this.

case Failure(e) =>
  // the exception is an invocation exception. To get a meaningful message, we need the
  // cause.
  throw new AnalysisException(e.getCause.getMessage)

Member

ditto.

Member

@maropu maropu left a comment

Can you make the title clearer, like "Implement a helper method for aliasing registered functions"?

@amanomer amanomer changed the title from "[SPARK-29883][SQL] Improve error messages when bool_and() and bool_or() is called using alias" to "[SPARK-29883][SQL] IImplement a helper method for aliasing bool_and() and bool_or()" Dec 9, 2019
@amanomer amanomer changed the title from "[SPARK-29883][SQL] IImplement a helper method for aliasing bool_and() and bool_or()" to "[SPARK-29883][SQL] Implement a helper method for aliasing bool_and() and bool_or()" Dec 9, 2019
@cloud-fan cloud-fan closed this in dcea7a4 Dec 9, 2019
@cloud-fan
Contributor

thanks, merging to master!

@amanomer
Contributor Author

amanomer commented Dec 9, 2019

Thanks all for reviewing and merging

cloud-fan pushed a commit that referenced this pull request Dec 20, 2019
### What changes were proposed in this pull request?
This PR uses `expressionWithAlias` for the remaining functions that can be called through an alias name. The remaining functions are:
`Average, First, Last, ApproximatePercentile, StddevSamp, VarianceSamp`

PR #26712 introduced `expressionWithAlias`
### Why are the changes needed?
The error message is wrong when an alias name is used for the above-mentioned functions.
### Does this PR introduce any user-facing change?
No
### How was this patch tested?
Manually

Closes #26808 from amanomer/fncAlias.

Lead-authored-by: Aman Omer <[email protected]>
Co-authored-by: Aman Omer <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>