-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-17989][SQL] Check ascendingOrder type in sort_array function rather than throwing ClassCastException #15532
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we move this into "sql/core/src/test/resources/sql-tests/inputs/array.sql"
|
Test build #67129 has finished for PR 15532 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explicitly mention that the error is in context of sort_array() ? For SQL query which spans 100+ lines, its easy to get lost while tracking why it failed. Having better error messages would make that easier for users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
|
Thank you both. Will adress the comments soon. |
5f73245 to
cbd9605
Compare
|
Test build #67167 has finished for PR 15532 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add some comment explaining what this is testing? (error reporting)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I will.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looking at the entirety of the error message, it looks like maybe we don't need to call out sort_array in the error message, since it is already here.
cc @tejasapatil
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, It seems it does not print it (prettyName) in the message (from the expression) as default in ExpectsInputTypes/ImplicitCastInputTypes.
s"argument ${idx + 1} requires ${expected.simpleString} type, " +
s"however, '${child.sql}' is of ${child.dataType.simpleString} type."There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Now that I see the final error message, its fine to omit that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for confirming @tejasapatil
55c0804 to
eafe4d6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think you should just say "requires boolean type" and remove "as non-null".
I don't know what that was saying ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even better, say "requires a boolean literal"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe even ignore the actual value to keep it short.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(also technically your current english sentence is grammatically incorrect; you can't use comma to separate however here)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I wrote this as I was worried of the case cast(NULL as boolean).
Sort order in second argument requires boolean type as non-null, however, it is 'CAST(NULL AS BOOLEAN)' as boolean type.
I will just make it short. Actually, that sentence was copied from the default one.
… ClassCastException
eafe4d6 to
6ff7c7d
Compare
| TypeCheckResult.TypeCheckSuccess | ||
| case _ => | ||
| TypeCheckResult.TypeCheckFailure( | ||
| "Sort order in second argument requires a boolean literal.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just checked the error msg for ExpectsInputTypes and it is not consistent anymore. Anyway I think this is fine. I will go through this and fix them myself later.
|
LGTM pending Jenkins. |
|
Thank you @rxin and @tejasapatil |
|
Test build #67174 has finished for PR 15532 at commit
|
|
Test build #67176 has finished for PR 15532 at commit
|
|
Thanks - merging in master/branch-2.0. |
…ather than throwing ClassCastException
## What changes were proposed in this pull request?
This PR proposes to check the second argument, `ascendingOrder` rather than throwing `ClassCastException` exception message.
```sql
select sort_array(array('b', 'd'), '1');
```
**Before**
```
16/10/19 13:16:08 ERROR SparkSQLDriver: Failed in [select sort_array(array('b', 'd'), '1')]
java.lang.ClassCastException: org.apache.spark.unsafe.types.UTF8String cannot be cast to java.lang.Boolean
at scala.runtime.BoxesRunTime.unboxToBoolean(BoxesRunTime.java:85)
at org.apache.spark.sql.catalyst.expressions.SortArray.nullSafeEval(collectionOperations.scala:185)
at org.apache.spark.sql.catalyst.expressions.BinaryExpression.eval(Expression.scala:416)
at org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$1$$anonfun$applyOrElse$1.applyOrElse(expressions.scala:50)
at org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$1$$anonfun$applyOrElse$1.applyOrElse(expressions.scala:43)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:292)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:292)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:74)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:291)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:297)
```
**After**
```
Error in query: cannot resolve 'sort_array(array('b', 'd'), '1')' due to data type mismatch: Sort order in second argument requires a boolean literal.; line 1 pos 7;
```
## How was this patch tested?
Unit test in `DataFrameFunctionsSuite`.
Author: hyukjinkwon <[email protected]>
Closes #15532 from HyukjinKwon/SPARK-17989.
(cherry picked from commit 4b2011e)
Signed-off-by: Reynold Xin <[email protected]>
…ather than throwing ClassCastException
## What changes were proposed in this pull request?
This PR proposes to check the second argument, `ascendingOrder` rather than throwing `ClassCastException` exception message.
```sql
select sort_array(array('b', 'd'), '1');
```
**Before**
```
16/10/19 13:16:08 ERROR SparkSQLDriver: Failed in [select sort_array(array('b', 'd'), '1')]
java.lang.ClassCastException: org.apache.spark.unsafe.types.UTF8String cannot be cast to java.lang.Boolean
at scala.runtime.BoxesRunTime.unboxToBoolean(BoxesRunTime.java:85)
at org.apache.spark.sql.catalyst.expressions.SortArray.nullSafeEval(collectionOperations.scala:185)
at org.apache.spark.sql.catalyst.expressions.BinaryExpression.eval(Expression.scala:416)
at org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$1$$anonfun$applyOrElse$1.applyOrElse(expressions.scala:50)
at org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$1$$anonfun$applyOrElse$1.applyOrElse(expressions.scala:43)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:292)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:292)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:74)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:291)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:297)
```
**After**
```
Error in query: cannot resolve 'sort_array(array('b', 'd'), '1')' due to data type mismatch: Sort order in second argument requires a boolean literal.; line 1 pos 7;
```
## How was this patch tested?
Unit test in `DataFrameFunctionsSuite`.
Author: hyukjinkwon <[email protected]>
Closes apache#15532 from HyukjinKwon/SPARK-17989.
…ather than throwing ClassCastException
## What changes were proposed in this pull request?
This PR proposes to check the second argument, `ascendingOrder` rather than throwing `ClassCastException` exception message.
```sql
select sort_array(array('b', 'd'), '1');
```
**Before**
```
16/10/19 13:16:08 ERROR SparkSQLDriver: Failed in [select sort_array(array('b', 'd'), '1')]
java.lang.ClassCastException: org.apache.spark.unsafe.types.UTF8String cannot be cast to java.lang.Boolean
at scala.runtime.BoxesRunTime.unboxToBoolean(BoxesRunTime.java:85)
at org.apache.spark.sql.catalyst.expressions.SortArray.nullSafeEval(collectionOperations.scala:185)
at org.apache.spark.sql.catalyst.expressions.BinaryExpression.eval(Expression.scala:416)
at org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$1$$anonfun$applyOrElse$1.applyOrElse(expressions.scala:50)
at org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$1$$anonfun$applyOrElse$1.applyOrElse(expressions.scala:43)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:292)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:292)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:74)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:291)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:297)
```
**After**
```
Error in query: cannot resolve 'sort_array(array('b', 'd'), '1')' due to data type mismatch: Sort order in second argument requires a boolean literal.; line 1 pos 7;
```
## How was this patch tested?
Unit test in `DataFrameFunctionsSuite`.
Author: hyukjinkwon <[email protected]>
Closes apache#15532 from HyukjinKwon/SPARK-17989.
What changes were proposed in this pull request?
This PR proposes to check the second argument,
ascendingOrderrather than throwingClassCastExceptionexception message.Before
After
How was this patch tested?
Unit test in
DataFrameFunctionsSuite.