[SPARK-24313][SQL] Fix collection operations' interpreted evaluation for complex types #21361
| override def dataType: DataType = BooleanType | ||
|
|
||
| @transient private lazy val ordering: Ordering[Any] = | ||
| TypeUtils.getInterpretedOrdering(right.dataType) |
Then in checkInputDataTypes we should check if there is Ordering for right.dataType. Otherwise for example MapType will throw a match error.
yes, thanks
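The point of the suggestion above is to reject unorderable types during type checking instead of failing with a match error at evaluation time. A minimal sketch of that idea follows; the `Toy*` types, `isOrderable`, and `checkInput` are hypothetical stand-ins written for illustration, not Spark's `DataType`, `RowOrdering`, or `TypeUtils` APIs.

```scala
object OrderableCheckDemo {
  // Toy stand-ins for a data type hierarchy (hypothetical, illustration only).
  sealed trait ToyType
  case object ToyInt extends ToyType
  case object ToyBinary extends ToyType
  final case class ToyArray(element: ToyType) extends ToyType
  final case class ToyMap(key: ToyType, value: ToyType) extends ToyType

  // Answers "does an ordering exist for this type?" without building one.
  def isOrderable(t: ToyType): Boolean = t match {
    case ToyInt | ToyBinary => true
    case ToyArray(e)        => isOrderable(e)
    case ToyMap(_, _)       => false // maps cannot be compared
  }

  // Up-front type check: returns an error message rather than throwing later.
  def checkInput(t: ToyType): Either[String, Unit] =
    if (isOrderable(t)) Right(())
    else Left(s"$t cannot be used in comparison.")
}
```

With this shape, a map-typed argument is reported as a type-check failure, while arrays of orderable elements pass.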
| if (v == null) { | ||
| hasNull = true | ||
| } else if (v == value) { | ||
| } else if (ordering.equiv(v, value)) { |
Did this work for Map previously? I assume not.
MapType is not supported in comparison, even =
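To illustrate why the diff above replaces `v == value` with `ordering.equiv(v, value)`: on the JVM, `==` on arrays compares references, so two binary values with identical contents never match. The sketch below is plain Scala with no Spark dependency; `binaryOrdering` is a hypothetical hand-written ordering, not the one Spark returns from `TypeUtils.getInterpretedOrdering`.

```scala
object BinaryEqualityDemo {
  // Lexicographic ordering over byte arrays, using signed byte comparison.
  // Hypothetical helper for illustration; not Spark's implementation.
  val binaryOrdering: Ordering[Array[Byte]] = new Ordering[Array[Byte]] {
    def compare(x: Array[Byte], y: Array[Byte]): Int =
      x.iterator.zip(y.iterator)
        .map { case (l, r) => java.lang.Byte.compare(l, r) }
        .find(_ != 0)
        .getOrElse(Integer.compare(x.length, y.length))
  }

  val a: Array[Byte] = Array[Byte](1, 2)
  val b: Array[Byte] = Array[Byte](1, 2)

  // Reference equality: false, because a and b are distinct array objects.
  val equalByReference: Boolean = a == b

  // Content-based equivalence through the ordering: true.
  val equalByOrdering: Boolean = binaryOrdering.equiv(a, b)
}
```

An ordering-based comparison requires that an ordering exists for the element type, which is why the type check discussed earlier in this conversation matters.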
|
|
|
Great catch, thanks. |
|
yes @ueshin , I can fix |
|
@mgaido91 Thank you very much. Can I ask you to fix |
|
sure, thanks @kiszk. Sorry, I saw your comment only now, probably we were writing at the same time :) |
|
We should also fix |
|
yes @ueshin , great catch! I am fixing it too. |
|
Test build #90785 has finished for PR 21361 at commit
|
| "Arguments must be an array followed by a value of same type as the array members") | ||
| } else { | ||
| TypeCheckResult.TypeCheckSuccess | ||
| if (RowOrdering.isOrderable(right.dataType)) { |
we can call TypeUtils.checkForOrderingExpr
|
good catch! this is a long-standing bug... |
|
Test build #90788 has finished for PR 21361 at commit
|
|
Test build #90790 has finished for PR 21361 at commit
|
|
Test build #90794 has finished for PR 21361 at commit
|
| test("SPARK-24313: support complex types as map keys") { | ||
| val mb0 = Literal.create( | ||
| Map(Array[Byte](1, 2) -> "1", Array[Byte](3, 4) -> null, Array[Byte](2, 1) -> "2"), | ||
| MapType(BinaryType, StringType)) |
shall we test ArrayType to reflect the test name?
I changed the test name here and the comment for ElementAt because I found an issue in using arrays as keys for map. When I do that, I get:
Caused by: java.lang.ClassCastException: scala.collection.immutable.$colon$colon cannot be cast to org.apache.spark.sql.catalyst.util.ArrayData
I will investigate this further, but I think this is a separate issue and should be addressed in another PR, what do you think?
| // test complex types as keys | ||
| val mb0 = Literal.create( | ||
| Map(Array[Byte](1, 2) -> "1", Array[Byte](3, 4) -> null, Array[Byte](2, 1) -> "2"), | ||
| MapType(BinaryType, StringType)) |
ditto
| ArrayType(BinaryType)) | ||
| val b3 = Literal.create(Seq[Array[Byte]](null, Array[Byte](1, 2)), | ||
| ArrayType(BinaryType)) | ||
| val be = Literal.create(Array[Byte](1, 2), BinaryType) |
ditto, binary type is not complex type
|
Test build #90857 has finished for PR 21361 at commit
|
|
retest this please |
|
Test build #90863 has finished for PR 21361 at commit
|
|
LGTM, can we add an end-to-end test case for |
|
@cloud-fan sorry, but I am not sure I got it. Could you please give me some more details about the end-to-end test case for |
|
e.g. we should use |
|
thank you for your kind explanation @cloud-fan. I added the UT you suggested. Thanks. |
| assert(complexData.filter(complexData("m").getItem("1") === 1).count() == 1) | ||
| assert(complexData.filter(complexData("s").getField("key") === 1).count() == 1) | ||
|
|
||
| // SPARK-24313: access binary keys |
can we create a new test case? I think a correctness bug deserves an individual end-to-end test case.
|
Test build #90900 has finished for PR 21361 at commit
|
|
Test build #90903 has finished for PR 21361 at commit
|
|
thanks, merging to master! @mgaido91 can you send a new PR for 2.3? thanks! |
|
sure @cloud-fan, thanks. I will create it in the next few days. Thank you. |
…ed evaluation for complex types

The interpreted evaluation of several collection operations works only for simple data types. For complex data types, `array_contains`, for instance, always returns `false`. The affected functions are `array_contains`, `array_position`, `element_at` and `GetMapValue`. The PR fixes the behavior for all data types.

added UT

Author: Marco Gaido

Closes apache#21361 from mgaido91/SPARK-24313.
| } else { | ||
| TypeCheckResult.TypeCheckFailure(s"${elementType.simpleString} cannot be used in comparison.") | ||
| } | ||
| TypeUtils.checkForOrderingExpr(elementType, s"function $prettyName") |
Also, a general suggestion: refactoring like this should go in a separate PR.
ok, I'll keep this in mind for the future, thanks.
|
Thanks for fixing this! |
What changes were proposed in this pull request?
The interpreted evaluation of several collection operations works only for simple data types. For complex data types, `array_contains`, for instance, always returns `false`. The affected functions are `array_contains`, `array_position`, `element_at` and `GetMapValue`.
The PR fixes the behavior for all data types.
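As a rough sketch of the evaluation loop touched by this PR (the `hasNull` branch and the `ordering.equiv` comparison shown in the diff), the following plain-Scala function models a null-aware contains check. `arrayContains` and its `Option[Boolean]` result are assumptions made for illustration, with `None` standing in for SQL NULL; this is not Spark's actual `ArrayContains` code.

```scala
object ArrayContainsSketch {
  // Some(true)  : an element equivalent to `value` was found.
  // None        : no match, but a null element was seen (modelled as SQL NULL).
  // Some(false) : no match and no null elements.
  def arrayContains[T](elements: Seq[T], value: T)(implicit ord: Ordering[T]): Option[Boolean] =
    if (elements.exists(v => v != null && ord.equiv(v, value))) Some(true)
    else if (elements.exists(_ == null)) None
    else Some(false)
}

// Usage:
// ArrayContainsSketch.arrayContains(Seq[String]("a", "b"), "b")   // Some(true)
// ArrayContainsSketch.arrayContains(Seq[String]("a", null), "b")  // None
// ArrayContainsSketch.arrayContains(Seq[String]("a", "c"), "b")   // Some(false)
```

Because the comparison goes through an `Ordering`, the same function works for any element type that has one, which is the property the fix relies on.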
How was this patch tested?
added UT