-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-23915][SQL] Add array_except function #21103
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #89524 has finished for PR 21103 at commit
|
|
Test build #89540 has finished for PR 21103 at commit
|
|
retest this please |
|
Test build #89561 has finished for PR 21103 at commit
|
|
Test build #89587 has finished for PR 21103 at commit
|
|
cc @ueshin |
|
Test build #89589 has finished for PR 21103 at commit
|
|
Test build #89605 has finished for PR 21103 at commit
|
|
Test build #89629 has finished for PR 21103 at commit
|
|
retest this please |
|
Test build #89651 has finished for PR 21103 at commit
|
|
Test build #92322 has finished for PR 21103 at commit
|
|
Test build #92997 has finished for PR 21103 at commit
|
|
retest this please |
|
Test build #93001 has finished for PR 21103 at commit
|
|
Test build #93270 has finished for PR 21103 at commit
|
|
Test build #93750 has finished for PR 21103 at commit
|
|
retest this please |
| | { | ||
| | $javaTypeName $value = $array2.$getter; | ||
| | $hsJavaTypeName $hsValue = $genHsValue; | ||
| | $hs.add$postFix($hsValue); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why wrap them inside {...}?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah i see. Maybe a better way is
def withArray2NullCheck(body: String) = if (right.dataType.asInstanceOf[ArrayType].containsNull) {
s"""
|if ($array2.isNullAt($i)) {
| $notFoundNullElement = false;
|} else {
| $body
|}""".stripMargin
} else {
body
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part will be part of else clause of if statement depends on $array2NullCheck (i.e. if (right.dataType.asInstanceOf[ArrayType].containsNull) is true).
| } | ||
|
|
||
| nullSafeCodeGen(ctx, ev, (array1, array2) => { | ||
| if (openHashElementType != "") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a better way to organize it
if (elementTypeSupportEquals) {
...
nullSafeCodeGen(...)
} else {
nullSafeCodeGen(...)
}
| |$hs = new $openHashSet$postFix($classTag); | ||
| |$notFoundNullElement = true; | ||
| |int $pos = 0; | ||
| |for (int $i = 0; $i < $array2.numElements(); $i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why add array2 to the hash set again?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We add some elements in array1to the hash set at L4192.
Since we want to create a hashset that have elements only of array2, we reallocate a hashset at L4198 and add elements of array2 again here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then why we do the first iteration? only calculating the size?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, only calculating array size for allocating a UnsafeArrayData or GenericArrayData.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this worth? I feel using an ArrayBuffer like the interpreted path is faster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To use 'ArrayBuffer' involves boxing. Let me consider 'ArrayBuffer.ofInt' or others with special null handling.
|
Test build #93759 has finished for PR 21103 at commit
|
| val (postFix, openHashElementType, hsJavaTypeName, genHsValue, | ||
| getter, setter, javaTypeName, primitiveTypeName, arrayDataBuilder) = | ||
| elementType match { | ||
| case BooleanType | ByteType | ShortType | IntegerType => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
technically we can have special handling for boolean type(at most 2 outputs), but I think it's very rare to see boolean type in array_except. Let's leave it unless users require it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I agree that it is very rare. Let us remove the special handling code for BooleanType.
|
Test build #93822 has finished for PR 21103 at commit
|
|
Test build #93819 has finished for PR 21103 at commit
|
|
retest this please |
|
Test build #93825 has finished for PR 21103 at commit
|
|
retest this please |
|
Test build #93830 has finished for PR 21103 at commit
|
|
Test build #93831 has finished for PR 21103 at commit
|
|
Test build #93836 has finished for PR 21103 at commit
|
|
retest this please |
|
Test build #93851 has finished for PR 21103 at commit
|
|
retest this please |
|
Test build #93870 has finished for PR 21103 at commit
|
|
thanks, merging to master! |
## What changes were proposed in this pull request? This PR refactors `ArrayUnion` based on [this suggestion](#21103 (comment)). 1. Generate optimized code for all of the primitive types except `boolean` 1. Generate code using `ArrayBuilder` or `ArrayBuffer` 1. Leave only a generic path in the interpreted path ## How was this patch tested? Existing tests Author: Kazuaki Ishizaki <[email protected]> Closes #21937 from kiszk/SPARK-23914-follow.
What changes were proposed in this pull request?
The PR adds the SQL function
array_except. The behavior of the function is based on Presto's one.This function returns returns an array of the elements in array1 but not in array2.
Note: The order of elements in the result is not defined.
How was this patch tested?
Added UTs.