-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-15985][SQL] Eliminate redundant cast from an array without null or a map without null #13704
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #60636 has finished for PR 13704 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where do we ensure it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume that here is a part to generate code for Java primitive arrays regarding from and to. Since Java primitive array (e.g. int[]) cannot have null value unlike SQL, I said "ensure not null in input and output arrays."
What do you think?
|
can you also put the plan tree of the example program in PR description? thanks! |
|
@cloud-fan I updated the PR description by adding plan trees. |
|
I'm wondering why we have this |
|
@cloud-fan I think |
|
Sorry I should say it more explicitly: |
|
I agree that What do you think? |
|
From the plan tree given by you, the |
|
Regarding the plan tree printout (I removed my debug information), the Regarding the generated code, we seems to be on the same page. What you said is not done in |
This is reasonable, as it needs to take care of null elements. And we do have a chance to optimize it: if the target array type's element type is primitive and the input array type's element nullability is false, we can avoid using |
|
Let me check which code portion inserts
I agree. This PR generates optimized code without using |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need to make sure the input array's element nullability is false, but primitive type array doesn't guarantee it. e.g. we can have ArrayType(ByteType, true)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you are right. The latest code also checks ArrayType.containsNull of from and to.
|
@cloud-fan, I checked the following
IIUC, this code inserts the corresponding |
|
Test build #61629 has finished for PR 13704 at commit
|
|
Test build #61630 has finished for PR 13704 at commit
|
|
Test build #61677 has finished for PR 13704 at commit
|
|
@cloud-fan regarding a |
|
Test build #62050 has finished for PR 13704 at commit
|
|
Test build #62057 has finished for PR 13704 at commit
|
|
@cloud-fan could you please review this? As you pointed, I also changed code related to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about we move this into another PR? I think the main purpose of this PR is to eliminate the unnecessary Cast
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I will create another PR and update the description.
|
Test build #62071 has finished for PR 13704 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about
case c @ Cast(e, dataType) => (e.dataType, dataType) match {
case (ArrayType(from, false), ArrayType(to, true)) if from == to => e
case (MapType(fromKey, fromValue, false), ArrayType(toKey, toValue, true)) if fromKey == toKey && fromValue == toValue => e
case _ => c
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, I like this simple one
|
Test build #62084 has finished for PR 13704 at commit
|
|
Test build #62087 has finished for PR 13704 at commit
|
|
Test build #62097 has finished for PR 13704 at commit
|
|
Test build #64594 has finished for PR 13704 at commit
|
|
@liancheng Could you please review this since I resolved conflict? |
| comparePlans(optimized, expected) | ||
| } | ||
|
|
||
| test("non-nullable to nullable array cast") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
non-nullable element array to nullable element array cast
|
left some comment, let's go ahead and merge it after that :) |
|
Test build #64658 has finished for PR 13704 at commit
|
|
thanks, merging to master! |
What changes were proposed in this pull request?
This PR eliminates redundant cast from an
ArrayTypewithcontainsNull = falseor aMapTypewithcontainsNull = false.For example, in
ArrayTypecase, current implementation leaves a castcast(value#63 as array<double>).toDoubleArray. However, we can eliminatecast(value#63 as array<double>)if we knowvalue#63does not includenull. This PR apply this elimination forArrayTypeandMapTypeinSimplifyCastsat a plan optimization phase.In summary, we got 1.2-1.3x performance improvements over the code before applying this PR.
Here are performance results of benchmark programs:
An example program that originally caused this performance issue.
Plans before this PR
Plans after this PR
How was this patch tested?
Tested by new test cases in
SimplifyCastsSuite