[SPARK-14270][SQL] whole stage codegen support for typed filter #12061
Test build #54510 has finished for PR 12061 at commit
Since this involves an extra deserialization step, will it actually give a performance benefit?
Previously we needed to deserialize the input row, call the function, and serialize the result back to a row. Now we skip the final serialization, so it should be faster even without whole stage codegen.
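A minimal sketch of the difference described above, written as a toy model in plain Scala. The names `Row`, `Person`, `deserialize`, `serialize`, and both filter functions are hypothetical stand-ins for illustration; they are not Spark's internal row or encoder APIs.

```scala
// Toy model only: every name here is hypothetical, not part of Spark.
object TypedFilterSketch {
  type Row = Vector[Any]
  final case class Person(name: String, age: Int)

  // Counts how often an object is turned back into a row.
  var serializeCalls = 0

  def deserialize(row: Row): Person =
    Person(row(0).asInstanceOf[String], row(1).asInstanceOf[Int])

  def serialize(p: Person): Row = {
    serializeCalls += 1
    Vector(p.name, p.age)
  }

  // Old shape: deserialize, apply the function, serialize every surviving object.
  def filterViaMapPartitions(rows: Iterator[Row], f: Person => Boolean): Iterator[Row] =
    rows.map(deserialize).filter(f).map(serialize)

  // New shape: deserialize only to evaluate the predicate, then pass the input row through.
  def filterViaPredicate(rows: Iterator[Row], f: Person => Boolean): Iterator[Row] =
    rows.filter(row => f(deserialize(row)))
}
```

In this model both functions return the same rows, but only the first one pays for a serialization per surviving element.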
be182a9 to
c5b1fc3
Compare
Test build #54520 has finished for PR 12061 at commit
This is risky: as soon as we introduce https://issues.apache.org/jira/browse/SPARK-14083, this test will be useless.
Maybe we should introduce a config option now and then explicitly turn off that future optimization.
cc @JoshRosen - any good names for the config option?
Alternatively, we could have a special function wrapper that makes the code impossible to convert into an expression.
aa95fd6 to
bf9f5b5
Compare
Test build #54593 has finished for PR 12061 at commit
Test build #54600 has finished for PR 12061 at commit
Detected style violation.
Suggested improvement:
override lazy val resolved: Boolean = {
  // If the class to construct is an inner class, we need to get its outer pointer, or this
  // expression should be regarded as unresolved.
  // Note that static inner classes (e.g., inner classes within Scala objects) don't need
  // outer pointer registration.
  val innerStaticClass =
    outerPointer.isEmpty && cls.isMemberClass && !Modifier.isStatic(cls.getModifiers)
  childrenResolved && !innerStaticClass
}
bf9f5b5 to
892bdd3
Compare
Test build #54957 has finished for PR 12061 at commit
…bjectOperator
## What changes were proposed in this pull request?
This PR decouples deserializer expression resolution from `ObjectOperator`, so that we can use deserializer expressions in normal operators. This is needed by #12061 and #12067; I abstracted the logic out and put it in this PR to reduce code changes in the future.
## How was this patch tested?
Existing tests.
Author: Wenchen Fan
Closes #12131 from cloud-fan/separate.
892bdd3 to
98744f0
Compare
Test build #55087 has finished for PR 12061 at commit
generated code for a single filter is:
for back-to-back filters is:
The benchmark result for master branch is:
The whole stage version is about 30% faster.
92af545 to
0fcaa06
Compare
Test build #55109 has finished for PR 12061 at commit
cc @davies
Test build #55110 has finished for PR 12061 at commit
LGTM.
Test build #55172 has finished for PR 12061 at commit
cc @marmbrus, do you have time to take a look? Thanks!
Test build #55177 has finished for PR 12061 at commit
retest this please
Test build #55187 has finished for PR 12061 at commit
Thanks! Merging to master.
What changes were proposed in this pull request?
We implement typed filter with `MapPartitions`, which doesn't work well with whole stage codegen. This PR uses `Filter` to implement typed filter, so we get whole stage codegen support for free.
This PR also introduces `DeserializeToObject` and `SerializeFromObject` to separate serialization logic from object operators, so that it's easier to write optimization rules for adjacent object operators.
How was this patch tested?
Existing tests.
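To illustrate why separating serialization from object operators can make rules for adjacent operators easier to write, here is a toy plan tree in plain Scala. Only the names `DeserializeToObject` and `SerializeFromObject` come from the description above. Their shapes here, plus `Scan`, `MapObjects`, and `eliminateRoundTrips`, are hypothetical, and the sketch assumes both sides of a round trip use the same object type. It is not Spark's actual optimizer code.

```scala
// Toy model only: node shapes and the rule are assumptions for illustration.
object ObjectOperatorSketch {
  sealed trait Plan
  final case class Scan(name: String) extends Plan
  final case class DeserializeToObject(child: Plan) extends Plan
  final case class MapObjects(fn: String, child: Plan) extends Plan
  final case class SerializeFromObject(child: Plan) extends Plan

  // Remove a serialize step that is immediately undone by a deserialize step.
  def eliminateRoundTrips(plan: Plan): Plan = plan match {
    case DeserializeToObject(SerializeFromObject(child)) => eliminateRoundTrips(child)
    case DeserializeToObject(child) => DeserializeToObject(eliminateRoundTrips(child))
    case MapObjects(fn, child) => MapObjects(fn, eliminateRoundTrips(child))
    case SerializeFromObject(child) => SerializeFromObject(eliminateRoundTrips(child))
    case scan: Scan => scan
  }
}
```

Because serialization is its own node in this model, the rule is a single pattern match and needs no knowledge of what the object operators in between do.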