[SPARK-10993] [SQL] Initial code generated encoder for product types #9019
Haha, probably too late to change this now :)
Test build #43376 has finished for PR 9019 at commit
Test build #43378 has finished for PR 9019 at commit
So, we use a LambdaVariable to let genFunction have a way to access every element of the input data?
Exactly, this is how we link whatever code the inner expression generates into the loop.
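To illustrate the idea in the exchange above, here is a toy sketch in plain Scala. It is not Spark's code: `Expr`, `LoopVar`, `Call`, and `LoopCodegen.mapLoop` are hypothetical names made up for this example, and the emitted Java-like text is only a stand-in for real generated code. The point is that the inner expression refers to a named variable, and the loop generator binds that same name to the current element before splicing in the inner expression's code.

```scala
// Toy model only; none of these types are Spark's actual classes.
sealed trait Expr { def gen: String }

// Placeholder that refers to the loop's current element by variable name.
final case class LoopVar(name: String) extends Expr {
  def gen: String = name
}

// Calls a zero-argument method on whatever the target expression produces.
final case class Call(target: Expr, method: String) extends Expr {
  def gen: String = s"${target.gen}.$method()"
}

object LoopCodegen {
  // Emits a loop over `input`, binds each element to `elem.name`,
  // and splices in the code generated by `body`.
  def mapLoop(input: String, elem: LoopVar, body: Expr): String =
    s"""for (int i = 0; i < $input.length; i++) {
       |  Object ${elem.name} = $input[i];
       |  result[i] = ${body.gen};
       |}""".stripMargin
}
```

Calling `LoopCodegen.mapLoop("values", LoopVar("elem"), Call(LoopVar("elem"), "toString"))` yields loop text in which the body reads `elem`, the same name the loop header assigns on each iteration.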
Overall looks good! Left a few clarification questions.
Since this is the infrastructural work for the encoder/decoder, let's merge this and use a follow-up PR to address the comments, so other people can start to play with it.
maybe extend LeafExpression?
ah you remind me that we have Unevaluable, which is used for expressions that do not support code-gen.
It's not a LeafExpression, as there are children (fixed in the follow-up). It's not Unevaluable either; instead, it only supports codegen.
This PR is a first cut at code generating an encoder that takes a Scala Product type and converts it directly into the tungsten binary format. This is done through the addition of a new set of expressions that can be used to invoke methods on raw JVM objects, extracting fields and converting the result into the required format. These can then be used directly in an UnsafeProjection, allowing us to leverage the existing encoding logic.

According to some simple benchmarks, this can significantly speed up conversion (~4x). However, replacing CatalystConverters is deferred to a later PR to keep this PR at a reasonable size.
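As a rough illustration of "invoking methods on raw JVM objects to extract fields", the sketch below reads a case class's fields through its accessor methods. It uses Java reflection purely as a stand-in: the PR generates code instead of reflecting at runtime, and `Person`, `ProductFields`, `invoke`, and `extract` are hypothetical names that do not appear in the PR.

```scala
// Illustrative only: reflection substitutes for the generated code.
final case class Person(name: String, age: Int)

object ProductFields {
  // Invokes a zero-argument accessor method on a raw JVM object.
  def invoke(target: AnyRef, method: String): AnyRef =
    target.getClass.getMethod(method).invoke(target)

  // Reads each named field through its accessor, preserving order.
  def extract(p: Product, fields: Seq[String]): Seq[Any] =
    fields.map(f => invoke(p.asInstanceOf[AnyRef], f))
}
```

For example, `ProductFields.extract(Person("Ann", 30), Seq("name", "age"))` collects the two field values in order. An encoder would then convert each value into the target binary representation, a step this sketch leaves out.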
Results: