[SPARK-16215][SQL] Reduce runtime overhead of a program that writes a primitive array in DataFrame/Dataset #13911
Test build #61256 has finished for PR 13911 at commit
Test build #61262 has finished for PR 13911 at commit
Test build #61651 has finished for PR 13911 at commit
Test build #62312 has finished for PR 13911 at commit
Test build #68066 has finished for PR 13911 at commit
Test build #68073 has finished for PR 13911 at commit
Test build #68140 has finished for PR 13911 at commit
This was implemented by another approach in #15044.
What changes were proposed in this pull request?
This PR optimizes the generated projection code for a primitive-type array. A primitive-type array never needs a null check, and it stores its data in a contiguous region. Even so, the current generated code performs a null check and a separate copy for each element (lines 075-082 in "Generated code before applying this PR").
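The following minimal Java sketch illustrates the difference. It is self-contained and does not use Spark. IntArrayLike and PrimitiveIntArray are hypothetical stand-ins whose method names echo Spark's ArrayData, and a plain ByteBuffer stands in for the buffer that the generated code writes into. The element-wise path mirrors the generic generated loop. The bulk path shows what becomes possible once the input is known to be a primitive array.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class PrimitiveArrayCopySketch {

  // Hypothetical minimal array interface; names echo Spark's ArrayData,
  // but this is a self-contained stand-in, not Spark's class.
  interface IntArrayLike {
    int numElements();
    boolean isNullAt(int i);
    int getInt(int i);
  }

  // Backed by an int[]: no element can be null and the values are contiguous.
  static final class PrimitiveIntArray implements IntArrayLike {
    final int[] values;
    PrimitiveIntArray(int[] values) { this.values = values; }
    public int numElements() { return values.length; }
    public boolean isNullAt(int i) { return false; }
    public int getInt(int i) { return values[i]; }
  }

  // Element-wise path, like the generic generated loop:
  // one null check and one write per element.
  static byte[] elementWise(IntArrayLike src) {
    int n = src.numElements();
    ByteBuffer dst = ByteBuffer.allocate(n * Integer.BYTES);
    for (int i = 0; i < n; i++) {
      if (src.isNullAt(i)) {
        dst.putInt(0); // stand-in for "set the null bit"; never taken for an int[]
      } else {
        dst.putInt(src.getInt(i));
      }
    }
    return dst.array();
  }

  // Bulk path: for a primitive array, skip the null checks and
  // copy the whole contiguous region in one call.
  static byte[] bulk(PrimitiveIntArray src) {
    ByteBuffer dst = ByteBuffer.allocate(src.values.length * Integer.BYTES);
    dst.asIntBuffer().put(src.values);
    return dst.array();
  }

  public static void main(String[] args) {
    PrimitiveIntArray a = new PrimitiveIntArray(new int[] {1, 2, 3, 4});
    System.out.println(Arrays.equals(elementWise(a), bulk(a))); // prints "true"
  }
}
```

Both paths produce identical bytes. The bulk path simply drops work that can never change the result for a primitive array.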
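This PR optimizes the generated code as follows:
1. Eliminate the null check for each array element
2. Copy the array data in bulk with Platform.copy
3. Use the GenericArrayData implementation specialized for a primitive array, when [SPARK-16043][SQL] Prepare GenericArrayData implementation specialized for a primitive array #13758 is merged
4. Use the dense format of UnsafeArrayData, when [SPARK-15962][SQL] Introduce implementation with a dense format for UnsafeArrayData #13680 is merged

These steps are performed in a helper method, UnsafeArrayWriter.writePrimitive<PrimitiveType>Array() (line 075 in "Generated code after applying this PR").

For now, items 3 and 4 are not enabled, but the code for them is ready.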
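The following Java sketch shows what a helper in the spirit of writePrimitive<PrimitiveType>Array() might look like. The class PrimitiveArrayHelperSketch, its method names and its output layout are assumptions for illustration only. The layout is an 8-byte element count, then an all-zero null bitmap rounded up to 8-byte words, then the packed values. It is loosely modeled on a dense array format and is not Spark's exact UnsafeArrayData format. The sketch covers items 1 and 2: it performs no per-element null check and makes one bulk copy per array.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical stand-in for writePrimitive<PrimitiveType>Array() helpers.
// The layout is a simplification, not Spark's exact UnsafeArrayData format.
public class PrimitiveArrayHelperSketch {

  // One null bit per element, rounded up to whole 8-byte words.
  static int nullBitmapBytes(int numElements) {
    return ((numElements + 63) / 64) * 8;
  }

  // Allocates the output and writes the header; returns the buffer
  // positioned at the start of the value region.
  static ByteBuffer header(int numElements, int elementSize) {
    int size = 8 + nullBitmapBytes(numElements) + numElements * elementSize;
    ByteBuffer buf = ByteBuffer.allocate(size).order(ByteOrder.LITTLE_ENDIAN);
    buf.putLong(numElements);
    // A primitive array has no nulls, so the bitmap must be all zero.
    // ByteBuffer.allocate already zero-fills, so we only skip past it.
    buf.position(buf.position() + nullBitmapBytes(numElements));
    return buf;
  }

  static byte[] writePrimitiveIntArray(int[] values) {
    ByteBuffer buf = header(values.length, Integer.BYTES);
    // Single bulk copy of the contiguous value region.
    buf.asIntBuffer().put(values);
    return buf.array();
  }

  static byte[] writePrimitiveDoubleArray(double[] values) {
    ByteBuffer buf = header(values.length, Double.BYTES);
    buf.asDoubleBuffer().put(values);
    return buf.array();
  }

  public static void main(String[] args) {
    byte[] out = writePrimitiveIntArray(new int[] {10, 20, 30});
    System.out.println(out.length); // prints 28 (8 count + 8 bitmap + 12 values)
  }
}
```

A primitive array cannot contain nulls, so the null region only has to be cleared once. The number of write calls per array therefore stays constant instead of growing with its length.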
Benchmark program
An example program
Generated code before applying this PR
Generated code after applying this PR
How was this patch tested?
Added tests to DataFrameComplexTypeSuite.