[SPARK-17915][SQL] Prepare a new ColumnVector implementation for UnsafeData #15468
Test build #66901 has finished for PR 15468 at commit
Test build #66902 has finished for PR 15468 at commit
Test build #66904 has finished for PR 15468 at commit
870c644 to f6dc8ac
Test build #66920 has finished for PR 15468 at commit
@davies, could you please review this first if #15219 is too big?
@andrewor14 would it be possible to review this?
@sameeragarwal, would it be possible to review this, too?
ping @sameeragarwal
@sameeragarwal, could you please take a look at this?
f6dc8ac to de17650
Test build #72864 has finished for PR 15468 at commit
Test build #72886 has finished for PR 15468 at commit
Jenkins, retest this please
Test build #72939 has finished for PR 15468 at commit
@sameeragarwal would it be possible to review this since I resolved a conflict?
ping @sameeragarwal
What changes were proposed in this pull request?
This PR prepares a new implementation, `OnHeapUnsafeColumnVector`, that is optimized for reading data from an `Unsafe`-related data structure (e.g. `UnsafeArrayData` or `UnsafeMapData`).

The current implementations of `ColumnVector` are `OnHeapColumnVector` and `OffHeapColumnVector`, which are optimized for reading data from Parquet. When they get an array, a map, or a struct stored in an `Unsafe`-related data structure, operations in `OnHeapColumnVector` and `OffHeapColumnVector` lead to additional copy operations or data conversion. `OnHeapUnsafeColumnVector` requires only a simple memory copy and keeps the data in a `byte` array. `OnHeapUnsafeColumnVector` can compress and decompress stored data by using a `CompressionCodec` selected with the `spark.sql.inMemoryColumnarStorage.compression.codec` property (the default is lz4).

This PR is a part of #15219. It is a component independent of the others. For ease of review, this PR only introduces `OnHeapUnsafeColumnVector`.

How was this patch tested?
The existing tests for `OnHeapUnsafeColumnVector` were run.
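
To make the idea in the description more concrete, here is a minimal sketch of a vector that stores each value as raw bytes with a single memory copy and can compress or decompress its backing `byte` array. This is not the code in this PR and not a Spark API: the class `ByteBackedVectorSketch` and its methods `putRaw`, `getRaw`, `compress`, and `decompress` are hypothetical names invented for illustration, and `java.util.zip.Deflater`/`Inflater` stand in for Spark's `CompressionCodec` (the description names lz4 as the default, which is not in the JDK).

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical illustration only: values are kept as raw bytes, back to back.
class ByteBackedVectorSketch {
    private byte[] data = new byte[64];  // uncompressed payload of all rows
    private int[] offsets = new int[8];  // start offset of each row in data
    private int[] lengths = new int[8];  // byte length of each row
    private int used = 0;                // number of payload bytes in use
    private int count = 0;               // number of rows stored
    private byte[] compressed = null;    // non-null while the vector is compressed

    // Stores one value with a single memory copy, with no per-element conversion.
    int putRaw(byte[] src, int srcOffset, int length) {
        if (compressed != null) throw new IllegalStateException("vector is compressed");
        if (used + length > data.length) {
            data = Arrays.copyOf(data, Math.max(data.length * 2, used + length));
        }
        if (count == offsets.length) {
            offsets = Arrays.copyOf(offsets, count * 2);
            lengths = Arrays.copyOf(lengths, count * 2);
        }
        System.arraycopy(src, srcOffset, data, used, length);
        offsets[count] = used;
        lengths[count] = length;
        used += length;
        return count++;
    }

    // Returns a copy of the raw bytes stored for the given row.
    byte[] getRaw(int rowId) {
        if (compressed != null) throw new IllegalStateException("vector is compressed");
        if (rowId < 0 || rowId >= count) throw new IndexOutOfBoundsException("row " + rowId);
        return Arrays.copyOfRange(data, offsets[rowId], offsets[rowId] + lengths[rowId]);
    }

    // Replaces the payload with its compressed form (Deflater as a stand-in codec).
    void compress() {
        if (compressed != null) return;
        Deflater deflater = new Deflater();
        deflater.setInput(data, 0, used);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        compressed = out.toByteArray();
        data = null;
    }

    // Restores the uncompressed payload so rows can be read again.
    void decompress() {
        if (compressed == null) return;
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        byte[] restored = new byte[used];
        int pos = 0;
        try {
            while (pos < used && !inflater.finished()) {
                int n = inflater.inflate(restored, pos, used - pos);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) break;
                pos += n;
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt compressed data", e);
        } finally {
            inflater.end();
        }
        if (pos != used) throw new IllegalStateException("unexpected decompressed size");
        data = restored;
        compressed = null;
    }
}
```

In this sketch a caller would invoke `putRaw` once per value, call `compress()` when the batch is cached, and call `decompress()` before reading rows back with `getRaw`. Whether the actual patch compresses per column, per batch, or lazily is not stated in the description, so that part is purely an assumption of the sketch.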