@kiszk
Member

@kiszk kiszk commented Oct 13, 2016

What changes were proposed in this pull request?

This PR introduces a new implementation, OnHeapUnsafeColumnVector, that is optimized for reading data from an Unsafe-related data structure (e.g. UnsafeArrayData or UnsafeMapData).

The current implementations of ColumnVector are OnHeapColumnVector and OffHeapColumnVector, which are optimized for reading data from Parquet. When they receive an array, a map, or a struct stored in an Unsafe-related data structure, their operations incur additional copies or data conversion. OnHeapUnsafeColumnVector requires only a simple memory copy and keeps the data in a byte array.
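To illustrate the byte-array approach described above, here is a minimal sketch. It is not the class added by this PR: the name `ByteArrayVectorSketch`, its methods, and the blob layout (a 4-byte element count followed by 4-byte little-endian ints) are assumptions made up for this example. The real Unsafe layouts differ, and Spark reads them through its own memory utilities, not `ByteBuffer`.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration only; not the implementation from this PR.
final class ByteArrayVectorSketch {
    // Assumed layout of each blob: 4-byte count, then 4-byte little-endian ints.
    private static final int HEADER_BYTES = 4;

    // One serialized blob per row, kept exactly as it was copied in.
    private final byte[][] rows;

    ByteArrayVectorSketch(int capacity) {
        rows = new byte[capacity][];
    }

    // Store an array value with a single bulk copy, with no per-element conversion.
    void putArray(int rowId, byte[] src, int offset, int length) {
        byte[] copy = new byte[length];
        System.arraycopy(src, offset, copy, 0, length);
        rows[rowId] = copy;
    }

    // Number of elements in the array stored at rowId, read from the header.
    int numElements(int rowId) {
        return view(rowId).getInt(0);
    }

    // Read element i of the array stored at rowId directly from the bytes.
    int getInt(int rowId, int i) {
        return view(rowId).getInt(HEADER_BYTES + i * Integer.BYTES);
    }

    private ByteBuffer view(int rowId) {
        return ByteBuffer.wrap(rows[rowId]).order(ByteOrder.LITTLE_ENDIAN);
    }

    // Helper that builds a blob in the assumed layout.
    static byte[] serialize(int... values) {
        ByteBuffer buf = ByteBuffer
            .allocate(HEADER_BYTES + values.length * Integer.BYTES)
            .order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(values.length);
        for (int v : values) {
            buf.putInt(v);
        }
        return buf.array();
    }
}
```

The point of the sketch is that a write is one `System.arraycopy` and a read is an offset computation into the stored bytes, so no element-by-element conversion happens on either path.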

OnHeapUnsafeColumnVector can compress and decompress its stored data using a CompressionCodec selected by the spark.sql.inMemoryColumnarStorage.compression.codec property (default: lz4).
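As a rough picture of compressing and restoring a byte-array-backed buffer, the sketch below round-trips a byte array through a compression stream. It deliberately swaps in the JDK's Deflate streams so that it runs without Spark on the classpath; it does not use lz4 or Spark's CompressionCodec, and `CompressedBufferSketch` is a hypothetical name introduced only for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical illustration only; Deflate stands in for the configured codec.
final class CompressedBufferSketch {
    // Compress the whole buffer into a new byte array.
    static byte[] compress(byte[] data) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DeflaterOutputStream out = new DeflaterOutputStream(bos)) {
            out.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    // Restore the original bytes from a buffer produced by compress().
    static byte[] decompress(byte[] compressed) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (InflaterInputStream in =
                 new InflaterInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                bos.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }
}
```

Because the vector keeps its data in a plain byte array, the whole buffer can be handed to a codec in one call and restored the same way before reads.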

This PR is a part of #15219. It is a component independent of the others. For ease of review, this PR only introduces OnHeapUnsafeColumnVector.

How was this patch tested?

Ran the existing tests for OnHeapUnsafeColumnVector.

@SparkQA

SparkQA commented Oct 13, 2016

Test build #66901 has finished for PR 15468 at commit 8b38531.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 13, 2016

Test build #66902 has finished for PR 15468 at commit 9a57147.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 13, 2016

Test build #66904 has finished for PR 15468 at commit 870c644.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk kiszk force-pushed the columnarcolumnvector branch from 870c644 to f6dc8ac on October 13, 2016 23:54
@SparkQA

SparkQA commented Oct 14, 2016

Test build #66920 has finished for PR 15468 at commit f6dc8ac.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public final class OnHeapUnsafeColumnVector extends ColumnVector implements Serializable

@kiszk
Member Author

kiszk commented Oct 17, 2016

@davies, could you please review this one first if #15219 is too big?
cc @tedyu, @sameeragarwal

@kiszk
Member Author

kiszk commented Dec 6, 2016

@andrewor14 would it be possible to review this?

@kiszk
Member Author

kiszk commented Jan 18, 2017

@sameeragarwal, would it be possible to review this, too?

@kiszk
Member Author

kiszk commented Jan 23, 2017

ping @sameeragarwal

@kiszk
Member Author

kiszk commented Jan 30, 2017

@sameeragarwal, could you please take a look at this?

@kiszk kiszk force-pushed the columnarcolumnvector branch from f6dc8ac to de17650 on February 14, 2017 09:07
@SparkQA

SparkQA commented Feb 14, 2017

Test build #72864 has finished for PR 15468 at commit de17650.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public final class OnHeapUnsafeColumnVector extends ColumnVector implements Serializable

@SparkQA

SparkQA commented Feb 14, 2017

Test build #72886 has finished for PR 15468 at commit 80a5022.

  • This patch fails from timeout after a configured wait of `250m`.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk
Member Author

kiszk commented Feb 15, 2017

Jenkins, retest this please

@SparkQA

SparkQA commented Feb 15, 2017

Test build #72939 has finished for PR 15468 at commit 80a5022.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk
Member Author

kiszk commented Feb 16, 2017

@sameeragarwal, would it be possible to review this now that I have resolved a conflict?

@kiszk
Member Author

kiszk commented Mar 6, 2017

ping @sameeragarwal

@kiszk
Member Author

kiszk commented May 21, 2017

I am closing this since #18014 and #18033 enable the same feature.

@kiszk kiszk closed this May 21, 2017