[SPARK-3131][SQL] Allow user to set parquet compression codec for writing ParquetFile in SQLContext #2039

chutium · 2014-08-19T19:47:59Z

There are 4 different compression codecs available for ParquetOutputFormat.

In Spark SQL, the codec was set as a hard-coded value in ParquetRelation.defaultCompression.

Original discussion:
#195 (diff)

I added a new config property in SQLConf that lets the user change this compression codec, using a short-name syntax similar to the one described in SPARK-2953 #1873 (https://github.com/apache/spark/pull/1873/files#diff-0).

BTW, which codec should we use as the default? It was set to GZIP (https://github.com/apache/spark/pull/195/files#diff-4), but I think we should change it to SNAPPY, since SNAPPY is already the default codec for shuffling in spark-core (SPARK-2469, #1415), and parquet-mr supports the Snappy codec natively (https://github.com/Parquet/parquet-mr/commit/e440108de57199c12d66801ca93804086e7f7632).
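To illustrate the short-name idea described above, here is a minimal sketch in plain Python rather than the Scala used by the patch. The `Codec` enum, the `SHORT_NAMES` table, and `resolve_codec` are hypothetical names invented for this example, and the four codec names are an assumption: the description only says that four codecs are available.

```python
from enum import Enum


class Codec(Enum):
    """Hypothetical stand-in for the codec enumeration used by ParquetOutputFormat."""
    UNCOMPRESSED = "UNCOMPRESSED"
    SNAPPY = "SNAPPY"
    GZIP = "GZIP"
    LZO = "LZO"


# Short names a user might type in a config value.
# These spellings are assumed for illustration, not copied from the patch.
SHORT_NAMES = {
    "uncompressed": Codec.UNCOMPRESSED,
    "snappy": Codec.SNAPPY,
    "gzip": Codec.GZIP,
    "lzo": Codec.LZO,
}


def resolve_codec(short_name: str) -> Codec:
    """Map a user-supplied short name to a codec, ignoring case and padding."""
    key = short_name.strip().lower()
    if key not in SHORT_NAMES:
        valid = ", ".join(sorted(SHORT_NAMES))
        raise ValueError(
            f"Unknown Parquet codec {short_name!r}; expected one of: {valid}"
        )
    return SHORT_NAMES[key]
```

Failing loudly on an unknown name, instead of silently falling back to a default, is a design choice of this sketch; the patch itself may behave differently.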

SparkQA · 2014-08-19T19:50:24Z

QA tests have started for PR 2039 at commit 21235dc.

This patch merges cleanly.

SparkQA · 2014-08-19T21:00:52Z

QA tests have finished for PR 2039 at commit 21235dc.

This patch fails unit tests.
This patch merges cleanly.
This patch adds no public classes.

chutium · 2014-08-19T22:18:06Z

It seems the failed test cases are not related to this PR; all of them are in the sql/hive-thriftserver module.

marmbrus · 2014-08-26T00:54:45Z

sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala

Perhaps spark.sql.parquet.compression.codec to be closer to the related setting in Spark core?

marmbrus · 2014-08-26T01:44:04Z

Jenkins, test this please.

SparkQA · 2014-08-26T01:50:54Z

QA tests have started for PR 2039 at commit 21235dc.

This patch merges cleanly.

SparkQA · 2014-08-26T03:50:55Z

Tests timed out after a configured wait of 120m.

chutium · 2014-08-26T11:51:24Z

Thanks @marmbrus, I changed the property name and the default codec.
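A minimal sketch of the resulting behaviour, assuming the property took the name suggested in the review (`spark.sql.parquet.compression.codec`) and that the new default is snappy, as the follow-up commit titles indicate. `ToyConf` and its methods are hypothetical stand-ins written in plain Python; they are not Spark's SQLConf API.

```python
# Property name as suggested in the review; default as named in the commit titles.
PARQUET_COMPRESSION = "spark.sql.parquet.compression.codec"
DEFAULT_CODEC = "snappy"


class ToyConf:
    """Hypothetical, simplified settings holder (not the real SQLConf)."""

    def __init__(self):
        self._settings = {}

    def set_conf(self, key, value):
        # Record a user override for the given property.
        self._settings[key] = value

    def parquet_compression_codec(self):
        # Fall back to the default when the user has not set the property.
        return self._settings.get(PARQUET_COMPRESSION, DEFAULT_CODEC)
```

With this sketch, a fresh `ToyConf()` reports `snappy`, and calling `set_conf(PARQUET_COMPRESSION, "gzip")` switches later writes to `gzip`.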

SparkQA · 2014-08-26T11:56:06Z

QA tests have started for PR 2039 at commit 2f44964.

This patch merges cleanly.

SparkQA · 2014-08-26T13:15:15Z

QA tests have finished for PR 2039 at commit 2f44964.

This patch passes unit tests.
This patch merges cleanly.
This patch adds no public classes.

marmbrus · 2014-08-26T18:52:17Z

Thanks! I've merged this into master and 1.1.

…ting ParquetFile in SQLContext

There are 4 different compression codec available for `ParquetOutputFormat`

in Spark SQL, it was set as a hard-coded value in `ParquetRelation.defaultCompression`

original discuss:
#195 (diff)

i added a new config property in SQLConf to allow user to change this compression codec, and i used similar short names syntax as described in SPARK-2953 #1873 (https://github.com/apache/spark/pull/1873/files#diff-0)

btw, which codec should we use as default? it was set to GZIP (https://github.com/apache/spark/pull/195/files#diff-4), but i think maybe we should change this to SNAPPY, since SNAPPY is already the default codec for shuffling in spark-core (SPARK-2469, #1415), and parquet-mr supports Snappy codec natively (https://github.com/Parquet/parquet-mr/commit/e440108de57199c12d66801ca93804086e7f7632).

Author: chutium <[email protected]>

Closes #2039 from chutium/parquet-compression and squashes the following commits:

2f44964 [chutium] [SPARK-3131][SQL] parquet compression default codec set to snappy, also in test suite
e578e21 [chutium] [SPARK-3131][SQL] compression codec config property name and default codec set to snappy
21235dc [chutium] [SPARK-3131][SQL] Allow user to set parquet compression codec for writing ParquetFile in SQLContext

(cherry picked from commit 8856c3d)
Signed-off-by: Michael Armbrust <[email protected]>

21235dc · [SPARK-3131][SQL] Allow user to set parquet compression codec for writing ParquetFile in SQLContext

marmbrus reviewed Aug 26, 2014

e578e21 · [SPARK-3131][SQL] compression codec config property name and default codec set to snappy

2f44964 · [SPARK-3131][SQL] parquet compression default codec set to snappy, also in test suite

asfgit closed this in 8856c3d Aug 26, 2014
