[SPARK-11691][SQL] Allow to specify compression codec in HadoopFsRela… #9657
Changes from all commits
8d151b5
3202474
f6c0074
d470aea
f47d8cc
ea70b40
a8ed421
3645dbc
```diff
@@ -24,6 +24,7 @@ import scala.language.{existentials, implicitConversions}
 import scala.util.{Failure, Success, Try}

 import org.apache.hadoop.fs.Path
+import org.apache.hadoop.io.compress.{CompressionCodec, CompressionCodecFactory}
 import org.apache.hadoop.util.StringUtils

 import org.apache.spark.Logging
@@ -253,11 +254,18 @@ object ResolvedDataSource extends Logging {
         // For partitioned relation r, r.schema's column ordering can be different from the column
         // ordering of data.logicalPlan (partition columns are all moved after data column). This
         // will be adjusted within InsertIntoHadoopFsRelation.
+        val codec = options.get("compression.codec").flatMap(e =>
+          Some(new CompressionCodecFactory(sqlContext.sparkContext.hadoopConfiguration)
+            .getCodecClassByName(e).asInstanceOf[Class[CompressionCodec]])
```
Member:
We need error handling if an unknown codec name is given.

Contributor (author):
Since users can only set compression through DataFrameWriter.compress(codec), it is very unlikely that an unknown codec name will be given. Even if a user does set an unknown codec, I think it is fine to just throw an exception on the driver side.

Member:
Users can easily set unknown codecs via

Contributor (author):
They could, but we didn't expose the compression property to users, so users don't know what property to set for compression.
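A minimal sketch of the error handling requested above, assuming the codec class is resolved eagerly on the driver; the helper name and error message are illustrative, not code from this PR:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.compress.{CompressionCodec, CompressionCodecFactory}

import scala.util.control.NonFatal

// Hypothetical helper: fail fast with a clear message when the configured
// codec name cannot be resolved, instead of surfacing a raw lookup error.
def resolveCodecClass(name: String, conf: Configuration): Class[CompressionCodec] = {
  val resolved =
    try {
      // getCodecClassByName may throw or return null for an unrecognized name.
      Option(new CompressionCodecFactory(conf).getCodecClassByName(name))
    } catch {
      case NonFatal(_) => None
    }
  resolved
    .getOrElse(throw new IllegalArgumentException(s"Unknown compression codec: $name"))
    .asInstanceOf[Class[CompressionCodec]]
}
```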
```diff
+        )
         sqlContext.executePlan(
           InsertIntoHadoopFsRelation(
             r,
             data.logicalPlan,
-            mode)).toRdd
+            mode,
+            codec)).toRdd
         r
       case _ =>
         sys.error(s"${clazz.getCanonicalName} does not allow create table as select.")
```
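For context, the review discussion above describes a DataFrameWriter.compress(codec) entry point as the only way users set compression. A hypothetical end-to-end usage of the code path in this diff could look like the following, setting the "compression.codec" option directly with a fully qualified codec class name; the output path, app name, and the direct option call are illustrative assumptions:

```scala
import org.apache.hadoop.io.compress.GzipCodec
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Illustrative usage against the 1.6-era API. The "compression.codec" key and
// the fully qualified codec class name match what the diff above reads; the
// DataFrameWriter.compress(codec) method described in the review would set
// this option on the user's behalf.
object CompressionOptionExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[2]").setAppName("compression-option"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    val df = Seq(Tuple1("spark"), Tuple1("compression")).toDF("value")
    df.write
      .format("text")
      .option("compression.codec", classOf[GzipCodec].getName)
      .save("/tmp/text-gzip-output")  // example output path

    sc.stop()
  }
}
```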
Can this just be a normal option?

Also, we shouldn't depend on Hadoop APIs in options, which is a user-facing API. Nobody outside the Hadoop world knows how to use the CompressionCodec API.

Agreed.
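A sketch of the "normal option" direction the reviewers suggest: the public option stays a plain string (for example "gzip"), and the mapping to a Hadoop codec class is kept internal so CompressionCodec never appears in the user-facing API. The short-name table and helper below are illustrative assumptions, not code from this PR:

```scala
import org.apache.hadoop.io.compress.{BZip2Codec, CompressionCodec, DeflateCodec, GzipCodec, SnappyCodec}

// Illustrative short-name table; users never see the Hadoop codec classes.
val shortCompressionNames: Map[String, Class[_ <: CompressionCodec]] = Map(
  "bzip2" -> classOf[BZip2Codec],
  "deflate" -> classOf[DeflateCodec],
  "gzip" -> classOf[GzipCodec],
  "snappy" -> classOf[SnappyCodec])

// Resolve the string option internally, failing with the list of known names.
def codecClassForOption(name: String): Class[_ <: CompressionCodec] =
  shortCompressionNames.getOrElse(name.toLowerCase,
    throw new IllegalArgumentException(
      s"Codec [$name] is not available. Known codecs are ${shortCompressionNames.keys.mkString(", ")}."))

// Hypothetical user-facing call: df.write.option("compression", "gzip").save(path)
```

Later Spark releases adopted a string-valued compression option of roughly this shape for the built-in file sources.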