[SPARK-9100] [SQL] Adds DataFrame reader/writer shortcut methods for ORC #7444
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -267,6 +267,15 @@ class DataFrameReader private[sql](sqlContext: SQLContext) { | |
| } | ||
| } | ||
|
|
||
| /** | ||
| * Loads an ORC file and returns the result as a [[DataFrame]]. | ||
| * | ||
|
Contributor
Nit: Also add examples like you did at DataFrameWriter?
Contributor
Author
The one in |
||
| * @param path input path | ||
| * @since 1.5.0 | ||
| * @note Currently, this method can only be used together with `HiveContext`. | ||
| */ | ||
| def orc(path: String): DataFrame = format("orc").load(path) | ||
|
Contributor
Multiple paths support?
Contributor
Author
I did consider this, but intentionally didn't add multiple-path support in this PR. The problem is that we currently can't specify multiple paths via data source options, while the DataFrame reader/writer API relies on the "path" option to find the input path. If you check I'm thinking about adding multiple-value support for data source options in a more general way. (Maybe just simple comma-separated lists with proper comma escaping.) |
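The comma-separated idea floated in the comment above could look roughly like the following. This is only a minimal sketch, not anything from this PR or from Spark: `MultiPathOption`, `encode`, and `decode` are hypothetical names, and the choice of backslash as the escape character is an assumption.

```scala
// Hypothetical helper, NOT part of Spark: packs several paths into one
// option string so that a single "path"-style option could carry them.
object MultiPathOption {
  // Escape backslashes first, then commas, and join with ','.
  def encode(paths: Seq[String]): String =
    paths.map(_.replace("\\", "\\\\").replace(",", "\\,")).mkString(",")

  // Split on unescaped commas and drop the escaping backslashes.
  // Caveats of this sketch: an empty input string decodes to a single
  // empty path, and a trailing lone backslash is silently dropped.
  def decode(value: String): Seq[String] = {
    val out = scala.collection.mutable.ArrayBuffer.empty[String]
    val cur = new StringBuilder
    var escaped = false
    value.foreach { c =>
      if (escaped) { cur.append(c); escaped = false } // take the char literally
      else if (c == '\\') escaped = true              // start of an escape
      else if (c == ',') { out += cur.toString; cur.clear() } // path boundary
      else cur.append(c)
    }
    out += cur.toString
    out.toList
  }
}
```

With something like this, a reader shortcut could in principle accept several paths, encode them into one option value, and let the data source decode them again. Whether Spark should adopt this exact encoding is left open here.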
||
|
|
||
| /** | ||
| * Returns the specified table as a [[DataFrame]]. | ||
| * | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -280,6 +280,18 @@ final class DataFrameWriter private[sql](df: DataFrame) { | |
| */ | ||
| def parquet(path: String): Unit = format("parquet").save(path) | ||
|
|
||
| /** | ||
| * Saves the content of the [[DataFrame]] in ORC format at the specified path. | ||
|
Contributor
you should document this is only available if hive is ...
Contributor
Author
Yeah, right. Thanks for reminding! |
||
| * This is equivalent to: | ||
| * {{{ | ||
| * format("orc").save(path) | ||
| * }}} | ||
| * | ||
| * @since 1.5.0 | ||
| * @note Currently, this method can only be used together with `HiveContext`. | ||
| */ | ||
| def orc(path: String): Unit = format("orc").save(path) | ||
|
|
||
| /////////////////////////////////////////////////////////////////////////////////////// | ||
| // Builder pattern config options | ||
| /////////////////////////////////////////////////////////////////////////////////////// | ||
|
|
||
As the function `parquet` supports multiple paths when loading, should the `orc` API support that as well?