[SPARK-3928] [SPARK-5182] [SQL] Partitioning support for the data sources API #5526
Closed
45 commits
7dd8dd5
Adds new interfaces and stub methods for data sources API partitionin…
liancheng 012ed2d
Adds PartitioningOptions
liancheng aa8ba9a
Javadoc fix
liancheng 1b8231f
Renames FSBasedPrunedFilteredScan to FSBasedRelation
liancheng 3ba9bbf
Adds DataFrame.saveAsTable() overrides which support partitioning
liancheng 770b5ba
Adds tests for FSBasedRelation
liancheng 95d0b4d
Renames PartitionedSchemaRelationProvider to FSBasedRelationProvider
liancheng 5de194a
Forgot Apache licence header
liancheng 9d17607
Adds the contract that OutputWriter should have zero-arg constructor
liancheng fb5a607
Fixes compilation error
liancheng 3c5073a
Fixes SaveModes used in test cases
liancheng 327bb1d
Implements partitioning support for data sources API
liancheng ea6c8dd
Removes remote debugging stuff
liancheng 9b487bf
Fixes compilation errors introduced while rebasing
liancheng b746ab5
More tests
liancheng f18dec2
More strict schema checking
liancheng ca1805b
Removes duplicated partition discovery code in new Parquet
liancheng 8d2ff71
Merges partition columns when reading partitioned relations
liancheng ce52353
Adds new SQLContext.load() overload with user defined dynamic partiti…
liancheng 422ff4a
Fixes style issue
liancheng f320766
Adds prepareForWrite() hook, refactored writer containers
liancheng 0bc6ad1
Resorts to new Hadoop API, and now FSBasedRelation can customize outp…
liancheng be0c268
Uses TaskAttemptContext rather than Configuration in OutputWriter.init
liancheng 54c3d7b
Enforces that FileOutputFormat must be used
liancheng 5f423d3
Bug fixes. Lets data source to customize OutputCommitter rather than …
liancheng a29e663
Bug fix: should only pass actual data files to FSBasedRelation.buildScan
liancheng c4ed4fe
Bug fixes and a new test suite
liancheng 51be443
Replaces FSBasedRelation.outputCommitterClass with FSBasedRelation.pr…
liancheng 5849dd0
Fixes doc typos. Fixes partition discovery refresh.
liancheng fa543f3
Addresses comments
liancheng 0b8cd70
Adds Scala/Catalyst row conversion when writing non-partitioned tables
liancheng 795920a
Fixes compilation error after rebasing
liancheng bc3f9b4
Uses projection to separate partition columns and data columns while …
liancheng c466de6
Addresses comments
liancheng 52b0c9b
Adjusts project/MimaExclude.scala
liancheng 0349e09
Fixes compilation error introduced while rebasing
liancheng 7552168
Fixes typo in MimaExclude.scala
liancheng c71ac6c
Addresses comments from @marmbrus
liancheng 8d12e69
Fixes compilation error
liancheng ad4d4de
Enables HDFS style globbing
liancheng 348a922
Adds projection in FSBasedRelation.buildScan(requiredColumns, inputPa…
liancheng edf49e7
Removed commented stale code block
liancheng 43ba50e
Avoids serializing generated projection code
liancheng 1f9b1a5
Tweaks data schema passed to FSBasedRelations
liancheng 5351a1b
Fixes compilation error introduced while rebasing
liancheng
| Original file line number | Diff line number | Diff line change |
|---|---|---|
@@ -762,7 +762,7 @@ class SQLContext(@transient val sparkContext: SparkContext) | |
| */ | ||
| @Experimental | ||
| def load(source: String, options: Map[String, String]): DataFrame = { | ||
| val resolved = ResolvedDataSource(this, None, source, options) | ||
| val resolved = ResolvedDataSource(this, None, Array.empty[String], source, options) | ||
| DataFrame(this, LogicalRelation(resolved.relation)) | ||
| } | ||
@@ -781,6 +781,37 @@ class SQLContext(@transient val sparkContext: SparkContext) | |
| load(source, schema, options.toMap) | ||
| } | ||
| /** | ||
| * :: Experimental :: | ||
| * (Java-specific) Returns the dataset specified by the given data source and | ||
| * a set of options as a DataFrame, using the given schema as the schema of the DataFrame. | ||
| * | ||
| * @group genericdata | ||
| */ | ||
| @Experimental | ||
| def load( | ||
| source: String, | ||
| schema: StructType, | ||
| partitionColumns: Array[String], | ||
| options: java.util.Map[String, String]): DataFrame = { | ||
| load(source, schema, partitionColumns, options.toMap) | ||
| } | ||
| /** | ||
| * :: Experimental :: | ||
| * (Scala-specific) Returns the dataset specified by the given data source and | ||
| * a set of options as a DataFrame, using the given schema as the schema of the DataFrame. | ||
| * @group genericdata | ||
| */ | ||
| @Experimental | ||
| def load( | ||
| source: String, | ||
| schema: StructType, | ||
| options: Map[String, String]): DataFrame = { | ||
Contributor
This doesn't seem related?
Contributor
Oh I see, it's just a weird diff since the other function below now has partition columns.
| val resolved = ResolvedDataSource(this, Some(schema), Array.empty[String], source, options) | ||
| DataFrame(this, LogicalRelation(resolved.relation)) | ||
| } | ||
| /** | ||
| * :: Experimental :: | ||
| * (Scala-specific) Returns the dataset specified by the given data source and | ||
@@ -791,8 +822,9 @@ class SQLContext(@transient val sparkContext: SparkContext) | |
| def load( | ||
| source: String, | ||
| schema: StructType, | ||
| partitionColumns: Array[String], | ||
| options: Map[String, String]): DataFrame = { | ||
| val resolved = ResolvedDataSource(this, Some(schema), source, options) | ||
| val resolved = ResolvedDataSource(this, Some(schema), partitionColumns, source, options) | ||
| DataFrame(this, LogicalRelation(resolved.relation)) | ||
| } | ||
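The overload pattern in the hunks above can be sketched without a Spark dependency. This is only an illustration under stated assumptions: `Resolved` is a hypothetical stand-in for `ResolvedDataSource` that merely records its arguments, the schema is modelled as a plain `String` instead of `StructType`, and the source name and path used in the usage note are made up. It shows the two behaviours visible in the diff: overloads without partition columns pass an empty collection, and the Java-specific overload converts its `java.util.Map` and delegates to the Scala-specific one.

```scala
import scala.collection.JavaConverters._

// Hypothetical stand-in for ResolvedDataSource: it only records what it was given.
final case class Resolved(
    schema: Option[String],          // stand-in for Option[StructType]
    partitionColumns: Seq[String],
    source: String,
    options: Map[String, String])

object LoadSketch {
  // No schema and no partition columns: an empty collection is passed through.
  def load(source: String, options: Map[String, String]): Resolved =
    Resolved(None, Seq.empty, source, options)

  // Schema only: partition columns are again empty.
  def load(source: String, schema: String, options: Map[String, String]): Resolved =
    Resolved(Some(schema), Seq.empty, source, options)

  // Scala-specific overload taking explicit partition columns.
  def load(
      source: String,
      schema: String,
      partitionColumns: Array[String],
      options: Map[String, String]): Resolved =
    Resolved(Some(schema), partitionColumns.toSeq, source, options)

  // Java-specific overload: converts the Java map and delegates to the Scala one.
  def load(
      source: String,
      schema: String,
      partitionColumns: Array[String],
      options: java.util.Map[String, String]): Resolved =
    load(source, schema, partitionColumns, options.asScala.toMap)
}
```

For example, `LoadSketch.load("parquet", "schema", Array("year", "month"), Map("path" -> "/data"))` records `year` and `month` as partition columns, while the two-argument form records none. `scala.collection.JavaConverters` is deprecated on Scala 2.13 and later in favour of `scala.jdk.CollectionConverters`, but is used here because it is available across Scala 2.x versions.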
What is the default value of this setting?
Default to true.