[SPARK-18185] Fix all forms of INSERT / OVERWRITE TABLE for Datasource tables #15814
Conversation
```scala
/**
 * Similar to newTaskTempFile(), but allows files to be committed to an absolute output location.
 * Depending on the implementation, there may be weaker guarantees around adding files this way.
 */
```
Do we want a default implementation? If the protocol doesn't implement this, things will go seriously wrong at runtime, won't they?
Done
Can you unify this and newTaskTempFile, if we treat the default partition location like a custom location?
I thought about combining them, but I think the method semantics become too subtle then.
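For readers following the thread, here is a simplified sketch of the two methods under discussion, with signatures abbreviated (the real FileCommitProtocol methods also thread through a TaskAttemptContext). Keeping them separate keeps each contract obvious:

```scala
// Simplified sketch; not the exact Spark API.
abstract class CommitProtocolSketch {
  /** Returns a temp path under the committer-managed output directory. */
  def newTaskTempFile(dir: Option[String], ext: String): String

  /**
   * Returns a temp path for a file whose final destination is the absolute
   * directory `absoluteDir`, outside the committer-managed output. Staged
   * files are moved to their final locations on job commit, so guarantees
   * can be weaker than for newTaskTempFile().
   */
  def newTaskTempFileAbsPath(absoluteDir: String, ext: String): String
}
```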
Can you change the title to indicate that this is a serious bug fix? It sounds like a new feature, but in reality it is a serious behavior bug.
```scala
val plan =
  InsertIntoHadoopFsRelationCommand(
    outputPath,
    Map.empty,
```
Let's change the call to use named arguments for all the arguments here.
Done
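For illustration, the named-argument style being requested, shown on a stand-in case class (not Spark's actual InsertIntoHadoopFsRelationCommand signature):

```scala
// Stand-in types for illustration only.
case class WriteCommand(
    outputPath: String,
    staticPartitions: Map[String, String],
    overwrite: Boolean)

// Named arguments make each positional value self-documenting at the call site.
val plan = WriteCommand(
  outputPath = "/warehouse/t",
  staticPartitions = Map.empty,
  overwrite = true)
```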
```scala
  }
}

test("insert into and overwrite new datasource tables with partial specs and custom locs") {
```
Can we break this into multiple test cases?
Done
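A sketch of the kind of split being asked for (test names invented, bodies elided; assumes the usual SparkFunSuite context):

```scala
// Hypothetical split of the single large test into focused cases.
test("insert into new datasource table with partial partition spec") { /* ... */ }
test("insert overwrite with partial partition spec") { /* ... */ }
test("insert into partition with custom location") { /* ... */ }
test("insert overwrite partition with custom location") { /* ... */ }
```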
```scala
 */
@transient private var addedAbsPathFiles: mutable.Map[String, String] = null

private def absPathStagingDir: Path = new Path(path, "_temporary-" + jobId)
```
Document this.
Done
```scala
 * Tracks files staged by this task for absolute output paths. These outputs are not managed by
 * the Hadoop OutputCommitter, so we must move these to their final locations on job commit.
 */
@transient private var addedAbsPathFiles: mutable.Map[String, String] = null
```
We should document whether the strings are paths to files or just paths to directories. I think they are just directories, right? The naming suggests that they are files.
They are files. We need to track the unique output location of each file here in order to know where to place it. We could use directories, but they would end up with one file each anyway.
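For context, a minimal sketch of that per-file bookkeeping, with invented helper names (a hedged approximation, not the actual Spark code): each staged temp file is keyed to the exact absolute path it must end up at.

```scala
import scala.collection.mutable

// temp file path -> final absolute file path (one entry per staged file)
val addedAbsPathFiles = mutable.Map.empty[String, String]

def newTaskTempFileAbsPath(stagingDir: String, finalDir: String, ext: String): String = {
  val filename = s"part-${java.util.UUID.randomUUID()}$ext"
  val tempPath = s"$stagingDir/$filename"
  addedAbsPathFiles(tempPath) = s"$finalDir/$filename" // remember this file's destination
  tempPath
}
```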
```scala
} else {
  None
}
val staticPartitionKeys = partitionKeys.filter(_._2.nonEmpty).map(t => (t._1, t._2.get))
```
I find the _1 and _2s here impossible to read. Can we add an explicit type to val partitionKeys?
Done
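A sketch of the requested readability fix (types assumed for illustration): give partitionKeys an explicit type and replace the _1/_2 accessors with a pattern match.

```scala
// Assumed shape: static keys carry Some(value), dynamic keys carry None.
val partitionKeys: Map[String, Option[String]] = Map(
  "dt" -> Some("2008-06-08"), // static
  "country" -> None)          // dynamic

// Equivalent to filter(_._2.nonEmpty).map(t => (t._1, t._2.get)), but readable:
val staticPartitionKeys: Map[String, String] =
  partitionKeys.collect { case (key, Some(value)) => key -> value }
```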
```scala
 *
 * @param enabled whether to overwrite existing data in the table.
 * @param specificPartition only data in the specified partition will be overwritten.
 * @param staticPartitionKeys if non-empty, specifies that we only want to overwrite partitions
```
Can the partition spec be partial? We should document that here.
Yep it is in the next sentence.
```scala
// When partitions are tracked by the catalog, compute all custom partition locations that
// may be relevant to the insertion job.
if (partitionsTrackedByCatalog) {
  val matchingPartitions = t.sparkSession.sessionState.catalog.listPartitions(
```
I'd create a new API in the external catalog API and push this into it. In the future we can look into how to optimize this.
You also need the set of matching partitions (including those with default locations) in order to determine which ones to delete at the end of an overwrite call.
This makes the optimization quite messy, so I'd rather not push it to the catalog for now.
Test build #68360 has finished for PR 15814 at commit
Test build #68385 has finished for PR 15814 at commit
| "/" + PartitioningUtils.getPathFragment(p.spec, table.partitionSchema)).toString | ||
| val catalogLocation = new Path(p.storage.locationUri.get).makeQualified( | ||
| fs.getUri, fs.getWorkingDirectory).toString | ||
| if (catalogLocation != defaultLocation) { |
Why do we distinguish partition locations that equal the default location? Partitions always have locations (custom-specified or generated by default), so do we really need to care about who set them?
The only purpose here is to optimize the common case where the directory scheme is followed. Otherwise, we have to broadcast all the partition locations even if they are using the default.
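A hedged sketch of that optimization, with assumed types (the real code works with CatalogTablePartition and Hadoop Path objects): only partitions whose catalog location differs from the default directory scheme need to be shipped to write tasks.

```scala
case class PartitionInfo(spec: Map[String, String], location: String)

/** Keep only partitions that do not follow the default /table/k1=v1/k2=v2 layout. */
def customPartitionLocations(
    matching: Seq[PartitionInfo],
    defaultLocationFor: Map[String, String] => String): Map[Map[String, String], String] =
  matching.collect {
    case p if p.location != defaultLocationFor(p.spec) => p.spec -> p.location
  }.toMap
```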
Test build #68387 has finished for PR 15814 at commit
Test build #68388 has finished for PR 15814 at commit
```scala
SparkHadoopMapRedUtil.commitTask(
  committer, taskContext, attemptId.getJobID.getId, attemptId.getTaskID.getId)
// was: EmptyTaskCommitMessage
new TaskCommitMessage(addedAbsPathFiles.toMap)
```
Why don't we just rename temp files to dest files in commitTask?
Yeah, it can go either way; it's unclear which one is better. Renaming on job commit gives a higher chance of corrupting data, whereas renaming in task commit is slightly more performant.
I'd prefer renaming in task commit.
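For context, the PR description says staged absolute-path files "are moved to their final locations when the job commits". A hedged sketch of that move (helper shape invented; the real logic sits in the commit protocol's job-commit path):

```scala
import org.apache.hadoop.fs.{FileSystem, Path}

/** Move each staged temp file to the absolute destination recorded for it. */
def moveStagedAbsPathFiles(fs: FileSystem, absPathFiles: Map[String, String]): Unit = {
  for ((tempPath, finalPath) <- absPathFiles) {
    val dst = new Path(finalPath)
    fs.delete(dst, false)              // clear any stale file at the destination
    fs.rename(new Path(tempPath), dst) // move staged file into place
  }
}
```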
```scala
}
// first clear the path determined by the static partition keys (e.g. /table/foo=1)
val staticPrefixPath = qualifiedOutputPath.suffix(staticPartitionPrefix)
if (fs.exists(staticPrefixPath) && !fs.delete(staticPrefixPath, true /* recursively */)) {
```
Do we want the behavior to delete the partitions matching the static prefix? According to the Hive manual at https://cwiki.apache.org/confluence/display/Hive/Tutorial#Tutorial-Dynamic-PartitionInsert:

```sql
FROM page_view_stg pvs
INSERT OVERWRITE TABLE page_view PARTITION(dt='2008-06-08', country)
SELECT pvs.viewTime, pvs.userid, pvs.page_url, pvs.referrer_url, null, null, pvs.ip, pvs.country
```

> When there are already non-empty partitions existing for the dynamic partition columns (for example, country='CA' exists under some ds root partition), it will be overwritten if the dynamic partition insert saw the same value (say 'CA') in the input data. This is in line with the 'insert overwrite' semantics. However, if the partition value 'CA' does not appear in the input data, the existing partition will not be overwritten.

Thus when the static prefix is dt=2008-06-08, we should overwrite the partition with country=CA only if the input data contains country=CA. If we delete all partitions belonging to the static prefix, we also delete other partitions such as country=US, country=UK, etc.

Do I understand it correctly here?
Oh rats. These semantics might be a little difficult to implement, but perhaps we can use the loadDynamicPartitions call here after writing to a dummy location. I'll take a look tomorrow.
I personally find the Hive behavior pretty weird.
OK for me if we document it.
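To make the semantics concrete, a small illustration following the Hive manual example quoted above (table and columns come from that example, not from this PR's tests; column list simplified):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().enableHiveSupport().getOrCreate()

// Static prefix dt='2008-06-08', dynamic column `country`:
spark.sql("""
  INSERT OVERWRITE TABLE page_view PARTITION (dt='2008-06-08', country)
  SELECT viewTime, userid, page_url, referrer_url, ip, country FROM page_view_stg
""")
// Hive semantics: only partitions whose country value appears in the input are
// overwritten; e.g. country='US' survives if the input only contains country='CA'.
// Deleting everything under .../dt=2008-06-08 up front would wipe country='US' too.
```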
Test build #68423 has finished for PR 15814 at commit
I've temporarily merged in the changes from #15797 to see if the tests will pass.
Test build #68431 has finished for PR 15814 at commit
Conflicts:
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/InMemoryCatalog.scala
- sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogSuite.scala
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala
- sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala
Test build #68492 has finished for PR 15814 at commit
Merging in master/branch-2.1. Thanks.
What changes were proposed in this pull request?
As of current 2.1, INSERT OVERWRITE with dynamic partitions against a Datasource table will overwrite the entire table instead of only the partitions matching the static keys, as in Hive. It also doesn't respect custom partition locations.
This PR adds support for all these operations to Datasource tables managed by the Hive metastore. It is implemented as follows:

- During planning time, the full set of partitions affected by an INSERT or OVERWRITE command is read from the Hive metastore.
- The planner identifies any partitions with custom locations and includes this in the write task metadata.
- FileFormatWriter tasks refer to this custom locations map when determining where to write for dynamic partition output.
- When the write job finishes, the set of written partitions is compared against the initial set of matched partitions, and the Hive metastore is updated to reflect the newly added / removed partitions.
It was necessary to introduce a method for staging files with absolute output paths to FileCommitProtocol. These files are not handled by the Hadoop output committer but are moved to their final locations when the job commits.

The overwrite behavior of legacy Datasource tables is also changed: no longer will the entire table be overwritten if a partial partition spec is present.
cc @cloud-fan @yhuai
How was this patch tested?
Unit tests, existing tests.
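As a closing sketch of the last implementation step described above (comparing written partitions against the initially matched set and updating the metastore), with invented helper names:

```scala
/** Reconcile the metastore after the write job finishes. */
def reconcilePartitions(
    matchedBefore: Set[Map[String, String]], // partitions matched at planning time
    written: Set[Map[String, String]],       // partitions the job actually wrote
    overwrite: Boolean,
    createPartition: Map[String, String] => Unit,
    dropPartition: Map[String, String] => Unit): Unit = {
  (written -- matchedBefore).foreach(createPartition) // newly added partitions
  if (overwrite) {
    (matchedBefore -- written).foreach(dropPartition) // partitions removed by the overwrite
  }
}
```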