[SPARK-29062][SQL] Add V1_BATCH_WRITE to the TableCapabilityChecks #25767
Conversation
Test build #110494 has finished for PR 25767 at commit …

cc @cloud-fan @jose-torres @rdblue Can you ptal? I re-opened this after yesterday's DSV2 sync.

Test build #111008 has finished for PR 25767 at commit …

Test build #111010 has finished for PR 25767 at commit …

Test build #111024 has finished for PR 25767 at commit …

retest this please

Test build #111049 has finished for PR 25767 at commit …
Review thread on the following hunk:

```scala
import org.apache.spark.sql.execution.datasources.v2.DataSourceV2Implicits._

provider.getTable(dsOptions) match {
  case table: SupportsWrite if table.supports(BATCH_WRITE) =>
    if (partitioningColumns.nonEmpty) {
```
Good catch! Even if the format is a TableProvider, we may still fall back to V1. It's better to check the partitioning columns when we are really going to do a V2 write.
BTW, technically we only need to assert that there are no partition columns for append and overwrite. Since the V2 write path here only supports append and overwrite, we can revisit this later.
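A minimal sketch of the resulting control flow being discussed (it reuses the surrounding `DataFrameWriter.save` variables `provider`, `dsOptions`, and `partitioningColumns`; the error message and the fallback branch are illustrative, not the PR's exact code):

```scala
provider.getTable(dsOptions) match {
  case table: SupportsWrite if table.supports(BATCH_WRITE) =>
    // Only reject explicit partitioning once we know the V2 batch write path
    // will actually be used; the V2 path does not accept partitioning here.
    if (partitioningColumns.nonEmpty) {
      throw new AnalysisException(
        "Cannot write data to a V2 table with user-specified partitioning.")
    }
    // ... build and run the V2 batch write ...
  case _ =>
    // Tables without BATCH_WRITE (e.g. V1_BATCH_WRITE only) fall back to the
    // V1 path, which still receives the partitioning information.
}
```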
retest this please

Test build #111052 has finished for PR 25767 at commit …
LGTM, merging to master!
What changes were proposed in this pull request?
Currently, the TableCapabilityChecks in the Analyzer require that V2 tables define BATCH_WRITE, even for tables that fall back to the V1 write API. This is confusing, as these tables may not implement the V2 writer interface at all. This PR makes the checks also accept the V1_BATCH_WRITE capability.
In addition, this allows V2 tables that declare the V1_BATCH_WRITE capability to leverage the V1 APIs in DataFrameWriter.save. This way, these tables can continue to receive partitioning information, perform table-existence checks, and support all SaveModes.
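As a rough sketch of what such a table looks like (class and source names are hypothetical; the interface signatures follow the Spark 3.0-era connector API, where `V1WriteBuilder.buildForV1Write` returns a V1 `InsertableRelation`, and may differ in other versions):

```scala
import java.util

import scala.collection.JavaConverters._

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.connector.catalog.{SupportsWrite, TableCapability}
import org.apache.spark.sql.connector.write.{LogicalWriteInfo, V1WriteBuilder, WriteBuilder}
import org.apache.spark.sql.sources.InsertableRelation
import org.apache.spark.sql.types.StructType

// Hypothetical table with no V2 writer yet: it declares V1_BATCH_WRITE instead
// of BATCH_WRITE and hands writes back to the V1 InsertableRelation API.
class MyV1FallbackTable(tableSchema: StructType) extends SupportsWrite {
  override def name(): String = "my_v1_fallback_table"
  override def schema(): StructType = tableSchema

  override def capabilities(): util.Set[TableCapability] =
    Set(TableCapability.V1_BATCH_WRITE).asJava

  override def newWriteBuilder(info: LogicalWriteInfo): WriteBuilder =
    new V1WriteBuilder {
      override def buildForV1Write(): InsertableRelation =
        new InsertableRelation {
          override def insert(data: DataFrame, overwrite: Boolean): Unit = {
            // Route the write through the existing V1 code path.
          }
        }
    }
}
```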
Why are the changes needed?
Partitioned saves through DataFrame.write are otherwise broken for V2 tables that support the V1 write API.
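For example, a call like the following (source name hypothetical) would previously fail the capability check when the underlying table only supports V1 writes:

```scala
import spark.implicits._

spark.range(10)
  .withColumn("part", $"id" % 2)
  .write
  .format("my-v1-fallback-source") // a V2 TableProvider whose table only declares V1_BATCH_WRITE
  .partitionBy("part")
  .mode("append")
  .save()
```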
Does this PR introduce any user-facing change?
No
How was this patch tested?
V1WriteFallbackSuite
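A test in the spirit of that suite might look like this (a sketch only; `InMemoryV1Provider` and its `lastPartitioning` field are hypothetical stand-ins for a test source that records what it receives through the V1 path):

```scala
import org.apache.spark.sql.functions.col

test("partitioned save falls back to the V1 write path") {
  val df = spark.range(10).withColumn("part", col("id") % 2)

  df.write
    .format("in-memory-v1-fallback") // hypothetical short name for the test source
    .partitionBy("part")
    .mode("append")
    .save()

  // The table declares V1_BATCH_WRITE, so the save should succeed and the
  // partitioning information should arrive through the V1 API.
  assert(InMemoryV1Provider.lastPartitioning === Seq("part"))
}
```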