
Conversation

@brkyvz (Contributor) commented Sep 12, 2019

What changes were proposed in this pull request?

Currently, the table capability checks in the Analyzer require V2 tables to declare BATCH_WRITE even when they fall back to V1 writes. This is confusing, as such tables may not have the V2 writer interface implemented at all. This PR adds the V1_BATCH_WRITE capability to these checks.

In addition, this allows V2 tables to leverage the V1 APIs through DataFrameWriter.save if they declare the V1_BATCH_WRITE capability. This way, these tables can continue to receive partitioning information, perform table-existence checks, and support all SaveModes.
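For illustration, here is a minimal sketch of such a table, assuming the Spark 3.0-era DSv2 interfaces (`Table`, `SupportsWrite`, `TableCapability`, `V1WriteBuilder`); package names and exact signatures were still shifting at the time of this PR, so treat them as approximate rather than the code added here:

```scala
import java.util

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.connector.catalog.{SupportsWrite, Table, TableCapability}
import org.apache.spark.sql.connector.write.{V1WriteBuilder, WriteBuilder}
import org.apache.spark.sql.sources.InsertableRelation
import org.apache.spark.sql.types.StructType
import org.apache.spark.sql.util.CaseInsensitiveStringMap

// Hypothetical table: declares V1_BATCH_WRITE (not BATCH_WRITE) and
// routes writes through the V1 InsertableRelation path.
class MyV1FallbackTable extends Table with SupportsWrite {
  override def name(): String = "my_v1_fallback_table"
  override def schema(): StructType = new StructType().add("i", "int")

  // The capability checks updated in this PR accept this capability in
  // place of BATCH_WRITE for batch writes.
  override def capabilities(): util.Set[TableCapability] =
    util.EnumSet.of(TableCapability.V1_BATCH_WRITE)

  override def newWriteBuilder(options: CaseInsensitiveStringMap): WriteBuilder =
    new V1WriteBuilder {
      override def buildForV1Write(): InsertableRelation =
        new InsertableRelation {
          override def insert(data: DataFrame, overwrite: Boolean): Unit = {
            // hand the rows to the source's existing V1 writer
          }
        }
    }
}
```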

Why are the changes needed?

Partitioned saves through DataFrame.write are otherwise broken for V2 tables that support the V1 write API.
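As a concrete example (a hedged sketch; the provider class name is hypothetical), this is the shape of call that previously failed the capability check for a V1-fallback table and is allowed after this change:

```scala
// Partitioned save through DataFrameWriter.save. With this PR, a table
// that declares only V1_BATCH_WRITE passes the analyzer's capability
// check, and the V1 fallback still receives the partitioning columns.
df.write
  .format("com.example.MyV1FallbackProvider")  // hypothetical provider
  .partitionBy("date")
  .mode("append")
  .save()
```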

Does this PR introduce any user-facing change?

No

How was this patch tested?

V1WriteFallbackSuite

@brkyvz changed the title from "[SPARK-29062] Add V1_BATCH_WRITE to the TableCapabilityChecks" to "[SPARK-29062][SQL] Add V1_BATCH_WRITE to the TableCapabilityChecks" on Sep 12, 2019
@SparkQA commented Sep 12, 2019

Test build #110494 has finished for PR 25767 at commit a3f87bf.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@brkyvz closed this on Sep 12, 2019
@brkyvz reopened this on Sep 19, 2019
@brkyvz (Contributor, Author) commented Sep 19, 2019

cc @cloud-fan @jose-torres @rdblue Can you PTAL? I re-opened this after yesterday's DSv2 sync.

@SparkQA commented Sep 19, 2019

Test build #111008 has finished for PR 25767 at commit 39cb1a2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Sep 19, 2019

Test build #111010 has finished for PR 25767 at commit e8b5942.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Sep 19, 2019

Test build #111024 has finished for PR 25767 at commit cb39676.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@brkyvz (Contributor, Author) commented Sep 20, 2019

retest this please

@SparkQA commented Sep 20, 2019

Test build #111049 has finished for PR 25767 at commit cb39676.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

import org.apache.spark.sql.execution.datasources.v2.DataSourceV2Implicits._
provider.getTable(dsOptions) match {
  case table: SupportsWrite if table.supports(BATCH_WRITE) =>
    if (partitioningColumns.nonEmpty) {
A Contributor commented on this diff:
Good catch! Even if the format is a TableProvider, we may still fall back to v1. It's better to check the partitioning columns only when we are actually going to do a v2 write.

BTW, technically we only need to assert no partition columns for append and overwrite. Since the v2 write here only supports append and overwrite, we can revisit it later.
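For context, the resulting control flow in DataFrameWriter.save looks roughly like the following (a simplified sketch, not the exact merged code; `runV2Write` and `saveToV1Source` stand in for the real write paths):

```scala
provider.getTable(dsOptions) match {
  case table: SupportsWrite if table.supports(BATCH_WRITE) =>
    // The no-partition-columns assertion now fires only on the genuine
    // v2 write branch.
    if (partitioningColumns.nonEmpty) {
      throw new AnalysisException("Cannot write data to TableProvider " +
        "implementation if partition columns are specified.")
    }
    runV2Write(table)   // placeholder for the v2 write path
  case _ =>
    // Tables with V1_BATCH_WRITE (or no v2 batch write at all) fall back
    // to the V1 path, which handles partitioning and all SaveModes.
    saveToV1Source()
}
```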

@cloud-fan (Contributor) commented:

retest this please

@SparkQA commented Sep 20, 2019

Test build #111052 has finished for PR 25767 at commit cb39676.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan (Contributor) commented:

LGTM, merging to master!

@cloud-fan closed this in eb7ee68 on Sep 20, 2019