[SPARK-29062][SQL] Add V1_BATCH_WRITE to the TableCapabilityChecks #25767
Conversation
Test build #110494 has finished for PR 25767 at commit …

cc @cloud-fan @jose-torres @rdblue Can you ptal? I re-opened this after yesterday's DSV2 sync.

Test build #111008 has finished for PR 25767 at commit …

Test build #111010 has finished for PR 25767 at commit …

Test build #111024 has finished for PR 25767 at commit …

retest this please

Test build #111049 has finished for PR 25767 at commit …
Review thread on the following hunk:

```scala
import org.apache.spark.sql.execution.datasources.v2.DataSourceV2Implicits._

provider.getTable(dsOptions) match {
  case table: SupportsWrite if table.supports(BATCH_WRITE) =>
    if (partitioningColumns.nonEmpty) {
```
Good catch! Even if the format is a TableProvider, we may still fall back to V1. It's better to check the partitioning columns when we are really going to do a V2 write.
BTW, technically we only need to assert that there are no partition columns for append and overwrite. Since the V2 write path here only supports append and overwrite, we can revisit this later.
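A minimal sketch of the resulting control flow being discussed (it reuses the surrounding `DataFrameWriter.save` variables `provider`, `dsOptions`, and `partitioningColumns`; the error message and the fallback branch are illustrative, not the PR's exact code):

```scala
provider.getTable(dsOptions) match {
  case table: SupportsWrite if table.supports(BATCH_WRITE) =>
    // Only reject explicit partitioning once we know the V2 batch write path
    // will actually be used; the V2 path does not accept partitioning here.
    if (partitioningColumns.nonEmpty) {
      throw new AnalysisException(
        "Cannot write data to a V2 table with user-specified partitioning.")
    }
    // ... build and run the V2 batch write ...
  case _ =>
    // Tables without BATCH_WRITE (e.g. V1_BATCH_WRITE only) fall back to the
    // V1 path, which still receives the partitioning information.
}
```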
retest this please

Test build #111052 has finished for PR 25767 at commit …
LGTM, merging to master!
What changes were proposed in this pull request?
Currently, the TableCapabilityChecks in the Analyzer require that V2 tables define BATCH_WRITE, even for tables that fall back to the V1 write API. This is confusing, as these tables may not implement the V2 writer interface at all. This PR makes the checks also accept the V1_BATCH_WRITE capability.
In addition, this allows V2 tables that declare the V1_BATCH_WRITE capability to leverage the V1 APIs in DataFrameWriter.save. This way, these tables can continue to receive partitioning information, perform table-existence checks, and support all SaveModes.
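As a rough sketch of what such a table looks like (class and source names are hypothetical; the interface signatures follow the Spark 3.0-era connector API, where `V1WriteBuilder.buildForV1Write` returns a V1 `InsertableRelation`, and may differ in other versions):

```scala
import java.util

import scala.collection.JavaConverters._

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.connector.catalog.{SupportsWrite, TableCapability}
import org.apache.spark.sql.connector.write.{LogicalWriteInfo, V1WriteBuilder, WriteBuilder}
import org.apache.spark.sql.sources.InsertableRelation
import org.apache.spark.sql.types.StructType

// Hypothetical table with no V2 writer yet: it declares V1_BATCH_WRITE instead
// of BATCH_WRITE and hands writes back to the V1 InsertableRelation API.
class MyV1FallbackTable(tableSchema: StructType) extends SupportsWrite {
  override def name(): String = "my_v1_fallback_table"
  override def schema(): StructType = tableSchema

  override def capabilities(): util.Set[TableCapability] =
    Set(TableCapability.V1_BATCH_WRITE).asJava

  override def newWriteBuilder(info: LogicalWriteInfo): WriteBuilder =
    new V1WriteBuilder {
      override def buildForV1Write(): InsertableRelation =
        new InsertableRelation {
          override def insert(data: DataFrame, overwrite: Boolean): Unit = {
            // Route the write through the existing V1 code path.
          }
        }
    }
}
```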
Why are the changes needed?
Partitioned saves through DataFrame.write are otherwise broken for V2 tables that support the V1 write API.
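For example, a call like the following (source name hypothetical) would previously fail the capability check when the underlying table only supports V1 writes:

```scala
import spark.implicits._

spark.range(10)
  .withColumn("part", $"id" % 2)
  .write
  .format("my-v1-fallback-source") // a V2 TableProvider whose table only declares V1_BATCH_WRITE
  .partitionBy("part")
  .mode("append")
  .save()
```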
Does this PR introduce any user-facing change?
No
How was this patch tested?
V1WriteFallbackSuite
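A test in the spirit of that suite might look like this (a sketch only; `InMemoryV1Provider` and its `lastPartitioning` field are hypothetical stand-ins for a test source that records what it receives through the V1 path):

```scala
import org.apache.spark.sql.functions.col

test("partitioned save falls back to the V1 write path") {
  val df = spark.range(10).withColumn("part", col("id") % 2)

  df.write
    .format("in-memory-v1-fallback") // hypothetical short name for the test source
    .partitionBy("part")
    .mode("append")
    .save()

  // The table declares V1_BATCH_WRITE, so the save should succeed and the
  // partitioning information should arrive through the V1 API.
  assert(InMemoryV1Provider.lastPartitioning === Seq("part"))
}
```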