@MaxGekk
Member

@MaxGekk MaxGekk commented Nov 21, 2020

What changes were proposed in this pull request?

  1. Pre-process partition specs in ResolvePartitionSpec and normalize partition column names according to the partition schema and the SQL config spark.sql.caseSensitive. This PR invokes normalizePartitionSpec for that. DSv1 commands already use this function, so the V2 behavior becomes consistent with DSv1.
  2. Move normalizePartitionSpec() from sql/core/.../datasources/PartitioningUtils to sql/catalyst/.../util/PartitioningUtils so that the Catalyst rule ResolvePartitionSpec can use it.
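
The normalization in item 1 can be illustrated with a minimal, dependency-free sketch. This is not the code in PartitioningUtils: the object name `PartitionSpecSketch`, the function `normalizeSpec`, and the two resolver values are hypothetical, and `IllegalArgumentException` stands in for Spark's `AnalysisException`. The only idea taken from the PR is that each user-supplied partition key is matched against the partition schema with a resolver chosen by spark.sql.caseSensitive, and then replaced by the schema's spelling of the column name.

```scala
object PartitionSpecSketch {
  // A resolver decides whether two identifiers name the same column.
  // Assumption: it mirrors the role of Catalyst's Resolver, not its exact definition.
  type Resolver = (String, String) => Boolean

  // Stand-ins for the behavior under spark.sql.caseSensitive = true / false.
  val caseSensitiveResolver: Resolver = (a, b) => a == b
  val caseInsensitiveResolver: Resolver = (a, b) => a.equalsIgnoreCase(b)

  // Rewrites every key of `spec` to the matching name from `partColNames`.
  def normalizeSpec[T](
      spec: Map[String, T],
      partColNames: Seq[String],
      resolver: Resolver): Map[String, T] = {
    val normalized = spec.toSeq.map { case (key, value) =>
      val resolvedKey = partColNames.find(resolver(_, key)).getOrElse {
        // Hypothetical message; Spark reports this case through AnalysisException.
        throw new IllegalArgumentException(s"$key is not a valid partition column")
      }
      resolvedKey -> value
    }
    // Reject specs such as (ID=1, id=2) that collapse onto one column.
    val duplicates = normalized.map(_._1).groupBy(identity).collect {
      case (name, hits) if hits.size > 1 => name
    }
    require(duplicates.isEmpty, s"Duplicate partition columns: ${duplicates.mkString(", ")}")
    normalized.toMap
  }
}
```

With `caseInsensitiveResolver`, the spec `Map("ID" -> "1")` checked against `Seq("id")` becomes `Map("id" -> "1")`. With `caseSensitiveResolver`, the same call throws because `ID` matches no partition column.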

Why are the changes needed?

DSv1 commands like ALTER TABLE .. ADD PARTITION and ALTER TABLE .. DROP PARTITION respect the SQL config spark.sql.caseSensitive while resolving partition specs. For example:

spark-sql> CREATE TABLE tbl1 (id bigint, data string) USING parquet PARTITIONED BY (id);
spark-sql> ALTER TABLE tbl1 ADD PARTITION (ID=1);
spark-sql> SHOW PARTITIONS tbl1;
id=1

The same command fails on a V2 table catalog with the error:

AnalysisException: Partition key ID not exists

Does this PR introduce any user-facing change?

Yes. After the changes, partition spec resolution for V2 tables works the same way as in DSv1, without the exception shown above.

How was this patch tested?

By running AlterTablePartitionV2SQLSuite.

@github-actions github-actions bot added the SQL label Nov 21, 2020
@SparkQA

SparkQA commented Nov 21, 2020

Test build #131479 has finished for PR 30454 at commit 7269739.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 21, 2020

Test build #131480 has started for PR 30454 at commit 046f0d1.

@SparkQA

SparkQA commented Nov 21, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36085/

@SparkQA

SparkQA commented Nov 21, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36085/

Member

@dongjoon-hyun dongjoon-hyun left a comment

Thank you, @MaxGekk. According to JIRA, this is only for 3.1.0, right?

@SparkQA

SparkQA commented Nov 21, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36086/

@SparkQA

SparkQA commented Nov 21, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36086/

@MaxGekk
Member Author

MaxGekk commented Nov 22, 2020

> According to JIRA, this is only for 3.1.0, right?

@dongjoon-hyun Yes, 3.0 doesn't support V2 ALTER TABLE ... PARTITION and doesn't have the code for V2 partition spec resolution.

@MaxGekk
Member Author

MaxGekk commented Nov 23, 2020

@HyukjinKwon @cloud-fan Could you take a look at this PR, please?

import org.apache.spark.sql.AnalysisException
import org.apache.spark.sql.catalyst.analysis.Resolver

object PartitioningUtils {
Contributor

nit: private[sql]

@cloud-fan
Contributor

GA passed, merging to master!

4 participants