Skip to content

Conversation

@liwensun
Copy link
Contributor

@liwensun liwensun commented Jun 3, 2019

What changes were proposed in this pull request?

In PR #24365, we pass in the partitionBy columns as options in DataFrameWriter. To make this change less intrusive for a patch release, we added a feature flag LEGACY_PASS_PARTITION_BY_AS_OPTIONS with the default to be false.

For 3.0, we should just do the correct behavior for DSV1, i.e., always passing partitionBy as options, and remove this legacy feature flag.

How was this patch tested?

Existing tests.

@liwensun liwensun changed the title [SC-27453][FOLLOW-UP] Turn on LEGACY_PASS_PARTITION_BY_AS_OPTIONS by default [SC-27453][FOLLOW-UP][3.0] Turn on LEGACY_PASS_PARTITION_BY_AS_OPTIONS by default Jun 3, 2019
@liwensun liwensun changed the title [SC-27453][FOLLOW-UP][3.0] Turn on LEGACY_PASS_PARTITION_BY_AS_OPTIONS by default [SC-27453][FOLLOW-UP] Turn on LEGACY_PASS_PARTITION_BY_AS_OPTIONS by default Jun 3, 2019
@liwensun liwensun changed the title [SC-27453][FOLLOW-UP] Turn on LEGACY_PASS_PARTITION_BY_AS_OPTIONS by default [SPARK-27453][FOLLOW-UP] Turn on LEGACY_PASS_PARTITION_BY_AS_OPTIONS by default Jun 3, 2019
@liwensun liwensun changed the title [SPARK-27453][FOLLOW-UP] Turn on LEGACY_PASS_PARTITION_BY_AS_OPTIONS by default [SPARK-27938] Turn on LEGACY_PASS_PARTITION_BY_AS_OPTIONS by default Jun 3, 2019
@SparkQA
Copy link

SparkQA commented Jun 4, 2019

Test build #106123 has finished for PR 24784 at commit ea49eed.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-27938] Turn on LEGACY_PASS_PARTITION_BY_AS_OPTIONS by default [SPARK-27938][SQL] Turn on LEGACY_PASS_PARTITION_BY_AS_OPTIONS by default Jun 4, 2019
"turning the flag on provides a way for these sources to see these partitionBy columns.")
.booleanConf
.createWithDefault(false)
.createWithDefault(true)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, why do we need this configuration if it's not invasive?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it was to avoid possible regression for the release 2.4.3, which was reasonable at that time.

Copy link
Member

@HyukjinKwon HyukjinKwon Jun 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not intrusive then should be no regression strictly though. Yea but I got that it was an extra caution. Can we remove it now in master?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, we should be removing these in 3.0 rather than defaulting back to legacy behavior.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me. I will go ahead and remove this feature flag altogether if no objections.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@liwensun liwensun changed the title [SPARK-27938][SQL] Turn on LEGACY_PASS_PARTITION_BY_AS_OPTIONS by default [SPARK-27938][SQL] Remove feature flag LEGACY_PASS_PARTITION_BY_AS_OPTIONS Jun 6, 2019
@liwensun
Copy link
Contributor Author

liwensun commented Jun 6, 2019

I have updated the PR to remove the config. Thanks for the feedback!

@SparkQA
Copy link

SparkQA commented Jun 6, 2019

Test build #106256 has finished for PR 24784 at commit 2e18093.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member

LGTM pending Jenkins.

@SparkQA
Copy link

SparkQA commented Jun 6, 2019

Test build #106258 has finished for PR 24784 at commit bbd8d14.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

.save()

val partColumns = LastOptions.parameters(DataSourceUtils.PARTITIONING_COLUMNS_KEY)
assert(DataSourceUtils.decodePartitioningColumns(partColumns) === Seq("col1", "col2"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

decodePartitioningColumns is under execution package that's not supposed to be exposed so users shouldn't use this util directly.

Did we document this option to any public datasource v1 API? We should also say this is a JSON string.

@HyukjinKwon
Copy link
Member

LGTM too. strictly #24784 (comment) can be done separately.

Merged to master.

emanuelebardelli pushed a commit to emanuelebardelli/spark that referenced this pull request Jun 15, 2019
…TIONS

## What changes were proposed in this pull request?
In PR apache#24365, we pass in the partitionBy columns as options in `DataFrameWriter`.  To make this change less intrusive for a patch release, we added a feature flag `LEGACY_PASS_PARTITION_BY_AS_OPTIONS` with the default to be false.

For 3.0, we should just do the correct behavior for DSV1, i.e., always passing partitionBy as options, and remove this legacy feature flag.

## How was this patch tested?
Existing tests.

Closes apache#24784 from liwensun/SPARK-27453-default.

Authored-by: liwensun <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants