@liwensun
Contributor

What changes were proposed in this pull request?

Pass partitionBy columns as options and feature-flag this behavior.

How was this patch tested?

A new unit test.
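The following is a minimal, self-contained sketch of the idea, not the code from this patch. When the flag is on, the writer adds the `partitionBy` columns to the options map that is handed to a V1 source. When the flag is off, or when no `partitionBy` is given, the options are left untouched. The object and method names are hypothetical. The option key `__partition_columns` and the JSON-array encoding are assumptions modeled on what Spark's `DataSourceUtils` appears to use, so check them against your Spark version before relying on them.

```scala
// Hypothetical model of the V1 write path; no Spark dependency.
object PartitionByAsOptions {
  // Assumed option key for the encoded partitionBy columns.
  val PartitioningColumnsKey = "__partition_columns"

  // Encodes column names as a JSON array, e.g. ["date","country"].
  // Assumes names contain no characters that need JSON escaping.
  def encode(columns: Seq[String]): String =
    columns.map(c => "\"" + c + "\"").mkString("[", ",", "]")

  // Options a V1 source would receive from the writer.
  def optionsForV1Source(
      userOptions: Map[String, String],
      partitionBy: Option[Seq[String]],
      passPartitionByAsOptions: Boolean): Map[String, String] =
    partitionBy match {
      case Some(cols) if passPartitionByAsOptions =>
        userOptions + (PartitioningColumnsKey -> encode(cols))
      case _ =>
        userOptions // old behavior: partitionBy is silently dropped
    }
}
```

When no `partitionBy` is given, the options are identical whether or not the flag is set. Only writes that specify `partitionBy` see a change.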

@marmbrus
Contributor

Add to whitelist

@zsxwing
Member

zsxwing commented Apr 13, 2019

test this please

@SparkQA

SparkQA commented Apr 14, 2019

Test build #104568 has finished for PR 24365 at commit 61a4c96.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

.internal()
.doc("Whether to pass the partitionBy columns as options in DataFrameWriter." +
" Data source V1 now silently drops partitionBy columns for non-file-format sources;" +
" turning the flag on provides a way for these sources to see these partitionBy columns.")
Member

nit: the format in this file should be "text " + "more text" instead of "text" + " more text"
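The nit can be made concrete. Both styles build the same string, and the requested convention only moves the separating space to the end of the preceding fragment. The object name below is hypothetical, and the text is the doc string from the hunk above.

```scala
object DocStringStyle {
  // Style used in the patch: the space leads each continuation fragment.
  val leadingSpace: String =
    "Whether to pass the partitionBy columns as options in DataFrameWriter." +
      " Data source V1 now silently drops partitionBy columns for non-file-format sources;" +
      " turning the flag on provides a way for these sources to see these partitionBy columns."

  // Style requested in review: the space trails each fragment before the `+`.
  val trailingSpace: String =
    "Whether to pass the partitionBy columns as options in DataFrameWriter. " +
      "Data source V1 now silently drops partitionBy columns for non-file-format sources; " +
      "turning the flag on provides a way for these sources to see these partitionBy columns."
}
```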

" Data source V1 now silently drops partitionBy columns for non-file-format sources;" +
" turning the flag on provides a way for these sources to see these partitionBy columns.")
.booleanConf
.createWithDefault(true)
Member

shouldn't the default be false to be backward compatible?

Contributor

I went back and forth on what to recommend here. I think we need to balance the fact that this is a behavior change with the confusion that a user will experience when options they specify are silently dropped.

I recommended going with true because: a) I don't know of any V1 sources that validate options in a way where this change would break an existing program, and b) the only time behavior changes is when someone specifies a (silently dropped) partitionBy clause.

Thoughts?

@liwensun
Contributor Author

test this please

@liwensun
Contributor Author

jenkins retest this please

@liwensun
Contributor Author

retest this please

@zsxwing
Member

zsxwing commented Apr 16, 2019

Add to whitelist

@zsxwing
Member

zsxwing commented Apr 16, 2019

Jenkins, add to whitelist


val LEGACY_PASS_PARTITION_BY_AS_OPTIONS =
buildConf("spark.sql.legacy.sources.write.passPartitionByAsOptions")
.internal()
Contributor

Are you sure this must be internal?

I see that some of the previous configs with the LEGACY_ prefix are internal, but before them there are a few external ones: LEGACY_REPLACE_DATABRICKS_SPARK_AVRO_ENABLED and LEGACY_SIZE_OF_NULL. So what is the reason for defining this one as internal?

I assume that if we expect users to set it, then it must be external.
What is your (and the others') opinion?

Member

We don't want to document this flag for users. It exists only in case we break some existing production workloads. Other users don't need to know anything about this flag.

Contributor

Ok, I see. Thanks for the explanation.
Ok, I see. Thanks for the explanation.

@SparkQA

SparkQA commented Apr 16, 2019

Test build #104633 has finished for PR 24365 at commit c8cdd01.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tdas
Contributor

tdas commented Apr 16, 2019

LGTM. Merging to master and 2.4

@asfgit asfgit closed this in 26ed65f Apr 16, 2019
asfgit pushed a commit that referenced this pull request Apr 16, 2019
Pass partitionBy columns as options and feature-flag this behavior.

A new unit test.

Closes #24365 from liwensun/partitionby.

Authored-by: liwensun <[email protected]>
Signed-off-by: Tathagata Das <[email protected]>
(cherry picked from commit 26ed65f)
Signed-off-by: Tathagata Das <[email protected]>
mccheah pushed a commit to palantir/spark that referenced this pull request May 24, 2019
## What changes were proposed in this pull request?

Pass partitionBy columns as options and feature-flag this behavior.

## How was this patch tested?

A new unit test.

Closes apache#24365 from liwensun/partitionby.

Authored-by: liwensun <[email protected]>
Signed-off-by: Tathagata Das <[email protected]>
emanuelebardelli pushed a commit to emanuelebardelli/spark that referenced this pull request Jun 15, 2019
…TIONS

## What changes were proposed in this pull request?
In PR apache#24365, we pass in the partitionBy columns as options in `DataFrameWriter`. To make this change less intrusive for a patch release, we added a feature flag `LEGACY_PASS_PARTITION_BY_AS_OPTIONS` with the default set to false.

For 3.0, we should simply adopt the correct behavior for DSV1, i.e., always pass partitionBy as options, and remove this legacy feature flag.

## How was this patch tested?
Existing tests.

Closes apache#24784 from liwensun/SPARK-27453-default.

Authored-by: liwensun <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
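On the receiving side, a non-file-format V1 source could recover the columns from its options map. This is a hypothetical sketch with no Spark dependency. The key `__partition_columns` and the JSON-array value are assumptions to verify against your Spark version. The hand-rolled decoder handles only simple column names, so a real source should use a proper JSON parser.

```scala
object ReadPartitionColumns {
  // Assumed option key under which the writer stores partitionBy columns.
  val PartitioningColumnsKey = "__partition_columns"

  // Minimal decoder for a JSON array of plain strings such as ["date","country"].
  // Assumes column names contain no quotes, commas, or escape sequences.
  def decode(encoded: String): Seq[String] = {
    val body = encoded.trim.stripPrefix("[").stripSuffix("]").trim
    if (body.isEmpty) Seq.empty
    else body.split(",").toSeq.map(_.trim.stripPrefix("\"").stripSuffix("\""))
  }

  // What a V1 source might do with the options map it receives:
  // no key means the writer passed no partitionBy columns.
  def partitionColumns(options: Map[String, String]): Seq[String] =
    options.get(PartitioningColumnsKey).map(decode).getOrElse(Seq.empty)
}
```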
kai-chi pushed a commit to kai-chi/spark that referenced this pull request Jul 23, 2019
kai-chi pushed a commit to kai-chi/spark that referenced this pull request Jul 25, 2019
kai-chi pushed a commit to kai-chi/spark that referenced this pull request Aug 1, 2019
7 participants