@holdenk
Contributor

@holdenk holdenk commented May 7, 2017

What changes were proposed in this pull request?

Drop the Hadoop distribution name from the Python version (PEP 440 - https://www.python.org/dev/peps/pep-0440/). We've been using the local version string to disambiguate between different Hadoop versions packaged with PySpark, but PEP 440 states that local versions should not be used when publishing upstream. Since we no longer make PySpark pip packages for different Hadoop versions, we can simply drop the Hadoop information. If at a later point we need to start publishing different Hadoop versions, we can look at making different packages or similar.

How was this patch tested?

Ran make-distribution locally
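
For readers who want to see the effect of the change, here is a minimal sketch of the version rewrite before and after this patch. The sample values for `SPARK_VERSION` and `NAME` are assumptions chosen for illustration, and the two function names are hypothetical; the real script computes `PYSPARK_VERSION` inline. The sketch uses `sed -E`, which GNU sed treats the same as the `-r` used in the script.

```shell
# Hypothetical sample values; the real build derives these itself.
SPARK_VERSION="2.2.0-SNAPSHOT"
NAME="hadoop2.7"

# Before the patch: the distribution name is appended after a "+",
# which PEP 440 treats as a local version label.
old_pyspark_version() {
  echo "$1+$2" | sed -E "s/-/./" | sed -E "s/SNAPSHOT/dev0/"
}

# After the patch: only the Spark version itself is rewritten.
new_pyspark_version() {
  echo "$1" | sed -E "s/-/./" | sed -E "s/SNAPSHOT/dev0/"
}

echo "old: $(old_pyspark_version "$SPARK_VERSION" "$NAME")"
echo "new: $(new_pyspark_version "$SPARK_VERSION")"
```

With these sample inputs, the old form yields `2.2.0.dev0+hadoop2.7` and the new form yields `2.2.0.dev0`.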

@holdenk
Contributor Author

holdenk commented May 7, 2017

I'll target this for master, branch-2.2, branch-2.1.

@SparkQA

SparkQA commented May 7, 2017

Test build #76535 has finished for PR 17885 at commit 99414d7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

-PYSPARK_VERSION=`echo "$SPARK_VERSION+$NAME" | sed -r "s/-/./" | sed -r "s/SNAPSHOT/dev0/"`
+# Write out the VERSION to PySpark version info we rewrite the - into a . and SNAPSHOT
+# to dev0 to be closer to PEP440.
+PYSPARK_VERSION=`echo "$SPARK_VERSION" | sed -r "s/-/./" | sed -r "s/SNAPSHOT/dev0/"`
Member

This also affects the pyspark-*.tgz artifact name. It seems like this means the same file name will be used for different flavors of the release. If they're identical anyway it's just redundant, but are they? I don't know this part well so might be misunderstanding what this would do.

Contributor Author

So we currently only package Python for one Hadoop version. If we start doing multiple Hadoop versions for Python we can figure out how to handle that again.
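
To make the file-name concern concrete, here is a rough sketch of how dropping the local label collapses per-Hadoop builds onto one artifact name. The `artifact_name` helper, the `pyspark-<version>.tar.gz` pattern, and the distribution names are all assumptions for illustration; they are not taken from the build scripts.

```shell
# Hypothetical helper: build an artifact file name from a version string.
artifact_name() {
  echo "pyspark-$1.tar.gz"
}

VERSION="2.2.0"

# Hypothetical distribution names. With the local label, each gets its own
# file name; without it, both map to the same name.
for NAME in hadoop2.6 hadoop2.7; do
  echo "$NAME: old=$(artifact_name "$VERSION+$NAME") new=$(artifact_name "$VERSION")"
done
```

This is only a problem if more than one Hadoop flavor is actually packaged for Python, which the reply above says is not currently the case.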

@holdenk
Contributor Author

holdenk commented May 9, 2017

If there are no other comments I'm going to merge this tomorrow.

@gatorsmile
Member

Are you referring to https://www.python.org/dev/peps/pep-0440/ ?

@gatorsmile
Member

Could you post the changes you made in the PR description and explain why it resolves PEP-0440? It might help more people understand the impacts of this PR by reading the PR description. Thanks!

@holdenk
Contributor Author

holdenk commented May 9, 2017

Updated with more explanation of what we changed in the PR description.

asfgit pushed a commit that referenced this pull request May 9, 2017
…hon version

## What changes were proposed in this pull request?

Drop the Hadoop distribution name from the Python version (PEP 440 - https://www.python.org/dev/peps/pep-0440/). We've been using the local version string to disambiguate between different Hadoop versions packaged with PySpark, but PEP 440 states that local versions should not be used when publishing upstream. Since we no longer make PySpark pip packages for different Hadoop versions, we can simply drop the Hadoop information. If at a later point we need to start publishing different Hadoop versions, we can look at making different packages or similar.

## How was this patch tested?

Ran `make-distribution` locally

Author: Holden Karau <[email protected]>

Closes #17885 from holdenk/SPARK-20627-remove-pip-local-version-string.

(cherry picked from commit 1b85bcd)
Signed-off-by: Holden Karau <[email protected]>
@holdenk
Contributor Author

holdenk commented May 9, 2017

Merged to master, branch-2.2, and branch-2.1.

asfgit pushed a commit that referenced this pull request May 9, 2017
…hon version

@asfgit asfgit closed this in 1b85bcd May 9, 2017
@gatorsmile
Member

Could you post the original section stating that local versions should not be used when publishing upstream?

It sounds like PEP0440 does not encourage it. Below is what I found

The inclusion of the local version label makes it possible to differentiate upstream releases from potentially altered rebuilds by downstream integrators. The use of a local version identifier does not affect the kind of a release but, when applied to a source distribution, does indicate that it may not contain the exact same code as the corresponding upstream release.
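
As a small aid to this discussion, the sketch below checks whether a version string carries a local version label. It relies only on the fact that PEP 440 separates the local label from the public version with a `+`. The function name `has_local_label` and the sample versions are hypothetical.

```shell
# Succeeds (exit status 0) when the version string contains a "+",
# i.e. when it carries a PEP 440 local version label.
has_local_label() {
  case "$1" in
    *+*) return 0 ;;
    *)   return 1 ;;
  esac
}

# Hypothetical sample versions for illustration.
for v in "2.2.0.dev0+hadoop2.7" "2.2.0.dev0" "2.1.1"; do
  if has_local_label "$v"; then
    echo "$v: has a local version label"
  else
    echo "$v: public version only"
  fi
done
```

A check like this could sit in a release script as a guard before uploading to a public index, though whether that is worthwhile is a judgment call for the maintainers.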

liyichao pushed a commit to liyichao/spark that referenced this pull request May 24, 2017
…hon version

Closes apache#17885 from holdenk/SPARK-20627-remove-pip-local-version-string.