[SPARK-20627][PYSPARK] Drop the hadoop distribution name from the Python version #17885
I'll target this for master, branch-2.2, branch-2.1.
Test build #76535 has finished for PR 17885 at commit
PYSPARK_VERSION=`echo "$SPARK_VERSION+$NAME" | sed -r "s/-/./" | sed -r "s/SNAPSHOT/dev0/"`
# Write out the VERSION to PySpark version info we rewrite the - into a . and SNAPSHOT
# to dev0 to be closer to PEP440.
PYSPARK_VERSION=`echo "$SPARK_VERSION" | sed -r "s/-/./" | sed -r "s/SNAPSHOT/dev0/"`
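To make the effect of the changed line concrete, here is a minimal sketch of the same `sed` pipeline wrapped in a function. The function name `to_pyspark_version` and the sample inputs (including the `hadoop2.7` name) are assumptions for illustration only and do not come from the build script. Plain `sed` is used instead of `sed -r` because neither expression needs extended regular expressions.

```shell
#!/bin/sh
# Hypothetical helper mirroring the pipeline in the diff above.
# Each sed call has no "g" flag, so only the first match is replaced.
to_pyspark_version() {
  printf '%s\n' "$1" | sed "s/-/./" | sed "s/SNAPSHOT/dev0/"
}

# New behavior: only the Spark version is passed in.
to_pyspark_version "2.2.0-SNAPSHOT"            # prints 2.2.0.dev0

# Old behavior: "+$NAME" appended a local version label
# ("hadoop2.7" is an assumed example value for NAME).
to_pyspark_version "2.2.0-SNAPSHOT+hadoop2.7"  # prints 2.2.0.dev0+hadoop2.7
```

With the `+$NAME` suffix gone, the result no longer carries a local version label.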
This also affects the pyspark-*.tgz artifact name. It seems like this means the same file name will be used for different flavors of the release. If they're identical anyway it's just redundant, but are they? I don't know this part well so might be misunderstanding what this would do.
So we currently only package Python for one Hadoop version. If we start doing multiple Hadoop versions for Python we can figure out how to handle that again.
If there are no other comments I'm going to merge this tomorrow.
Are you referring to https://www.python.org/dev/peps/pep-0440/ ?
Could you post the changes you made in the PR description and explain why it resolves PEP-0440? It might help more people understand the impacts of this PR by reading the PR description. Thanks!
Updated with more explanation of what we changed in the PR description.
…hon version ## What changes were proposed in this pull request? Drop the hadoop distribution name from the Python version (PEP440 - https://www.python.org/dev/peps/pep-0440/). We've been using the local version string to disambiguate between different hadoop versions packaged with PySpark, but PEP0440 states that local versions should not be used when publishing up-stream. Since we no longer make PySpark pip packages for different hadoop versions, we can simply drop the hadoop information. If at a later point we need to start publishing different hadoop versions we can look at making different packages or similar. ## How was this patch tested? Ran `make-distribution` locally Author: Holden Karau <[email protected]> Closes #17885 from holdenk/SPARK-20627-remove-pip-local-version-string. (cherry picked from commit 1b85bcd) Signed-off-by: Holden Karau <[email protected]>
Merged to master, branch-2.2, and branch-2.1.
Could you post the original section about It sounds like PEP0440 does not encourage it. Below is what I found
What changes were proposed in this pull request?
Drop the hadoop distribution name from the Python version (PEP440 - https://www.python.org/dev/peps/pep-0440/). We've been using the local version string to disambiguate between different hadoop versions packaged with PySpark, but PEP0440 states that local versions should not be used when publishing up-stream. Since we no longer make PySpark pip packages for different hadoop versions, we can simply drop the hadoop information. If at a later point we need to start publishing different hadoop versions we can look at making different packages or similar.
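As a rough illustration of the rule referenced above: PEP 440 writes a local version identifier as a public version followed by `+` and a local label. The sketch below checks a version string for that `+` separator, for example as a guard before publishing. The function name `has_local_version` and the sample versions are hypothetical and are not part of this patch or of the Spark build scripts.

```shell
#!/bin/sh
# Hypothetical guard: succeeds (exit status 0) when the version string
# contains "+", the separator PEP 440 uses for a local version label.
has_local_version() {
  case "$1" in
    *+*) return 0 ;;
    *)   return 1 ;;
  esac
}

# Assumed example values, not taken from a real release.
for v in "2.2.0.dev0+hadoop2.7" "2.2.0.dev0"; do
  if has_local_version "$v"; then
    echo "$v: has a local version label"
  else
    echo "$v: public version only"
  fi
done
```

A check like this only looks for the separator; it does not validate the rest of the version string against PEP 440.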
How was this patch tested?
Ran `make-distribution` locally