Add 3.2.0 release note and news and update links #361
Conversation
gatorsmile left a comment:
We need to double-check the grammar in these JIRA names. Also, we need to move all the ANSI compliance work to the section "ANSI SQL Compatibility Enhancements", which is one of the most important directions in Spark SQL. So far, this section is missing much of the work we have been doing in this release.
Do you think that is a blocker for an Apache Software Foundation release, @gatorsmile?
If we announce the GA of Spark 3.2 before publishing it to PyPI, I think we need to highlight that in the release note and provide a link explaining how to install it in different ways: https://spark.apache.org/docs/latest/api/python/getting_started/install.html @gengliangwang Have we pushed it to Conda?
cc @gatorsmile @dongjoon-hyun @dbtsai @viirya @holdenk @sarutak @cloud-fan @HyukjinKwon @MaxGekk @Ngone51 @HeartSaVioR @zhengruifeng
In this release, Spark supports the Pandas API layer on Spark. Pandas users can scale out their applications on Spark with one line code change. Other major updates include RocksDB StateStore support, session window support, push-based shuffle support, ANSI SQL INTERVAL types, enabling Adaptive Query Execution (AQE) by default, and ANSI SQL mode GA.
To download Apache Spark 3.2.0, visit the [downloads](https://spark.apache.org/downloads.html) page. You can consult JIRA for the [detailed changes](https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315420&version=12349407). We have curated a list of high level changes here, grouped by major modules.
"consult JIRA" or "consult JIRAs"?
It means the JIRA website. This is the same as in the release notes of 3.1.1 and 3.0.0.
<p>Apache Spark 3.2.0 is the third release of the 3.x line. With tremendous contribution from the open-source community, this release managed to resolve in excess of 1,700 Jira tickets.</p>
<p>In this release, Spark supports the Pandas API layer on Spark. Pandas users can scale out their applications on Spark with one line code change. Other major updates include RocksDB StateStore support, session window support, push-based shuffle support, ANSI SQL INTERVAL types, enabling Adaptive Query Execution (AQE) by default, and ANSI SQL mode GA.</p>
Should we write pandas instead of Pandas?
I don't know whether the capitalized form is officially recognized, but every occurrence of pandas on the official website is lowercase:
https://pandas.pydata.org/
Yes, I am following the official website.
cc @HyukjinKwon @zero323
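For context, the "one line code change" in the quoted release-note text refers to swapping the pandas import for the new pandas API on Spark. A minimal sketch; the file and column names are made up for illustration:

```python
# Plain pandas would be:
#   import pandas as pd
#   df = pd.read_csv("sales.csv")

# pandas API on Spark (new in 3.2): the one-line change is the import.
import pyspark.pandas as ps

df = ps.read_csv("sales.csv")                 # distributed, pandas-style read
print(df.groupby("region")["amount"].sum())   # executes on the Spark cluster
```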
Hey all, I started a pull request to package pyspark 3.2 for conda-forge (normally this would have been done sooner already, but the automated bot was waiting for the PyPI upload): conda-forge/pyspark-feedstock#31

It turns out that pyspark specifies

After the packages have been built (and the packages become available through the content delivery network; takes about an hour upon CI completion), it would be possible to install pyspark through conda-forge as follows:

Hope this helps.
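For reference, a sketch of the standard conda-forge install commands, assuming the package lands under its usual name; the environment name `spark32` is arbitrary:

```bash
# Create a fresh environment with pyspark from the conda-forge channel.
conda create -n spark32 -c conda-forge pyspark=3.2.0
conda activate spark32

# Alternatively, install from PyPI -- but avoid mixing pip and conda
# inside the same conda environment, as discussed in the next comment.
pip install pyspark==3.2.0
```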
Little update: the conda-forge build of pyspark is now waiting to be merged by the feedstock team or conda-forge/core. However, I noticed that the install instructions for conda are... not ideal. In particular, mixing pip & conda is strongly discouraged, because pip can trample on the conda environment and break it. Should I raise a PR under https://github.com/apache/spark/? It would be good if this could then be backported to 3.2 (presumably that's necessary for it to appear in the 3.2.0 docs).
@h-vetinari yes, please raise a PR to update https://github.com/apache/spark/blob/master/python/docs/source/getting_started/install.rst, thanks!
Update 2: pyspark 3.2.0 has been uploaded to https://anaconda.org/conda-forge/pyspark/files and will make its way through the CDN in about an hour.
Done here: apache/spark#34315 🙃
* EventTime based sessionization (session window) ([SPARK-10816](https://issues.apache.org/jira/browse/SPARK-10816))
* ANSI SQL mode GA ([SPARK-35030](https://issues.apache.org/jira/browse/SPARK-35030))
* Support for ANSI SQL INTERVAL types ([SPARK-27790](https://issues.apache.org/jira/browse/SPARK-27790))
* Enable adaptive query execution by default ([SPARK-33679](https://issues.apache.org/jira/browse/SPARK-33679))
This feature has a bigger impact than the other SQL and Core features, IMO. Can you adjust the order?
Make it the second highlight?
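Two of the highlights above in one hedged PySpark sketch: the new session window grouping (SPARK-10816) and checking the now-default AQE flag (SPARK-33679). The user and timestamp values are invented for illustration:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# AQE is enabled by default in 3.2; it can still be toggled per session.
print(spark.conf.get("spark.sql.adaptive.enabled"))   # 'true' in 3.2

events = spark.createDataFrame(
    [("u1", "2021-10-13 10:00:00"),
     ("u1", "2021-10-13 10:03:00"),    # within 5 min of the previous event
     ("u1", "2021-10-13 10:30:00")],   # gap > 5 min starts a new session
    ["user", "ts"],
).withColumn("ts", F.to_timestamp("ts"))

# session_window groups events whose gaps are under the given duration.
sessions = events.groupBy(F.session_window("ts", "5 minutes"), "user").count()
sessions.show(truncate=False)   # two sessions for user u1
```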
* Avoid inlining non-deterministic With-CTEs ([SPARK-36447](https://issues.apache.org/jira/browse/SPARK-36447))
* Support analyzing all tables in a specific database ([SPARK-33687](https://issues.apache.org/jira/browse/SPARK-33687))
* Standardize exception messages in Spark ([SPARK-33539](https://issues.apache.org/jira/browse/SPARK-33539))
* Support (IGNORE | RESPECT) NULLS for LEAD/LAG/NTH_VALUE/FIRST_VALUE/LAST_VALUE ([SPARK-30789](https://issues.apache.org/jira/browse/SPARK-30789))
Please move it to the ANSI mode section.
Thanks, updated.
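For readers who haven't seen the SPARK-30789 syntax, a small sketch via spark.sql; the table and values are invented:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
spark.createDataFrame([(1, "a"), (2, None), (3, "c")], ["id", "v"]) \
     .createOrReplaceTempView("t")

# Without IGNORE NULLS, LAST_VALUE returns NULL for id = 2; with it,
# the most recent non-null value is carried forward (SQL:2011 syntax).
spark.sql("""
    SELECT id,
           LAST_VALUE(v) IGNORE NULLS OVER (ORDER BY id) AS last_non_null
    FROM t
""").show()
```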
### Known Issues
* Support fetching shuffle blocks in batch with i/o encryption ([SPARK-34827](https://issues.apache.org/jira/browse/SPARK-34827))
* Fail to load Snappy codec ([SPARK-36681](https://issues.apache.org/jira/browse/SPARK-36681))
Maybe mention that this only applies to sequence files? Otherwise, just by looking at the name, it will appear to be a very serious issue.
Thanks, updated.
Thanks all for the reviews. I will keep this open for a few more hours.
I am merging this one. Thanks for the great suggestions, everyone!
Hi, I've just found that there is no link for the Hadoop 3.3 + Scala 2.13 build on the download page.
BTW, why is it the only build available for Scala 2.13?
Fair point - I think the problem is the explosion of combinations of artifacts if there are sets for each Scala version, but we did publish a binary release for 2.13 and it should be in the UI. Unless someone's on that already, I can hack in an option maybe. Probably anyone on Scala 2.13 is generally on newer versions of things, so there's not as much point in building for old Hadoop 2 with 2.13. (People can create whatever build they like from the source release, though.)
I think it would be nice to have both the Hadoop 3 build and the no-Hadoop build for 2.13.
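As a pointer for the "create whatever build they like from the source release" route mentioned above, a sketch using the helper scripts and profile names from the Spark 3.2 build docs; the distribution name `custom-2.13` is arbitrary:

```bash
# From an unpacked Spark 3.2.0 source release:
./dev/change-scala-version.sh 2.13        # rewrite POMs for Scala 2.13
./dev/make-distribution.sh --name custom-2.13 --tgz \
    -Pscala-2.13 -Phadoop-3.2             # or -Phadoop-provided for a no-Hadoop build
```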
### What changes were proposed in this pull request?

Improve conda installation docs

### Why are the changes needed?

As requested [here](apache/spark-website#361 (comment)). Ideally, this should be backported to the 3.2 branch (so it becomes visible for the 3.2.0 installation documentation [here](https://spark.apache.org/docs/3.2.0/api/python/getting_started/install.html)). CC gengliangwang

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Not tested

Closes #34315 from h-vetinari/conda-install.

Lead-authored-by: H. Vetinari <[email protected]>
Co-authored-by: h-vetinari <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
### _Why are the changes needed?_

The original idea of SPARK_HADOOP_VERSION was to concat Spark release names only; now we need to remove it because:
- SPARK_HADOOP_VERSION is misunderstood by developers and misused in places, like the one Kyuubi is compiled with
- we have multi-engine support now
- the release names of Spark (or anything else) are very easy to get through code in different environments: prod/test/dev
- a `mvn` job is bundled with `bin/load-kyuubi-env.sh`, which is truly worrisome
- SPARK_HADOOP_VERSION on the Spark side has already broken for Spark 3.2, which is actually bundled with Hadoop 3.3; see apache/spark-website#361 (comment)

### _How was this patch tested?_

- [ ] Add some test cases that check the changes thoroughly, including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [x] [Run test](https://kyuubi.apache.org/docs/latest/develop_tools/testing.html#running-tests) locally before making a pull request

Closes #1950 from yaooqinn/hadoop.

Closes #1950
b47be7c [Kent Yao] Remove ambiguous SPARK_HADOOP_VERSION
3b33ee5 [Kent Yao] Remove ambiguous SPARK_HADOOP_VERSION

Authored-by: Kent Yao <[email protected]>
Signed-off-by: Kent Yao <[email protected]>
This PR aims to update our website. I checked that the following are available as of now:
https://spark.apache.org/docs/3.2.0/
https://dist.apache.org/repos/dist/release/spark/spark-3.2.0/
https://repo1.maven.org/maven2/org/apache/spark/spark-core_2.12/3.2.0/
https://pypi.org/project/pyspark/3.2.0/