
Conversation

@gengliangwang
Member

@gengliangwang commented Oct 13, 2021

@gengliangwang changed the title [DO NOT MERGE] Add 3.2.0 release note and news and update links Add 3.2.0 release note and news and update links Oct 15, 2021
@gengliangwang changed the title Add 3.2.0 release note and news and update links [BLOCKED BY PYPI FOR NOW] Add 3.2.0 release note and news and update links Oct 15, 2021
Member

@gatorsmile left a comment

We need to double-check the grammar in these JIRA titles. Also, we need to move all the ANSI compliance work to the section "ANSI SQL Compatibility Enhancements", which is one of the most important directions in Spark SQL. So far, this section misses much of the work we did in this release.

@dongjoon-hyun
Member

@dongjoon-hyun commented Oct 17, 2021

Do you think that is a blocker for the Apache Software Foundation release, @gatorsmile?

@gatorsmile
Member

If we announce the GA of Spark 3.2 before publishing it to PyPI, I think we need to highlight that in the release note and provide a link explaining how to install it in different ways: https://spark.apache.org/docs/latest/api/python/getting_started/install.html

@gengliangwang Have we pushed it to Conda?

@gengliangwang changed the title [BLOCKED BY PYPI FOR NOW] Add 3.2.0 release note and news and update links Add 3.2.0 release note and news and update links Oct 18, 2021
@gengliangwang
Member Author

cc @gatorsmile @dongjoon-hyun @dbtsai @viirya @holdenk @sarutak @cloud-fan @HyukjinKwon @MaxGekk @Ngone51 @HeartSaVioR @zhengruifeng
This PR is ready for review. Please help review this one if you have time, thanks!


In this release, Spark supports the Pandas API layer on Spark. Pandas users can scale out their applications on Spark with a one-line code change. Other major updates include RocksDB StateStore support, session window support, push-based shuffle support, ANSI SQL INTERVAL types, enabling Adaptive Query Execution (AQE) by default, and ANSI SQL mode GA.

To download Apache Spark 3.2.0, visit the [downloads](https://spark.apache.org/downloads.html) page. You can consult JIRA for the [detailed changes](https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315420&version=12349407). We have curated a list of high-level changes here, grouped by major modules.
Member

@viirya Oct 18, 2021

consult JIRA or consult JIRAs?

Member Author

It means the JIRA website. This is the same as in the release notes of 3.1.1 and 3.0.0.


<p>Apache Spark 3.2.0 is the third release of the 3.x line. With tremendous contribution from the open-source community, this release managed to resolve in excess of 1,700 Jira tickets.</p>

<p>In this release, Spark supports the Pandas API layer on Spark. Pandas users can scale out their applications on Spark with a one-line code change. Other major updates include RocksDB StateStore support, session window support, push-based shuffle support, ANSI SQL INTERVAL types, enabling Adaptive Query Execution (AQE) by default, and ANSI SQL mode GA.</p>
Member

@sarutak Oct 18, 2021

Should we write pandas instead of Pandas?
I don't know whether the capitalized notation is officially recognized, but every occurrence of pandas on the official website is lowercase.
https://pandas.pydata.org/

Member Author

Yes, I am following the official website.
cc @HyukjinKwon @zero323
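
For context, the "one line code change" in the note above refers to swapping the plain pandas import for the new pandas API on Spark; a minimal sketch, with a hypothetical input file and column names:

```python
# Before: plain pandas, bound to a single machine.
# import pandas as pd
# df = pd.read_csv("sales.csv")

# After: the pandas API on Spark, new in 3.2.0. The rest of the code
# keeps its pandas style but executes distributed on Spark.
import pyspark.pandas as ps

df = ps.read_csv("sales.csv")                  # hypothetical input file
print(df.groupby("region")["amount"].sum())    # assumed column names
```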

@h-vetinari

> @gengliangwang Have we pushed it to Conda?

Hey all, I started a pull request to package pyspark 3.2 for conda-forge (normally this would have been done sooner already, but the automated bot was waiting for the PyPI upload): conda-forge/pyspark-feedstock#31

It turns out that pyspark specifies py4j==0.10.9.2, whereas conda-forge currently only has py4j==0.10.9. Building both of these packages should take a couple of hours, depending on how fast I can get people to merge the PRs.

After the packages have been built (and have become available through the content delivery network, which takes about an hour after CI completion), it will be possible to install pyspark through conda-forge as follows:

conda install -c conda-forge pyspark=3.2

Hope this helps.
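
Once the conda-forge package is live, a quick sanity check of the install might look like this (a sketch; the exact version string is whatever gets published):

```python
# Quick sanity check after `conda install -c conda-forge pyspark=3.2`.
import pyspark
print(pyspark.__version__)  # expect something like '3.2.0'

from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[1]").getOrCreate()
print(spark.range(3).count())  # expect 3
spark.stop()
```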

@h-vetinari

Little update: the conda-forge build of pyspark is now waiting to be merged by the feedstock team or by conda-forge/core.

However, I noticed that the install instructions for conda are... not ideal. In particular, mixing pip & conda is strongly discouraged, because pip can trample on the conda environment and break it.

Should I raise a PR under https://github.com/apache/spark/? It would be good if this could then be backported to 3.2 (presumably that's necessary for it to appear in the 3.2.0 docs).

@gengliangwang
Member Author

@h-vetinari yes please raise a PR to update https://github.com/apache/spark/blob/master/python/docs/source/getting_started/install.rst, thanks!

@h-vetinari

Update 2: pyspark 3.2.0 has been uploaded to https://anaconda.org/conda-forge/pyspark/files and will make its way through the CDN in about an hour.

Done here: apache/spark#34315 🙃

* Event-time based sessionization (session window) ([SPARK-10816](https://issues.apache.org/jira/browse/SPARK-10816))
* ANSI SQL mode GA ([SPARK-35030](https://issues.apache.org/jira/browse/SPARK-35030))
* Support for ANSI SQL INTERVAL types ([SPARK-27790](https://issues.apache.org/jira/browse/SPARK-27790))
* Enable adaptive query execution by default ([SPARK-33679](https://issues.apache.org/jira/browse/SPARK-33679))
Member

This feature has a bigger impact than the other SQL and Core features, IMO. Can you adjust the order?

Member Author

Make it the second highlight?
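
For reviewers who want to see what the session window item covers: SPARK-10816 exposes it as `session_window` in `pyspark.sql.functions`. A minimal sketch with made-up users and timestamps:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import count, session_window

spark = SparkSession.builder.getOrCreate()

# Made-up click events; the third event arrives more than five minutes
# after the second, so it starts a new session for user u1.
events = spark.createDataFrame(
    [("u1", "2021-10-19 10:00:00"),
     ("u1", "2021-10-19 10:03:00"),
     ("u1", "2021-10-19 10:30:00")],
    ["user", "ts"],
).selectExpr("user", "CAST(ts AS timestamp) AS ts")

# Sessions close after 5 minutes of inactivity.
sessions = (events
            .groupBy("user", session_window("ts", "5 minutes"))
            .agg(count("*").alias("events")))
sessions.show(truncate=False)
```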

* Avoid inlining non-deterministic With-CTEs ([SPARK-36447](https://issues.apache.org/jira/browse/SPARK-36447))
* Support analyzing all tables in a specific database ([SPARK-33687](https://issues.apache.org/jira/browse/SPARK-33687))
* Standardize exception messages in Spark ([SPARK-33539](https://issues.apache.org/jira/browse/SPARK-33539))
* Support (IGNORE | RESPECT) NULLS for LEAD/LAG/NTH_VALUE/FIRST_VALUE/LAST_VALUE ([SPARK-30789](https://issues.apache.org/jira/browse/SPARK-30789))
Member

Please move it to ANSI mode.

Member Author

Thanks, updated.
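
As a quick illustration of the (IGNORE | RESPECT) NULLS item now listed under ANSI mode: a sketch with invented sensor readings, run through `spark.sql`:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Invented sensor readings with gaps (NULL values).
spark.createDataFrame(
    [(1, 10), (2, None), (3, None), (4, 40)],
    ["ts", "value"],
).createOrReplaceTempView("readings")

# LAST_VALUE ... IGNORE NULLS carries the last non-null reading forward;
# RESPECT NULLS (the default) would return NULL at ts = 2 and 3.
spark.sql("""
    SELECT ts, value,
           LAST_VALUE(value) IGNORE NULLS OVER (
               ORDER BY ts
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS filled
    FROM readings
""").show()
```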

### Known Issues

* Support fetching shuffle blocks in batch with i/o encryption ([SPARK-34827](https://issues.apache.org/jira/browse/SPARK-34827))
* Fail to load Snappy codec ([SPARK-36681](https://issues.apache.org/jira/browse/SPARK-36681))
Member

Maybe mention this only applies to sequence files? Otherwise, just by looking at the name, it will appear to be a very serious issue.

Member Author

Thanks, updated.

@gengliangwang
Member Author

Thanks all for the reviews. I will keep this open for a few more hours.

@gengliangwang
Member Author

I am merging this one. Thanks for the great suggestions, everyone!

@gengliangwang merged commit 43ea0cc into apache:asf-site Oct 19, 2021
@limansky

Hi, I've just found that there is no link for the Hadoop 3.3 + Scala 2.13 build on the download page.

@limansky

BTW, why is it the only build available for Scala 2.13?

@srowen
Member

@srowen commented Oct 19, 2021

Fair point - I think the problem is the explosion of artifact combinations if there are sets for each Scala version, but we did publish a binary release for 2.13, and it should be in the UI. Unless someone's on that already, I can hack in an option. Probably anyone on Scala 2.13 is generally on newer versions of things, so there's not much point in building for old Hadoop 2 with 2.13. (People can create whatever build they like from the source release, though.)

@limansky

I think it would be nice to have both the Hadoop 3 build and the no-Hadoop build for 2.13.

HyukjinKwon pushed a commit to apache/spark that referenced this pull request Oct 22, 2021
### What changes were proposed in this pull request?

Improve conda installation docs

### Why are the changes needed?

As requested [here](apache/spark-website#361 (comment)). Ideally, this should be backported to the 3.2-branch (so it becomes visible for the 3.2.0 installation documentation [here](https://spark.apache.org/docs/3.2.0/api/python/getting_started/install.html)).
CC gengliangwang

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Not tested

Closes #34315 from h-vetinari/conda-install.

Lead-authored-by: H. Vetinari <[email protected]>
Co-authored-by: h-vetinari <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
HyukjinKwon pushed a commit to apache/spark that referenced this pull request Oct 22, 2021 (cherry picked from commit 016ab0c)
sunchao pushed a commit to sunchao/spark that referenced this pull request Dec 8, 2021 (cherry picked from commit 016ab0c)
yaooqinn added a commit to apache/kyuubi that referenced this pull request Feb 22, 2022

### _Why are the changes needed?_

The original idea of SPARK_HADOOP_VERSION was only to build Spark release names by concatenation; now we need to remove it because:
- SPARK_HADOOP_VERSION is misunderstood by developers and misused in places, e.g. as the Hadoop version Kyuubi was compiled against
- we have multi-engine support now
- the release names of Spark (or anything else) are very easy to get through code in different environments: prod/test/dev
- a `mvn` job is bundled with `bin/load-kyuubi-env.sh`, which is truly worrisome
- SPARK_HADOOP_VERSION on the Spark side has already broken for Spark 3.2, which is actually bundled with Hadoop 3.3; see apache/spark-website#361 (comment)

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [x] [Run test](https://kyuubi.apache.org/docs/latest/develop_tools/testing.html#running-tests) locally before making a pull request

Closes #1950 from yaooqinn/hadoop.

Closes #1950

b47be7c [Kent Yao] Remove ambiguous SPARK_HADOOP_VERSION
3b33ee5 [Kent Yao] Remove ambiguous SPARK_HADOOP_VERSION

Authored-by: Kent Yao <[email protected]>
Signed-off-by: Kent Yao <[email protected]>
catalinii pushed a commit to lyft/spark that referenced this pull request Feb 22, 2022 (cherry picked from commit 016ab0c)
catalinii pushed a commit to lyft/spark that referenced this pull request Mar 4, 2022 (cherry picked from commit 016ab0c)