
Conversation

@gengliangwang
Member

@gengliangwang commented Oct 13, 2021

@gengliangwang changed the title [DO NOT MERGE] Add 3.2.0 release note and news and update links Add 3.2.0 release note and news and update links Oct 15, 2021
@gengliangwang changed the title Add 3.2.0 release note and news and update links [BLOCKED BY PYPI FOR NOW] Add 3.2.0 release note and news and update links Oct 15, 2021
Member

@gatorsmile left a comment

We need to double-check the grammar in these JIRA titles. Also, we need to move all the ANSI compliance work to the section "ANSI SQL Compatibility Enhancements", which is one of the most important directions in Spark SQL. So far, this section misses much of the work we did in this release.

@dongjoon-hyun
Member

@dongjoon-hyun commented Oct 17, 2021

Do you think that is a blocker for the Apache Software Foundation release, @gatorsmile?

@gatorsmile
Member

If we announce the GA of Spark 3.2 before publishing it to PyPI, I think we need to highlight that in the release note and provide a link explaining how to install it in different ways: https://spark.apache.org/docs/latest/api/python/getting_started/install.html

@gengliangwang Have we pushed it to Conda?

@gengliangwang changed the title [BLOCKED BY PYPI FOR NOW] Add 3.2.0 release note and news and update links Add 3.2.0 release note and news and update links Oct 18, 2021
@gengliangwang
Member Author

cc @gatorsmile @dongjoon-hyun @dbtsai @viirya @holdenk @sarutak @cloud-fan @HyukjinKwon @MaxGekk @Ngone51 @HeartSaVioR @zhengruifeng
This PR is ready for review. Please help review this one if you have time, thanks!


In this release, Spark supports the Pandas API layer on Spark. Pandas users can scale out their applications on Spark with a one-line code change. Other major updates include RocksDB StateStore support, session window support, push-based shuffle support, ANSI SQL INTERVAL types, enabling Adaptive Query Execution (AQE) by default, and ANSI SQL mode GA.

To download Apache Spark 3.2.0, visit the [downloads](https://spark.apache.org/downloads.html) page. You can consult JIRA for the [detailed changes](https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315420&version=12349407). We have curated a list of high-level changes here, grouped by major modules.
Member

@viirya Oct 18, 2021

consult JIRA or consult JIRAs?

Member Author

It means the JIRA website. This is the same as in the release notes of 3.1.1 and 3.0.0.


<p>Apache Spark 3.2.0 is the third release of the 3.x line. With tremendous contribution from the open-source community, this release managed to resolve in excess of 1,700 Jira tickets.</p>

<p>In this release, Spark supports the Pandas API layer on Spark. Pandas users can scale out their applications on Spark with a one-line code change. Other major updates include RocksDB StateStore support, session window support, push-based shuffle support, ANSI SQL INTERVAL types, enabling Adaptive Query Execution (AQE) by default, and ANSI SQL mode GA.</p>
Member

@sarutak Oct 18, 2021

Should we write pandas instead of Pandas?
I don't know whether the capitalized notation is officially recognized, but every occurrence of pandas on the official website is lowercase.
https://pandas.pydata.org/

Member Author

Yes, I am following the official website.
cc @HyukjinKwon @zero323
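
For context, the "one line code change" in the note above refers to swapping the plain pandas import for the new pandas API on Spark; a minimal sketch, with a hypothetical input file and column names:

```python
# Before: plain pandas, bound to a single machine.
# import pandas as pd
# df = pd.read_csv("sales.csv")

# After: the pandas API on Spark, new in 3.2.0. The rest of the code
# keeps its pandas style but executes distributed on Spark.
import pyspark.pandas as ps

df = ps.read_csv("sales.csv")                  # hypothetical input file
print(df.groupby("region")["amount"].sum())    # assumed column names
```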

@h-vetinari

> @gengliangwang Have we pushed it to Conda?

Hey all, I started a pull request to package pyspark 3.2 for conda-forge (normally this would have been done sooner already, but the automated bot was waiting for the PyPI upload): conda-forge/pyspark-feedstock#31

It turns out that pyspark specifies py4j==0.10.9.2, whereas conda-forge currently only has py4j==0.10.9. Building both of these packages should take a couple of hours, depending on how fast I can get people to merge the PRs.

After the packages have been built (and have become available through the content delivery network, which takes about an hour after CI completion), it will be possible to install pyspark through conda-forge as follows:

conda install -c conda-forge pyspark=3.2

Hope this helps.
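
Once the conda-forge package is live, a quick sanity check of the install might look like this (a sketch; the exact version string is whatever gets published):

```python
# Quick sanity check after `conda install -c conda-forge pyspark=3.2`.
import pyspark
print(pyspark.__version__)  # expect something like '3.2.0'

from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[1]").getOrCreate()
print(spark.range(3).count())  # expect 3
spark.stop()
```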

@h-vetinari

Little update: the conda-forge build of pyspark is now waiting to be merged by the feedstock team or by conda-forge/core.

However, I noticed that the install instructions for conda are... not ideal. In particular, mixing pip & conda is strongly discouraged, because pip can trample on the conda environment and break it.

Should I raise a PR under https://github.com/apache/spark/? It would be good if this could then be backported to 3.2 (presumably that's necessary for it to appear in the 3.2.0 docs).

@gengliangwang
Member Author

@h-vetinari yes please raise a PR to update https://github.com/apache/spark/blob/master/python/docs/source/getting_started/install.rst, thanks!

@h-vetinari

Update 2: pyspark 3.2.0 has been uploaded to https://anaconda.org/conda-forge/pyspark/files and will make its way through the CDN in about an hour.

Done here: apache/spark#34315 🙃

* Event-time based sessionization (session window) ([SPARK-10816](https://issues.apache.org/jira/browse/SPARK-10816))
* ANSI SQL mode GA ([SPARK-35030](https://issues.apache.org/jira/browse/SPARK-35030))
* Support for ANSI SQL INTERVAL types ([SPARK-27790](https://issues.apache.org/jira/browse/SPARK-27790))
* Enable adaptive query execution by default ([SPARK-33679](https://issues.apache.org/jira/browse/SPARK-33679))
Member

This feature has a bigger impact than the other SQL and Core features, IMO. Can you adjust the order?

Member Author

Make it the second highlight?
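
For reviewers who want to see what the session window item covers: SPARK-10816 exposes it as `session_window` in `pyspark.sql.functions`. A minimal sketch with made-up users and timestamps:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import count, session_window

spark = SparkSession.builder.getOrCreate()

# Made-up click events; the third event arrives more than five minutes
# after the second, so it starts a new session for user u1.
events = spark.createDataFrame(
    [("u1", "2021-10-19 10:00:00"),
     ("u1", "2021-10-19 10:03:00"),
     ("u1", "2021-10-19 10:30:00")],
    ["user", "ts"],
).selectExpr("user", "CAST(ts AS timestamp) AS ts")

# Sessions close after 5 minutes of inactivity.
sessions = (events
            .groupBy("user", session_window("ts", "5 minutes"))
            .agg(count("*").alias("events")))
sessions.show(truncate=False)
```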

* Avoid inlining non-deterministic With-CTEs ([SPARK-36447](https://issues.apache.org/jira/browse/SPARK-36447))
* Support analyzing all tables in a specific database ([SPARK-33687](https://issues.apache.org/jira/browse/SPARK-33687))
* Standardize exception messages in Spark ([SPARK-33539](https://issues.apache.org/jira/browse/SPARK-33539))
* Support (IGNORE | RESPECT) NULLS for LEAD/LAG/NTH_VALUE/FIRST_VALUE/LAST_VALUE ([SPARK-30789](https://issues.apache.org/jira/browse/SPARK-30789))
Member

Please move it to ANSI mode.

Member Author

Thanks, updated.
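
As a quick illustration of the (IGNORE | RESPECT) NULLS item now listed under ANSI mode: a sketch with invented sensor readings, run through `spark.sql`:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Invented sensor readings with gaps (NULL values).
spark.createDataFrame(
    [(1, 10), (2, None), (3, None), (4, 40)],
    ["ts", "value"],
).createOrReplaceTempView("readings")

# LAST_VALUE ... IGNORE NULLS carries the last non-null reading forward;
# RESPECT NULLS (the default) would return NULL at ts = 2 and 3.
spark.sql("""
    SELECT ts, value,
           LAST_VALUE(value) IGNORE NULLS OVER (
               ORDER BY ts
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS filled
    FROM readings
""").show()
```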

### Known Issues

* Support fetching shuffle blocks in batch with i/o encryption ([SPARK-34827](https://issues.apache.org/jira/browse/SPARK-34827))
* Fail to load Snappy codec ([SPARK-36681](https://issues.apache.org/jira/browse/SPARK-36681))
Member

Maybe mention this only applies to sequence files? Otherwise, just by looking at the name, it will appear to be a very serious issue.

Member Author

Thanks, updated.

@gengliangwang
Member Author

Thanks all for the reviews. I will keep this open for a few more hours.

@gengliangwang
Member Author

I am merging this one. Thanks for the great suggestions, everyone!

@gengliangwang merged commit 43ea0cc into apache:asf-site Oct 19, 2021
@limansky

Hi, I've just found that there is no link for the Hadoop 3.3 + Scala 2.13 build on the download page.

@limansky

BTW, why is it the only build available for Scala 2.13?

@srowen
Member

@srowen commented Oct 19, 2021

Fair point - I think the problem is the explosion of artifact combinations if there are sets for each Scala version, but we did publish a binary release for 2.13, and it should be in the UI. Unless someone's on that already, I can hack in an option. Probably anyone on Scala 2.13 is generally on newer versions of things, so there's not much point in building for old Hadoop 2 with 2.13. (People can create whatever build they like from the source release, though.)

@limansky

I think it would be nice to have both the Hadoop 3 build and the no-Hadoop build for 2.13.

HyukjinKwon pushed a commit to apache/spark that referenced this pull request Oct 22, 2021
### What changes were proposed in this pull request?

Improve conda installation docs

### Why are the changes needed?

As requested [here](apache/spark-website#361 (comment)). Ideally, this should be backported to the 3.2-branch (so it becomes visible for the 3.2.0 installation documentation [here](https://spark.apache.org/docs/3.2.0/api/python/getting_started/install.html)).
CC gengliangwang

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Not tested

Closes #34315 from h-vetinari/conda-install.

Lead-authored-by: H. Vetinari <[email protected]>
Co-authored-by: h-vetinari <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
HyukjinKwon pushed a commit to apache/spark that referenced this pull request Oct 22, 2021 (cherry picked from commit 016ab0c)
sunchao pushed a commit to sunchao/spark that referenced this pull request Dec 8, 2021 (cherry picked from commit 016ab0c)
yaooqinn added a commit to apache/kyuubi that referenced this pull request Feb 22, 2022

### _Why are the changes needed?_

The original idea of SPARK_HADOOP_VERSION was only to build Spark release names by concatenation; now we need to remove it because:
- SPARK_HADOOP_VERSION is misunderstood by developers and misused in places, e.g. as the Hadoop version Kyuubi was compiled against
- we have multi-engine support now
- the release names of Spark (or anything else) are very easy to get through code in different environments: prod/test/dev
- a `mvn` job is bundled with `bin/load-kyuubi-env.sh`, which is truly worrisome
- SPARK_HADOOP_VERSION on the Spark side has already broken for Spark 3.2, which is actually bundled with Hadoop 3.3; see apache/spark-website#361 (comment)

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [x] [Run test](https://kyuubi.apache.org/docs/latest/develop_tools/testing.html#running-tests) locally before making a pull request

Closes #1950 from yaooqinn/hadoop.

Closes #1950

b47be7c [Kent Yao] Remove ambiguous SPARK_HADOOP_VERSION
3b33ee5 [Kent Yao] Remove ambiguous SPARK_HADOOP_VERSION

Authored-by: Kent Yao <[email protected]>
Signed-off-by: Kent Yao <[email protected]>
catalinii pushed a commit to lyft/spark that referenced this pull request Feb 22, 2022 (cherry picked from commit 016ab0c)
catalinii pushed a commit to lyft/spark that referenced this pull request Mar 4, 2022 (cherry picked from commit 016ab0c)