Commit 3657703

Yikun authored and HyukjinKwon committed
[SPARK-37465][PYTHON] Bump minimum pandas version to 1.0.5
### What changes were proposed in this pull request?
Bump minimum pandas version to 1.0.5 (or a better version).

### Why are the changes needed?
Initial discussion from [SPARK-37465](https://issues.apache.org/jira/browse/SPARK-37465) and #34314 (comment).

### Does this PR introduce _any_ user-facing change?
Yes, it bumps the minimum pandas version.

### How was this patch tested?
PySpark tests passed with pandas v1.0.5.

Closes #34717 from Yikun/pandas-min-version.

Authored-by: Yikun Jiang <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
1 parent 49b5dd1 commit 3657703

6 files changed, +9 -6 lines changed

python/docs/source/getting_started/install.rst

Lines changed: 2 additions & 2 deletions
@@ -154,11 +154,11 @@ Dependencies
 ============= ========================= ======================================
 Package       Minimum supported version Note
 ============= ========================= ======================================
-`pandas`      0.23.2                    Optional for Spark SQL
+`pandas`      1.0.5                     Optional for Spark SQL
 `NumPy`       1.7                       Required for MLlib DataFrame-based API
 `pyarrow`     1.0.0                     Optional for Spark SQL
 `Py4J`        0.10.9.2                  Required
-`pandas`      0.23.2                    Required for pandas API on Spark
+`pandas`      1.0.5                     Required for pandas API on Spark
 `pyarrow`     1.0.0                     Required for pandas API on Spark
 `Numpy`       1.14                      Required for pandas API on Spark
 ============= ========================= ======================================

python/docs/source/migration_guide/pyspark_3.2_to_3.3.rst

Lines changed: 1 addition & 0 deletions
@@ -22,3 +22,4 @@ Upgrading from PySpark 3.2 to 3.3

 * In Spark 3.3, the ``pyspark.pandas.sql`` method follows [the standard Python string formatter](https://docs.python.org/3/library/string.html#format-string-syntax). To restore the previous behavior, set ``PYSPARK_PANDAS_SQL_LEGACY`` environment variable to ``1``.
 * In Spark 3.3, the ``drop`` method of pandas API on Spark DataFrame supports dropping rows by ``index``, and sets dropping by index instead of column by default.
+* In Spark 3.3, PySpark upgrades Pandas version, the new minimum required version changes from 0.23.2 to 1.0.5.

python/docs/source/user_guide/sql/arrow_pandas.rst

Lines changed: 1 addition & 1 deletion
@@ -387,7 +387,7 @@ working with timestamps in ``pandas_udf``\s to get the best performance, see
 Recommended Pandas and PyArrow Versions
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-For usage with pyspark.sql, the minimum supported versions of Pandas is 0.23.2 and PyArrow is 1.0.0.
+For usage with pyspark.sql, the minimum supported versions of Pandas is 1.0.5 and PyArrow is 1.0.0.
 Higher versions may be used, however, compatibility and data correctness can not be guaranteed and should
 be verified by the user.

python/pyspark/pandas/tests/test_series.py

Lines changed: 3 additions & 1 deletion
@@ -2245,7 +2245,9 @@ def test_mad(self):
         pser.index = pmidx
         psser = ps.from_pandas(pser)

-        self.assert_eq(pser.mad(), psser.mad())
+        # Mark almost as True to avoid precision issue like:
+        # "21.555555555555554 != 21.555555555555557"
+        self.assert_eq(pser.mad(), psser.mad(), almost=True)

     def test_to_frame(self):
         pser = pd.Series(["a", "b", "c"])
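
The precision issue quoted in the new test comment can be reproduced without pandas or Spark. The following is a minimal standard-library sketch, not the implementation behind `assert_eq(..., almost=True)`: the helper name `mean_abs_deviation` and the `rel_tol` value are assumptions chosen for illustration. It shows that the two values from the comment are different floats that still agree within a small relative tolerance.

```python
import math


def mean_abs_deviation(values):
    """Mean absolute deviation: average distance of each value from the mean.

    Hypothetical helper for illustration; it is not the pandas or
    pandas-on-Spark implementation of ``Series.mad``.
    """
    mean = sum(values) / len(values)
    return sum(abs(v - mean) for v in values) / len(values)


# The two results quoted in the test comment of this commit.
a = 21.555555555555554
b = 21.555555555555557

# Exact comparison fails because the values differ in the last bits.
exact_equal = a == b

# A tolerance-based comparison treats them as the same result.
# rel_tol=1e-9 is an assumed tolerance, not the one PySpark uses.
almost_equal = math.isclose(a, b, rel_tol=1e-9)
```

Different summation orders in two libraries can produce last-digit differences like this one, which is why an approximate comparison is a reasonable choice for the `mad` test.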

python/pyspark/sql/pandas/utils.py

Lines changed: 1 addition & 1 deletion
@@ -19,7 +19,7 @@
 def require_minimum_pandas_version() -> None:
     """Raise ImportError if minimum version of Pandas is not installed"""
     # TODO(HyukjinKwon): Relocate and deduplicate the version specification.
-    minimum_pandas_version = "0.23.2"
+    minimum_pandas_version = "1.0.5"

     from distutils.version import LooseVersion

python/setup.py

Lines changed: 1 addition & 1 deletion
@@ -111,7 +111,7 @@ def _supports_symlinks():
 # For Arrow, you should also check ./pom.xml and ensure there are no breaking changes in the
 # binary format protocol with the Java version, see ARROW_HOME/format/* for specifications.
 # Also don't forget to update python/docs/source/getting_started/install.rst.
-_minimum_pandas_version = "0.23.2"
+_minimum_pandas_version = "1.0.5"
 _minimum_pyarrow_version = "1.0.0"
