@dongjoon-hyun dongjoon-hyun commented Apr 3, 2024

What changes were proposed in this pull request?

This PR explicitly installs six for Python 3.10, because the Python 3.10 environment in the CI image is missing six, which causes Pandas detection failures in CI.

$ docker run -it --rm ghcr.io/apache/apache-spark-ci-image:master-8345361470 python3.9 -m pip freeze | grep six
six==1.16.0
$ docker run -it --rm ghcr.io/apache/apache-spark-ci-image:master-8345361470 python3.10 -m pip freeze | grep six
$ docker run -it --rm ghcr.io/apache/apache-spark-ci-image:master-8345361470 python3.11 -m pip freeze | grep six
six==1.16.0
$ docker run -it --rm ghcr.io/apache/apache-spark-ci-image:master-8345361470 python3.12 -m pip freeze | grep six
six==1.16.0

With six missing for Python 3.10, the test run fails:
Starting test(python3.10): pyspark.ml.tests.connect.test_connect_classification (temp output: /__w/spark/spark/python/target/370eb2c4-12f2-411f-96d1-f617f5d59528/python3.10__pyspark.ml.tests.connect.test_connect_classification__v6itdsxy.log)
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/__w/spark/spark/python/pyspark/ml/tests/connect/test_connect_classification.py", line 37, in <module>
    class ClassificationTestsOnConnect(ClassificationTestsMixin, unittest.TestCase):
NameError: name 'ClassificationTestsMixin' is not defined
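
One plausible way a failed dependency detection turns into a NameError like the one above is a base class that is only defined when the dependency check passes. The sketch below is an illustration under that assumption, not Spark's actual test code; `HAVE_DEPENDENCY`, `ExampleTestsMixin`, `ExampleTests`, and `load_module` are all hypothetical names.

```python
# Source of a tiny stand-in "test module". The mixin is only defined when the
# (hypothetical) HAVE_DEPENDENCY flag is true, but the test class always
# refers to it.
MODULE_SOURCE = """
import unittest

if HAVE_DEPENDENCY:
    class ExampleTestsMixin:
        pass

class ExampleTests(ExampleTestsMixin, unittest.TestCase):
    pass
"""


def load_module(have_dependency: bool) -> dict:
    """Execute the stand-in module and return its namespace."""
    namespace = {"HAVE_DEPENDENCY": have_dependency}
    exec(MODULE_SOURCE, namespace)
    return namespace


if __name__ == "__main__":
    load_module(True)  # the mixin exists, so the class statement succeeds
    try:
        load_module(False)
    except NameError as error:
        print(f"NameError: {error}")
```

When the flag is false, the class statement fails at import time, before any test can be skipped or run, which matches the shape of the traceback above.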

Why are the changes needed?

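Python 3.10 is the default system Python on Ubuntu, so pip behaves differently for it than for the other Python versions. In the image build log, pip reports six as already satisfied from the system dist-packages directory: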

RUN python3.10 -m pip install numpy pyarrow>=15.0.0 six==1.16.0 ...
...
#20 0.766 Requirement already satisfied: six==1.16.0 in /usr/lib/python3/dist-packages (1.16.0)
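
To see where a given interpreter would actually load a module from, a check along the following lines may help. It is a minimal sketch using only the standard library; `module_origin` and `is_system_package` are hypothetical helper names, and the system path is taken from the log line above.

```python
import importlib.util
from pathlib import PurePosixPath
from typing import Optional

# Path reported by pip in the build log above.
SYSTEM_DIST_PACKAGES = PurePosixPath("/usr/lib/python3/dist-packages")


def module_origin(name: str) -> Optional[str]:
    """Return the file a top-level module would be loaded from, or None."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


def is_system_package(origin: Optional[str]) -> bool:
    """Return True if the origin lies under the system dist-packages path."""
    if origin is None:
        return False
    return SYSTEM_DIST_PACKAGES in PurePosixPath(origin).parents


if __name__ == "__main__":
    origin = module_origin("six")
    print("six origin:", origin)
    print("from system dist-packages:", is_system_package(origin))
```

Running this with each interpreter in the image (for example `python3.10 script.py`) would show whether six is visible to that interpreter and whether it comes from the system location.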

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Check the Docker image built by this PR.

$ docker pull --platform amd64 ghcr.io/dongjoon-hyun/apache-spark-ci-image:master-8533625657

$ docker run -it --rm ghcr.io/dongjoon-hyun/apache-spark-ci-image:master-8533625657 python3.10 -m pip freeze | grep six
six==1.16.0
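
A similar check can be made from inside the interpreter, which is handy in a test setup step. This is a minimal sketch based on the standard library's `importlib.metadata`; `installed_version` and `check_pin` are hypothetical helpers, and the result is only roughly analogous to filtering `pip freeze` output.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def check_pin(dist_name: str, expected: str) -> bool:
    """Return True only if the distribution is installed at the expected version."""
    return installed_version(dist_name) == expected


if __name__ == "__main__":
    # 1.16.0 is the version pinned in this PR's install command.
    print("six installed:", installed_version("six"))
    print("six pinned correctly:", check_pin("six", "1.16.0"))
```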

Run tests on the new Docker image.

$ docker run -it --rm -v $PWD:/spark ghcr.io/dongjoon-hyun/apache-spark-ci-image:master-8533625657
root@b7f5f56892b0:/# cd /spark
root@b7f5f56892b0:/spark# python/run-tests --modules=pyspark-mllib,pyspark-ml,pyspark-ml-connect --parallelism=1 --python-executables=python3.10
Running PySpark tests. Output is in /spark/python/unit-tests.log
Will test against the following Python executables: ['python3.10']
Will test the following Python modules: ['pyspark-mllib', 'pyspark-ml', 'pyspark-ml-connect']
python3.10 python_implementation is CPython
python3.10 version is: Python 3.10.12
Starting test(python3.10): pyspark.ml.tests.connect.test_connect_classification (temp output: /spark/python/target/675eccdc-3c4b-4146-a58b-030302bdc6d7/python3.10__pyspark.ml.tests.connect.test_connect_classification__9habp0rh.log)
Finished test(python3.10): pyspark.ml.tests.connect.test_connect_classification (159s)
Starting test(python3.10): pyspark.ml.tests.connect.test_connect_evaluation (temp output: /spark/python/target/fbac93ba-c72d-40e4-acfe-f3ac01b4932a/python3.10__pyspark.ml.tests.connect.test_connect_evaluation__js11z0ux.log)
Finished test(python3.10): pyspark.ml.tests.connect.test_connect_evaluation (36s)
Starting test(python3.10): pyspark.ml.tests.connect.test_connect_feature (temp output: /spark/python/target/fdb8828e-4241-4e78-a7d6-b2a4beb3cfc1/python3.10__pyspark.ml.tests.connect.test_connect_feature__et5gr30f.log)
Finished test(python3.10): pyspark.ml.tests.connect.test_connect_feature (30s)
Starting test(python3.10): pyspark.ml.tests.connect.test_connect_function (temp output: /spark/python/target/e365e62f-a09b-483d-9101-fe9dfc0801f2/python3.10__pyspark.ml.tests.connect.test_connect_function__5e288azs.log)
Finished test(python3.10): pyspark.ml.tests.connect.test_connect_function (24s)
Starting test(python3.10): pyspark.ml.tests.connect.test_connect_pipeline (temp output: /spark/python/target/bdc167be-6d6e-4704-b840-cf5d23c4b21e/python3.10__pyspark.ml.tests.connect.test_connect_pipeline__63blw3o2.log)
...
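
For repeated runs, the same invocation can be assembled programmatically. The sketch below only builds the argument list from the flags shown in the command above (`--modules`, `--parallelism`, `--python-executables`); `build_run_tests_command` is a hypothetical helper, and it assumes the working directory is the Spark checkout.

```python
from typing import List, Sequence


def build_run_tests_command(
    modules: Sequence[str],
    python_executables: Sequence[str],
    parallelism: int = 1,
    script: str = "python/run-tests",
) -> List[str]:
    """Build the argument list for the PySpark test runner shown above."""
    return [
        script,
        f"--modules={','.join(modules)}",
        f"--parallelism={parallelism}",
        f"--python-executables={','.join(python_executables)}",
    ]


if __name__ == "__main__":
    command = build_run_tests_command(
        modules=["pyspark-mllib", "pyspark-ml", "pyspark-ml-connect"],
        python_executables=["python3.10"],
    )
    # Pass the list to subprocess.run(command, check=True) to execute it.
    print(" ".join(command))
```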

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the BUILD label Apr 3, 2024
@dongjoon-hyun dongjoon-hyun marked this pull request as draft April 3, 2024 05:26
@dongjoon-hyun dongjoon-hyun changed the title to [SPARK-47452][INFRA][FOLLOWUP] Enforce to install six to Python 3.10 Apr 3, 2024
@dongjoon-hyun dongjoon-hyun marked this pull request as ready for review April 3, 2024 05:35

dongjoon-hyun commented Apr 3, 2024

Could you review this PR, @HyukjinKwon? I believe this PR will recover the Python 3.10 CI pipeline.

@HyukjinKwon HyukjinKwon left a comment

LGTM. cc @xinrong-meng, who was looking into this.

@dongjoon-hyun

Thank you, @HyukjinKwon!
Let me merge this since the image build is already tested.

@dongjoon-hyun dongjoon-hyun deleted the SPARK-47452-2 branch April 3, 2024 06:14