[SPARK-26252][PYTHON] Add support to run specific unittests and/or doctests in python/run-tests script #23203
Conversation
cc @cloud-fan, @dongjoon-hyun, @icexelloss, @BryanCutler, @viirya (whom I talked with about this before).
```bash
# If you'd like to run a specific unittest class, you could do so as follows:
# SPARK_TESTING=1 ../bin/pyspark pyspark.sql.tests VectorizedUDFTests
./run-tests "$@"
```
BTW, it works with the coverage script as well. Manually tested.
I used to run PySpark tests via …. I'm happy to see an easier way to do it, though I'm not very familiar with these scripts. Thanks for doing it!
Haven't looked closely at the changes yet, but I think it should be very useful. Thanks @HyukjinKwon
Test build #99599 has finished for PR 23203 at commit
srowen left a comment
The idea is fine by me; I don't know the Python scripts that well, but it seems reasonable. Also, add a note about this at https://spark.apache.org/developer-tools.html after it's merged.
Yea, will update it as well after this one gets merged.
BryanCutler left a comment
Running individual tests is a question that comes up a lot, and this will make it much easier, thanks for doing this @HyukjinKwon! I ran some local tests with this and it works great. I just had one minor suggestion; otherwise LGTM.
Test build #99697 has finished for PR 23203 at commit
Merged to master.
Thank you @cloud-fan, @viirya, @srowen, and @BryanCutler.
This PR adds some guides for testing individual PySpark tests, and also some information about PySpark coverage. See also apache/spark#23203 and SPARK-26252. Closes #161
[SPARK-26252][PYTHON] Add support to run specific unittests and/or doctests in python/run-tests script
## What changes were proposed in this pull request?
This PR proposes adding a developer option, `--testnames`, to our testing script to allow running a specific set of unittests and doctests.
**1. Run unittests in the class**
```bash
./run-tests --testnames 'pyspark.sql.tests.test_arrow ArrowTests'
```
```
Running PySpark tests. Output is in /.../spark/python/unit-tests.log
Will test against the following Python executables: ['python2.7', 'pypy']
Will test the following Python tests: ['pyspark.sql.tests.test_arrow ArrowTests']
Starting test(python2.7): pyspark.sql.tests.test_arrow ArrowTests
Starting test(pypy): pyspark.sql.tests.test_arrow ArrowTests
Finished test(python2.7): pyspark.sql.tests.test_arrow ArrowTests (14s)
Finished test(pypy): pyspark.sql.tests.test_arrow ArrowTests (14s) ... 22 tests were skipped
Tests passed in 14 seconds
Skipped tests in pyspark.sql.tests.test_arrow ArrowTests with pypy:
test_createDataFrame_column_name_encoding (pyspark.sql.tests.test_arrow.ArrowTests) ... skipped 'Pandas >= 0.19.2 must be installed; however, it was not found.'
test_createDataFrame_does_not_modify_input (pyspark.sql.tests.test_arrow.ArrowTests) ... skipped 'Pandas >= 0.19.2 must be installed; however, it was not found.'
test_createDataFrame_fallback_disabled (pyspark.sql.tests.test_arrow.ArrowTests) ... skipped 'Pandas >= 0.19.2 must be installed; however, it was not found.'
test_createDataFrame_fallback_enabled (pyspark.sql.tests.test_arrow.ArrowTests) ... skipped
...
```
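Mechanically, running all unittests in one class amounts to handing a `TestCase` subclass to the stdlib loader. A self-contained sketch (with a toy `ToyTests` class standing in for Spark's `ArrowTests`, not the actual run-tests implementation):

```python
import unittest

class ToyTests(unittest.TestCase):
    """Stand-in for a real test class such as ArrowTests."""

    def test_upper(self):
        self.assertEqual("ok".upper(), "OK")

    def test_lower(self):
        self.assertEqual("OK".lower(), "ok")

loader = unittest.TestLoader()
# Collects every test_* method defined on the class into one suite.
suite = loader.loadTestsFromTestCase(ToyTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```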
**2. Run single unittest in the class.**
```bash
./run-tests --testnames 'pyspark.sql.tests.test_arrow ArrowTests.test_null_conversion'
```
```
Running PySpark tests. Output is in /.../spark/python/unit-tests.log
Will test against the following Python executables: ['python2.7', 'pypy']
Will test the following Python tests: ['pyspark.sql.tests.test_arrow ArrowTests.test_null_conversion']
Starting test(pypy): pyspark.sql.tests.test_arrow ArrowTests.test_null_conversion
Starting test(python2.7): pyspark.sql.tests.test_arrow ArrowTests.test_null_conversion
Finished test(pypy): pyspark.sql.tests.test_arrow ArrowTests.test_null_conversion (0s) ... 1 tests were skipped
Finished test(python2.7): pyspark.sql.tests.test_arrow ArrowTests.test_null_conversion (8s)
Tests passed in 8 seconds
Skipped tests in pyspark.sql.tests.test_arrow ArrowTests.test_null_conversion with pypy:
test_null_conversion (pyspark.sql.tests.test_arrow.ArrowTests) ... skipped 'Pandas >= 0.19.2 must be installed; however, it was not found.'
```
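Selecting a single test like `ArrowTests.test_null_conversion` ultimately boils down to one `TestCase` instance bound to that one method. A minimal sketch with stand-in names (not the actual script's code path):

```python
import unittest

class ToyTests(unittest.TestCase):
    """Stand-in for a real test class such as ArrowTests."""

    def test_upper(self):
        self.assertEqual("ok".upper(), "OK")

    def test_lower(self):
        self.assertEqual("OK".lower(), "ok")

# A 'Class.method' target resolves to a suite containing a single TestCase
# instance constructed with that method's name; build it directly here.
suite = unittest.TestSuite([ToyTests("test_upper")])
result = unittest.TextTestRunner(verbosity=0).run(suite)
```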
**3. Run doctests in single PySpark module.**
```bash
./run-tests --testnames pyspark.sql.dataframe
```
```
Running PySpark tests. Output is in /.../spark/python/unit-tests.log
Will test against the following Python executables: ['python2.7', 'pypy']
Will test the following Python tests: ['pyspark.sql.dataframe']
Starting test(pypy): pyspark.sql.dataframe
Starting test(python2.7): pyspark.sql.dataframe
Finished test(python2.7): pyspark.sql.dataframe (47s)
Finished test(pypy): pyspark.sql.dataframe (48s)
Tests passed in 48 seconds
```
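For context, running a module's doctests comes down to the stdlib `doctest` machinery. A toy, self-contained equivalent, with a local function standing in for `pyspark.sql.dataframe`:

```python
import doctest

def add(a, b):
    """Return a + b.

    >>> add(2, 3)
    5
    """
    return a + b

# Collect and run the doctests attached to the stand-in object.
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
for test in finder.find(add, "add", module=False, globs={"add": add}):
    runner.run(test)
results = runner.summarize(verbose=False)  # TestResults(failed, attempted)
```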
Of course, you can mix them:
```bash
./run-tests --testnames 'pyspark.sql.tests.test_arrow ArrowTests,pyspark.sql.dataframe'
```
```
Running PySpark tests. Output is in /.../spark/python/unit-tests.log
Will test against the following Python executables: ['python2.7', 'pypy']
Will test the following Python tests: ['pyspark.sql.tests.test_arrow ArrowTests', 'pyspark.sql.dataframe']
Starting test(pypy): pyspark.sql.dataframe
Starting test(pypy): pyspark.sql.tests.test_arrow ArrowTests
Starting test(python2.7): pyspark.sql.dataframe
Starting test(python2.7): pyspark.sql.tests.test_arrow ArrowTests
Finished test(pypy): pyspark.sql.tests.test_arrow ArrowTests (0s) ... 22 tests were skipped
Finished test(python2.7): pyspark.sql.tests.test_arrow ArrowTests (18s)
Finished test(python2.7): pyspark.sql.dataframe (50s)
Finished test(pypy): pyspark.sql.dataframe (52s)
Tests passed in 52 seconds
Skipped tests in pyspark.sql.tests.test_arrow ArrowTests with pypy:
test_createDataFrame_column_name_encoding (pyspark.sql.tests.test_arrow.ArrowTests) ... skipped 'Pandas >= 0.19.2 must be installed; however, it was not found.'
test_createDataFrame_does_not_modify_input (pyspark.sql.tests.test_arrow.ArrowTests) ... skipped 'Pandas >= 0.19.2 must be installed; however, it was not found.'
test_createDataFrame_fallback_disabled (pyspark.sql.tests.test_arrow.ArrowTests) ... skipped 'Pandas >= 0.19.2 must be installed; however, it was not found.'
```
You can also use all other options (except `--modules`, which is ignored):
```bash
./run-tests --testnames 'pyspark.sql.tests.test_arrow ArrowTests.test_null_conversion' --python-executables=python
```
```
Running PySpark tests. Output is in /.../spark/python/unit-tests.log
Will test against the following Python executables: ['python']
Will test the following Python tests: ['pyspark.sql.tests.test_arrow ArrowTests.test_null_conversion']
Starting test(python): pyspark.sql.tests.test_arrow ArrowTests.test_null_conversion
Finished test(python): pyspark.sql.tests.test_arrow ArrowTests.test_null_conversion (12s)
Tests passed in 12 seconds
```
See help below:
```bash
./run-tests --help
```
```
Usage: run-tests [options]
Options:
...
Developer Options:
--testnames=TESTNAMES
A comma-separated list of specific modules, classes
and functions of doctest or unittest to test. For
example, 'pyspark.sql.foo' to run the module as
unittests or doctests, 'pyspark.sql.tests FooTests' to
run the specific class of unittests,
'pyspark.sql.tests FooTests.test_foo' to run the
specific unittest in the class. '--modules' option is
ignored if they are given.
```
I intentionally grouped it as a developer option to be more conservative.
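For illustration, splitting a `--testnames` value into (module, target) pairs could look roughly like the sketch below. This is a hypothetical helper (`parse_testnames` is not from the actual `python/run-tests.py`, whose parsing may differ):

```python
def parse_testnames(value):
    """Split a comma-separated '--testnames' value into (module, target) pairs.

    Each entry is 'module', 'module Class', or 'module Class.method';
    the target is None when only a module is given (doctest case).
    """
    pairs = []
    for entry in value.split(","):
        parts = entry.strip().split()
        module = parts[0]
        target = parts[1] if len(parts) > 1 else None
        pairs.append((module, target))
    return pairs

pairs = parse_testnames(
    "pyspark.sql.tests.test_arrow ArrowTests.test_null_conversion,pyspark.sql.dataframe"
)
```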
## How was this patch tested?
Manually tested. Negative tests were also done.
```bash
./run-tests --testnames 'pyspark.sql.tests.test_arrow ArrowTests.test_null_conversion1' --python-executables=python
```
```
...
AttributeError: type object 'ArrowTests' has no attribute 'test_null_conversion1'
...
```
```bash
./run-tests --testnames 'pyspark.sql.tests.test_arrow ArrowT' --python-executables=python
```
```
...
AttributeError: 'module' object has no attribute 'ArrowT'
...
```
```bash
./run-tests --testnames 'pyspark.sql.tests.test_ar' --python-executables=python
```
```
...
/.../python2.7: No module named pyspark.sql.tests.test_ar
```
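The typo'd names above fail fast because resolving a dotted target with `getattr` raises `AttributeError` as soon as a part is missing. An illustrative sketch (not the actual script; stdlib names stand in for the PySpark ones so it runs anywhere):

```python
import importlib

def resolve_target(module_name, target):
    """Resolve 'Class.method' inside a module, part by part."""
    obj = importlib.import_module(module_name)
    for part in target.split("."):
        obj = getattr(obj, part)  # AttributeError on a typo like 'ArrowT'
    return obj

method = resolve_target("unittest", "TestCase.run")  # resolves fine

try:
    resolve_target("unittest", "TestCas")  # typo'd class name
    raised = None
except AttributeError as exc:
    raised = exc
```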
Closes apache#23203 from HyukjinKwon/SPARK-26252.
Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>