[SPARK-26252][PYTHON] Add support to run specific unittests and/or doctests in python/run-tests script #23203
Conversation
cc @cloud-fan, @dongjoon-hyun, @icexelloss, @BryanCutler, @viirya (whom I talked with about this before).
```bash
# If you'd like to run a specific unittest class, you could do so as follows:
# SPARK_TESTING=1 ../bin/pyspark pyspark.sql.tests VectorizedUDFTests
./run-tests "$@"
```
BTW, it works with the coverage script as well. Manually tested.
I used to run PySpark tests via …. I'm happy to see an easier way to do it, though I'm not very familiar with these scripts. Thanks for doing it!
Haven't looked closely at the changes yet, but I think it should be very useful. Thanks @HyukjinKwon
Test build #99599 has finished for PR 23203 at commit
srowen left a comment
The idea is fine by me; I don't know the Python scripts that well, but it seems reasonable. Also, add a note about this at https://spark.apache.org/developer-tools.html after it's merged.
Yea, will update it as well after this one gets merged.
BryanCutler left a comment
Running individual tests is a question that comes up a lot, and this will make it much easier, thanks for doing this @HyukjinKwon! I ran some local tests with this and it works great. I just had one minor suggestion; otherwise LGTM.
Test build #99697 has finished for PR 23203 at commit
Merged to master.
Thank you @cloud-fan, @viirya, @srowen, and @BryanCutler.
This PR adds some guides for testing individual PySpark tests, and also some information about PySpark coverage. See also apache/spark#23203 and SPARK-26252. Closes #161
[SPARK-26252][PYTHON] Add support to run specific unittests and/or doctests in python/run-tests script
## What changes were proposed in this pull request?
This PR proposes adding a developer option, `--testnames`, to our testing script to allow running a specific set of unittests and doctests.
**1. Run unittests in the class**
```bash
./run-tests --testnames 'pyspark.sql.tests.test_arrow ArrowTests'
```
```
Running PySpark tests. Output is in /.../spark/python/unit-tests.log
Will test against the following Python executables: ['python2.7', 'pypy']
Will test the following Python tests: ['pyspark.sql.tests.test_arrow ArrowTests']
Starting test(python2.7): pyspark.sql.tests.test_arrow ArrowTests
Starting test(pypy): pyspark.sql.tests.test_arrow ArrowTests
Finished test(python2.7): pyspark.sql.tests.test_arrow ArrowTests (14s)
Finished test(pypy): pyspark.sql.tests.test_arrow ArrowTests (14s) ... 22 tests were skipped
Tests passed in 14 seconds
Skipped tests in pyspark.sql.tests.test_arrow ArrowTests with pypy:
test_createDataFrame_column_name_encoding (pyspark.sql.tests.test_arrow.ArrowTests) ... skipped 'Pandas >= 0.19.2 must be installed; however, it was not found.'
test_createDataFrame_does_not_modify_input (pyspark.sql.tests.test_arrow.ArrowTests) ... skipped 'Pandas >= 0.19.2 must be installed; however, it was not found.'
test_createDataFrame_fallback_disabled (pyspark.sql.tests.test_arrow.ArrowTests) ... skipped 'Pandas >= 0.19.2 must be installed; however, it was not found.'
test_createDataFrame_fallback_enabled (pyspark.sql.tests.test_arrow.ArrowTests) ... skipped
...
```
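Mechanically, running all unittests in one class amounts to handing a `TestCase` subclass to the stdlib loader. A self-contained sketch (with a toy `ToyTests` class standing in for Spark's `ArrowTests`, not the actual run-tests implementation):

```python
import unittest

class ToyTests(unittest.TestCase):
    """Stand-in for a real test class such as ArrowTests."""

    def test_upper(self):
        self.assertEqual("ok".upper(), "OK")

    def test_lower(self):
        self.assertEqual("OK".lower(), "ok")

loader = unittest.TestLoader()
# Collects every test_* method defined on the class into one suite.
suite = loader.loadTestsFromTestCase(ToyTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```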
**2. Run single unittest in the class.**
```bash
./run-tests --testnames 'pyspark.sql.tests.test_arrow ArrowTests.test_null_conversion'
```
```
Running PySpark tests. Output is in /.../spark/python/unit-tests.log
Will test against the following Python executables: ['python2.7', 'pypy']
Will test the following Python tests: ['pyspark.sql.tests.test_arrow ArrowTests.test_null_conversion']
Starting test(pypy): pyspark.sql.tests.test_arrow ArrowTests.test_null_conversion
Starting test(python2.7): pyspark.sql.tests.test_arrow ArrowTests.test_null_conversion
Finished test(pypy): pyspark.sql.tests.test_arrow ArrowTests.test_null_conversion (0s) ... 1 tests were skipped
Finished test(python2.7): pyspark.sql.tests.test_arrow ArrowTests.test_null_conversion (8s)
Tests passed in 8 seconds
Skipped tests in pyspark.sql.tests.test_arrow ArrowTests.test_null_conversion with pypy:
test_null_conversion (pyspark.sql.tests.test_arrow.ArrowTests) ... skipped 'Pandas >= 0.19.2 must be installed; however, it was not found.'
```
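Selecting a single test like `ArrowTests.test_null_conversion` ultimately boils down to one `TestCase` instance bound to that one method. A minimal sketch with stand-in names (not the actual script's code path):

```python
import unittest

class ToyTests(unittest.TestCase):
    """Stand-in for a real test class such as ArrowTests."""

    def test_upper(self):
        self.assertEqual("ok".upper(), "OK")

    def test_lower(self):
        self.assertEqual("OK".lower(), "ok")

# A 'Class.method' target resolves to a suite containing a single TestCase
# instance constructed with that method's name; build it directly here.
suite = unittest.TestSuite([ToyTests("test_upper")])
result = unittest.TextTestRunner(verbosity=0).run(suite)
```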
**3. Run doctests in single PySpark module.**
```bash
./run-tests --testnames pyspark.sql.dataframe
```
```
Running PySpark tests. Output is in /.../spark/python/unit-tests.log
Will test against the following Python executables: ['python2.7', 'pypy']
Will test the following Python tests: ['pyspark.sql.dataframe']
Starting test(pypy): pyspark.sql.dataframe
Starting test(python2.7): pyspark.sql.dataframe
Finished test(python2.7): pyspark.sql.dataframe (47s)
Finished test(pypy): pyspark.sql.dataframe (48s)
Tests passed in 48 seconds
```
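For context, running a module's doctests comes down to the stdlib `doctest` machinery. A toy, self-contained equivalent, with a local function standing in for `pyspark.sql.dataframe`:

```python
import doctest

def add(a, b):
    """Return a + b.

    >>> add(2, 3)
    5
    """
    return a + b

# Collect and run the doctests attached to the stand-in object.
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
for test in finder.find(add, "add", module=False, globs={"add": add}):
    runner.run(test)
results = runner.summarize(verbose=False)  # TestResults(failed, attempted)
```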
Of course, you can mix them:
```bash
./run-tests --testnames 'pyspark.sql.tests.test_arrow ArrowTests,pyspark.sql.dataframe'
```
```
Running PySpark tests. Output is in /.../spark/python/unit-tests.log
Will test against the following Python executables: ['python2.7', 'pypy']
Will test the following Python tests: ['pyspark.sql.tests.test_arrow ArrowTests', 'pyspark.sql.dataframe']
Starting test(pypy): pyspark.sql.dataframe
Starting test(pypy): pyspark.sql.tests.test_arrow ArrowTests
Starting test(python2.7): pyspark.sql.dataframe
Starting test(python2.7): pyspark.sql.tests.test_arrow ArrowTests
Finished test(pypy): pyspark.sql.tests.test_arrow ArrowTests (0s) ... 22 tests were skipped
Finished test(python2.7): pyspark.sql.tests.test_arrow ArrowTests (18s)
Finished test(python2.7): pyspark.sql.dataframe (50s)
Finished test(pypy): pyspark.sql.dataframe (52s)
Tests passed in 52 seconds
Skipped tests in pyspark.sql.tests.test_arrow ArrowTests with pypy:
test_createDataFrame_column_name_encoding (pyspark.sql.tests.test_arrow.ArrowTests) ... skipped 'Pandas >= 0.19.2 must be installed; however, it was not found.'
test_createDataFrame_does_not_modify_input (pyspark.sql.tests.test_arrow.ArrowTests) ... skipped 'Pandas >= 0.19.2 must be installed; however, it was not found.'
test_createDataFrame_fallback_disabled (pyspark.sql.tests.test_arrow.ArrowTests) ... skipped 'Pandas >= 0.19.2 must be installed; however, it was not found.'
```
You can also use all other options (except `--modules`, which is ignored):
```bash
./run-tests --testnames 'pyspark.sql.tests.test_arrow ArrowTests.test_null_conversion' --python-executables=python
```
```
Running PySpark tests. Output is in /.../spark/python/unit-tests.log
Will test against the following Python executables: ['python']
Will test the following Python tests: ['pyspark.sql.tests.test_arrow ArrowTests.test_null_conversion']
Starting test(python): pyspark.sql.tests.test_arrow ArrowTests.test_null_conversion
Finished test(python): pyspark.sql.tests.test_arrow ArrowTests.test_null_conversion (12s)
Tests passed in 12 seconds
```
See help below:
```bash
./run-tests --help
```
```
Usage: run-tests [options]
Options:
...
Developer Options:
--testnames=TESTNAMES
A comma-separated list of specific modules, classes
and functions of doctest or unittest to test. For
example, 'pyspark.sql.foo' to run the module as
unittests or doctests, 'pyspark.sql.tests FooTests' to
run the specific class of unittests,
'pyspark.sql.tests FooTests.test_foo' to run the
specific unittest in the class. '--modules' option is
ignored if they are given.
```
I intentionally grouped it as a developer option to be more conservative.
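For illustration, splitting a `--testnames` value into (module, target) pairs could look roughly like the sketch below. This is a hypothetical helper (`parse_testnames` is not from the actual `python/run-tests.py`, whose parsing may differ):

```python
def parse_testnames(value):
    """Split a comma-separated '--testnames' value into (module, target) pairs.

    Each entry is 'module', 'module Class', or 'module Class.method';
    the target is None when only a module is given (doctest case).
    """
    pairs = []
    for entry in value.split(","):
        parts = entry.strip().split()
        module = parts[0]
        target = parts[1] if len(parts) > 1 else None
        pairs.append((module, target))
    return pairs

pairs = parse_testnames(
    "pyspark.sql.tests.test_arrow ArrowTests.test_null_conversion,pyspark.sql.dataframe"
)
```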
## How was this patch tested?
Manually tested. Negative tests were also done.
```bash
./run-tests --testnames 'pyspark.sql.tests.test_arrow ArrowTests.test_null_conversion1' --python-executables=python
```
```
...
AttributeError: type object 'ArrowTests' has no attribute 'test_null_conversion1'
...
```
```bash
./run-tests --testnames 'pyspark.sql.tests.test_arrow ArrowT' --python-executables=python
```
```
...
AttributeError: 'module' object has no attribute 'ArrowT'
...
```
```bash
./run-tests --testnames 'pyspark.sql.tests.test_ar' --python-executables=python
```
```
...
/.../python2.7: No module named pyspark.sql.tests.test_ar
```
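The typo'd names above fail fast because resolving a dotted target with `getattr` raises `AttributeError` as soon as a part is missing. An illustrative sketch (not the actual script; stdlib names stand in for the PySpark ones so it runs anywhere):

```python
import importlib

def resolve_target(module_name, target):
    """Resolve 'Class.method' inside a module, part by part."""
    obj = importlib.import_module(module_name)
    for part in target.split("."):
        obj = getattr(obj, part)  # AttributeError on a typo like 'ArrowT'
    return obj

method = resolve_target("unittest", "TestCase.run")  # resolves fine

try:
    resolve_target("unittest", "TestCas")  # typo'd class name
    raised = None
except AttributeError as exc:
    raised = exc
```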
Closes apache#23203 from HyukjinKwon/SPARK-26252.
Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>