[SPARK-23853][PYSPARK][TEST] Run Hive-related PySpark tests only for -Phive
#21141
Conversation
Hi, @holdenk.
Test build #89781 has finished for PR 21141 at commit
Great, thank you! I'll review it this Thursday :)
Actually, I think #20909 tries to fix a similar thing. If both fix similar things, this way looks a bit more preferable. @bersprockets, if that sounds the same to you, we could maybe do the doctests skip thing in #20909 and get this merged in separately?
Test build #89782 has finished for PR 21141 at commit
python/pyspark/sql/tests.py
Outdated
Will this result in the right kind of message, particularly the kind that @HyukjinKwon is checking for in PR #21107?
Yea, actually, I think it would be nicer in setUp for a better(?) message. I replaced unittest.SkipTest with self.skipTest per https://docs.python.org/2/library/unittest.html#unittest.SkipTest and https://docs.python.org/2/library/unittest.html#unittest.TestCase.skipTest
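For context, a minimal sketch (not the exact diff in this PR) of the self.skipTest-in-setUp pattern being discussed; the `_have_hive` flag and the test body are placeholders for illustration:

```python
import unittest

# _have_hive is a hypothetical flag: the real tests decide Hive availability by
# probing the JVM for the Hive test classes (see the #20909 discussion below).
_have_hive = False


class HiveSparkSubmitTests(unittest.TestCase):

    def setUp(self):
        if not _have_hive:
            # self.skipTest raises unittest.SkipTest internally, so the runner
            # reports: skipped 'Hive is not available.' instead of failing.
            self.skipTest("Hive is not available.")

    def test_hivecontext(self):
        # Placeholder body; only runs when the Hive classes are present.
        self.assertTrue(_have_hive)


if __name__ == "__main__":
    unittest.main(verbosity=2)
```

Because the skip happens in setUp, every test method in the class gets the same descriptive reason, which is what the #21107-style summary picks up.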
I think we should see if @holdenk likes this or not too while we are here.
I do like more information on skipped tests.
Yep. According to the latest convention of #21107, it will be displayed like this.
Skipped tests in pyspark.sql.tests with python2.7:
...
test_hivecontext (pyspark.sql.tests.HiveSparkSubmitTests) ... skipped 'Hive is not available.'
python/pyspark/sql/tests.py
Outdated
Since this is test code it's probably OK, but this assumes that someone hasn't packaged Spark as an assembly JAR. I like the approach taken in #20909 for checking.
Perhaps look for TestHiveContext like in other test cases:
newJObject("org.apache.spark.sql.hive.test.TestHiveContext", ssc, FALSE)
spark/python/pyspark/sql/context.py, line 497 in 61487b3:
jtestHive = sparkContext._jvm.org.apache.spark.sql.hive.test.TestHiveContext(jsc, False)
This PR will follow #20909, like the other occurrence in HiveContextSQLTests in this file.
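As a rough illustration of that kind of check (not the exact code in #20909), one way to probe for TestHiveContext is to ask the driver JVM to load the class through the Py4J gateway; the `hive_available` helper name and the `sc` argument are assumptions of this sketch:

```python
def hive_available(sc):
    """Best-effort probe: return True if
    org.apache.spark.sql.hive.test.TestHiveContext is on the driver
    classpath, i.e. Spark was built with -Phive."""
    try:
        # Class.forName raises a Py4J error when the class cannot be loaded.
        sc._jvm.java.lang.Class.forName(
            "org.apache.spark.sql.hive.test.TestHiveContext")
        return True
    except Exception:
        return False


# Usage inside a test class (requires an active SparkContext `sc`):
#     if not hive_available(sc):
#         self.skipTest("Hive is not available.")
```

Unlike looking for a Hive assembly JAR on disk, loading the class works regardless of how Spark was packaged, which addresses the concern raised above.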
python/pyspark/sql/tests.py
Outdated
I do like more information on skipped tests.
Also, weirdly, when I run this locally without Hive built I get some UDF registration exceptions (could be unrelated) - do you get that?
Thank you for review, @HyukjinKwon, @holdenk, @bersprockets. I didn't notice SPARK-23776 when I chose SPARK-23853. I think we can merge those PRs now. @bersprockets, could you update your
I am okay either way, but I thought @bersprockets agreed on doing this separately here? The doctests stuff needs more looks, and I think this one alone can be merged separately. @bersprockets, I'll leave it to your preference - you are the author of the original PR.
@dongjoon-hyun @HyukjinKwon My PR is no longer addressing the issue described in its associated Jira (SPARK-23776), which is that developers don't know what to do when they run the PySpark tests and get a failure with a UDF registration error (or a missing Hive assembly error), as Holden experienced earlier. My PR morphed into a "skip the tests for missing components" change. After these "skip tests" PRs go through, I will revisit this. In the meantime, feel free to use/ignore whatever is in #20909.
I see. Thanks, @bersprockets. I'll proceed with this PR according to your and other people's comments.
Test build #89974 has finished for PR 21141 at commit
The PR is updated now. Could you review this again, @holdenk, @HyukjinKwon, @felixcheung, @bersprockets?
HyukjinKwon left a comment:
LGTM
Thank you for the review and approval, @HyukjinKwon.
My experience here is limited. Still, it also looks good to me.
Thank you, @bersprockets.
Merged to master and branch-2.3.
[SPARK-23853][PYSPARK][TEST] Run Hive-related PySpark tests only for `-Phive`
## What changes were proposed in this pull request?
When `PyArrow` or `Pandas` are not available, the corresponding PySpark tests are skipped automatically. Currently, PySpark tests fail when we are not using `-Phive`. This PR aims to skip Hive-related PySpark tests when `-Phive` is not given.
**BEFORE**
```bash
$ build/mvn -DskipTests clean package
$ python/run-tests.py --python-executables python2.7 --modules pyspark-sql
File "/Users/dongjoon/spark/python/pyspark/sql/readwriter.py", line 295, in pyspark.sql.readwriter.DataFrameReader.table
...
IllegalArgumentException: u"Error while instantiating 'org.apache.spark.sql.hive.HiveExternalCatalog':"
**********************************************************************
1 of 3 in pyspark.sql.readwriter.DataFrameReader.table
***Test Failed*** 1 failures.
```
**AFTER**
```bash
$ build/mvn -DskipTests clean package
$ python/run-tests.py --python-executables python2.7 --modules pyspark-sql
...
Tests passed in 138 seconds
Skipped tests in pyspark.sql.tests with python2.7:
...
test_hivecontext (pyspark.sql.tests.HiveSparkSubmitTests) ... skipped 'Hive is not available.'
```
## How was this patch tested?
This is a test-only change. First, this should pass the Jenkins. Then, manually do the following.
```bash
build/mvn -DskipTests clean package
python/run-tests.py --python-executables python2.7 --modules pyspark-sql
```
Author: Dongjoon Hyun <[email protected]>
Closes #21141 from dongjoon-hyun/SPARK-23853.
(cherry picked from commit b857fb5)
Signed-off-by: hyukjinkwon <[email protected]>
Thank you, @HyukjinKwon!