Conversation

@dongjoon-hyun (Member) commented Apr 24, 2018

What changes were proposed in this pull request?

When PyArrow or Pandas are not available, the corresponding PySpark tests are skipped automatically. Currently, PySpark tests fail when Spark is built without -Phive. This PR aims to skip Hive-related PySpark tests when -Phive is not given.

BEFORE

$ build/mvn -DskipTests clean package
$ python/run-tests.py --python-executables python2.7 --modules pyspark-sql
File "/Users/dongjoon/spark/python/pyspark/sql/readwriter.py", line 295, in pyspark.sql.readwriter.DataFrameReader.table
...
IllegalArgumentException: u"Error while instantiating 'org.apache.spark.sql.hive.HiveExternalCatalog':"
**********************************************************************
   1 of   3 in pyspark.sql.readwriter.DataFrameReader.table
***Test Failed*** 1 failures.

AFTER

$ build/mvn -DskipTests clean package
$ python/run-tests.py --python-executables python2.7 --modules pyspark-sql
...
Tests passed in 138 seconds

Skipped tests in pyspark.sql.tests with python2.7:
...
    test_hivecontext (pyspark.sql.tests.HiveSparkSubmitTests) ... skipped 'Hive is not available.'

How was this patch tested?

This is a test-only change. First, this should pass Jenkins. Then, manually run the following.

build/mvn -DskipTests clean package
python/run-tests.py --python-executables python2.7 --modules pyspark-sql
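For reference, the skip mechanism described above can be sketched as follows. This is a minimal, hypothetical illustration rather than the actual patch: the real tests probe Hive availability through the JVM, whereas `hive_available` here is a stand-in probe, and the test body is a placeholder.

```python
import unittest


def hive_available():
    # Stand-in probe for illustration only. Spark's actual tests try to
    # instantiate a Hive-backed context and treat failure as "not built
    # with -Phive"; here we merely check that a module imports.
    try:
        import pyspark.sql  # assumption: substitute for the real Hive probe
        return True
    except ImportError:
        return False


class HiveSparkSubmitTests(unittest.TestCase):
    def setUp(self):
        # Skipping in setUp marks every test in the class as skipped
        # with the given message, matching the output quoted above.
        if not hive_available():
            self.skipTest("Hive is not available.")

    def test_hivecontext(self):
        self.assertTrue(True)  # placeholder body
```

Whether the probe fails or succeeds, the suite still reports success; skipped tests are counted but do not fail the run.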

@dongjoon-hyun (Member, Author)

Hi, @holdenk. Could you review this PR when you have some time?

@SparkQA commented Apr 24, 2018

Test build #89781 has finished for PR 21141 at commit dc81cf9.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@holdenk (Contributor) commented Apr 24, 2018 via email

@HyukjinKwon (Member) commented Apr 24, 2018

Actually, I think #20909 tries to fix a similar thing.

If both fix similar things, this approach looks a bit more preferable. @bersprockets, if that sounds the same to you, we could maybe do the doctest skips in #20909 and get this merged separately?

@SparkQA commented Apr 24, 2018

Test build #89782 has finished for PR 21141 at commit 54fdfd0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

Will this result in the right kind of message, particularly the kind that @HyukjinKwon is checking for in PR #21107?

Member

Yea, actually, I think it would be nicer in setUp for a better(?) message. I replaced unittest.SkipTest with self.skipTest per https://docs.python.org/2/library/unittest.html#unittest.SkipTest and https://docs.python.org/2/library/unittest.html#unittest.TestCase.skipTest
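For illustration, both APIs referenced in those docs produce the same skip result; calling `self.skipTest(...)` is just the method-call form of raising `unittest.SkipTest`. The class and test names below are hypothetical.

```python
import unittest


class RaisesSkipTest(unittest.TestCase):
    # Hypothetical example: raising the exception directly also works,
    # since skipTest() is a convenience wrapper that raises it.
    def test_one(self):
        raise unittest.SkipTest("Hive is not available.")


class CallsSkipTest(unittest.TestCase):
    def test_one(self):
        # The form preferred in this PR discussion: reads as a method call.
        self.skipTest("Hive is not available.")
```

Both tests end up in the runner's skipped list with the same reason string.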

Member

I think we should see if @holdenk likes this or not too while we are here.

Contributor

I do like more information on skipped tests.

Member Author

Yep. According to the latest convention of #21107, it will be displayed like this.

Skipped tests in pyspark.sql.tests with python2.7:
...
    test_hivecontext (pyspark.sql.tests.HiveSparkSubmitTests) ... skipped 'Hive is not available.'

Contributor

Since this is test code it's probably OK, but this assumes that someone hasn't packaged Spark as an assembly JAR. I like the approach taken in #20909 for checking.

Member

Perhaps look for TestHiveContext as in other test cases:

newJObject("org.apache.spark.sql.hive.test.TestHiveContext", ssc, FALSE)

jtestHive = sparkContext._jvm.org.apache.spark.sql.hive.test.TestHiveContext(jsc, False)

Member Author

This PR will follow #20909, like the other occurrence in HiveContextSQLTests in this file.


@holdenk (Contributor) commented Apr 26, 2018

Also, weirdly, when I run this locally without Hive built I get some UDF registration exceptions (could be unrelated). Do you get that?

@bersprockets (Contributor)

@holdenk Yes, see this Jira. If you build with sbt, you also need to run build/sbt sql/test:compile to get the UDFs from the test files.

@dongjoon-hyun (Member, Author)

Thank you for the review, @HyukjinKwon, @holdenk, @bersprockets.

I didn't notice SPARK-23776 when I chose SPARK-23853. I think we can merge those PRs now.

@bersprockets, could you update your readwriter.py like this PR?

@HyukjinKwon (Member)

I am okay either way, but I thought @bersprockets agreed on doing this separately here? The doctest changes need more review, and I think this one alone can be merged separately.

@bersprockets I'll leave it to your preference, since you are the author of the original PR.

@bersprockets (Contributor)

@dongjoon-hyun @HyukjinKwon My PR is no longer addressing the issue described in its associated Jira (SPARK-23776), which is that developers don't know what to do when they run the PySpark tests and get a failure with a UDF registration error (or a missing Hive assembly error), as Holden experienced earlier. My PR morphed into a "skip the tests for missing components" change.

After these "skip tests" PRs go through, I will revisit this. In the meantime, feel free to use/ignore whatever is in #20909.

@dongjoon-hyun (Member, Author)

I see. Thanks, @bersprockets. I'll proceed with this PR according to your and other people's comments.

@SparkQA commented Apr 29, 2018

Test build #89974 has finished for PR 21141 at commit 271e152.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun (Member, Author)

The PR is updated now. Could you review this again, @holdenk, @HyukjinKwon, @felixcheung, @bersprockets?

@HyukjinKwon (Member) left a comment

LGTM

@dongjoon-hyun (Member, Author)

Thank you for the review and approval, @HyukjinKwon.

@bersprockets (Contributor)

My experience here is limited. Still, it also looks good to me.

@dongjoon-hyun (Member, Author)

Thank you, @bersprockets .

@HyukjinKwon (Member)

Merged to master and branch-2.3.

asfgit pushed a commit that referenced this pull request May 1, 2018
…`-Phive`

Author: Dongjoon Hyun <[email protected]>

Closes #21141 from dongjoon-hyun/SPARK-23853.

(cherry picked from commit b857fb5)
Signed-off-by: hyukjinkwon <[email protected]>
@asfgit closed this in b857fb5 May 1, 2018
@dongjoon-hyun (Member, Author)

Thank you, @HyukjinKwon!

@dongjoon-hyun deleted the SPARK-23853 branch May 1, 2018 05:51