Commit cda3654
[SPARK-26422][R] Support to disable Hive support in SparkR even for Hadoop versions unsupported by Hive fork
## What changes were proposed in this pull request?
Currently, even if I explicitly disable Hive support in a SparkR session as below:
```r
sparkSession <- sparkR.session("local[4]", "SparkR", Sys.getenv("SPARK_HOME"),
enableHiveSupport = FALSE)
```
it still fails when the Hadoop version is not supported by our Hive fork:
```
java.lang.reflect.InvocationTargetException
...
Caused by: java.lang.IllegalArgumentException: Unrecognized Hadoop major version number: 3.1.1.3.1.0.0-78
at org.apache.hadoop.hive.shims.ShimLoader.getMajorVersion(ShimLoader.java:174)
at org.apache.hadoop.hive.shims.ShimLoader.loadShims(ShimLoader.java:139)
at org.apache.hadoop.hive.shims.ShimLoader.getHadoopShims(ShimLoader.java:100)
at org.apache.hadoop.hive.conf.HiveConf$ConfVars.<clinit>(HiveConf.java:368)
... 43 more
Error in handleErrors(returnStatus, conn) :
java.lang.ExceptionInInitializerError
at org.apache.hadoop.hive.conf.HiveConf.<clinit>(HiveConf.java:105)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.spark.util.Utils$.classForName(Utils.scala:193)
at org.apache.spark.sql.SparkSession$.hiveClassesArePresent(SparkSession.scala:1116)
at org.apache.spark.sql.api.r.SQLUtils$.getOrCreateSparkSession(SQLUtils.scala:52)
at org.apache.spark.sql.api.r.SQLUtils.getOrCreateSparkSession(SQLUtils.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
```
The root cause is that `SparkSession.hiveClassesArePresent` checks whether the Hive classes are loadable, i.e., whether they are on the classpath, but `org.apache.hadoop.hive.conf.HiveConf` contains a Hadoop version check in static initialization logic, which runs as soon as the class is loaded. That check throws an `IllegalArgumentException`, which is not caught:
https://github.com/apache/spark/blob/36edbac1c8337a4719f90e4abd58d38738b2e1fb/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala#L1113-L1121
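For reference, here is a minimal, self-contained sketch of that kind of check, using plain `Class.forName` in place of Spark's internal `Utils.classForName` (the object name is hypothetical, not Spark's actual source). It shows why an `IllegalArgumentException` thrown from `HiveConf`'s static initializer escapes:
```scala
object HiveClassCheck {
  // Returns true only when the Hive classes can be loaded from the classpath.
  // Loading HiveConf runs its static initializer, which validates the Hadoop
  // version; on an unrecognized version it throws IllegalArgumentException,
  // which is NOT covered by the catch below and therefore propagates.
  def hiveClassesArePresent: Boolean = {
    try {
      Class.forName("org.apache.hadoop.hive.conf.HiveConf")
      true
    } catch {
      case _: ClassNotFoundException | _: NoClassDefFoundError => false
    }
  }
}
```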
So, currently, if users have a Hive-enabled Spark build with a Hadoop version unsupported by our fork (namely 3+), there is no way to use SparkR even though it could work.
This PR simply proposes to change the order of the boolean comparison so that we do not execute `SparkSession.hiveClassesArePresent` when:
1. `enableHiveSupport` is explicitly disabled
2. `spark.sql.catalogImplementation` is `in-memory`
so that, by short-circuiting, we **only** check `SparkSession.hiveClassesArePresent` when Hive support is explicitly enabled (see the sketch below).
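A minimal sketch of the reordering, with hypothetical parameter names (`enableHiveSupport`, `catalogImplementation`) standing in for the values read in `SQLUtils.getOrCreateSparkSession`; this is not the actual diff:
```scala
// Sketch only: the by-name parameter makes the short-circuit point explicit.
def shouldUseHiveCatalog(
    enableHiveSupport: Boolean,
    catalogImplementation: String,
    hiveClassesArePresent: => Boolean): Boolean = {
  // The cheap flags are evaluated first, so hiveClassesArePresent (which loads
  // HiveConf and may fail its static Hadoop-version check) is never evaluated
  // when Hive support is disabled or the catalog is "in-memory".
  enableHiveSupport &&
    catalogImplementation.equalsIgnoreCase("hive") &&
    hiveClassesArePresent
}
```
With this ordering, a call like `sparkR.session(..., enableHiveSupport = FALSE)` never triggers the `HiveConf` static initializer, so SparkR can be used on Hadoop versions the Hive fork does not recognize.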
## How was this patch tested?
It's difficult to write a test since we don't run tests against Hadoop 3 yet. See apache#21588. Manually tested.
Closes apache#23356 from HyukjinKwon/SPARK-26422.
Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
File tree: 1 file changed, 10 additions and 2 deletions, under sql/core/src/main/scala/org/apache/spark/sql/api/r