Commit 008a2ad

[SPARK-20202][BUILD][SQL] Remove references to org.spark-project.hive (Hive 1.2.1)
### What changes were proposed in this pull request?

As of today,
- SPARK-30034: Apache Spark 3.0.0 switched its default Hive execution engine from Hive 1.2 to Hive 2.3. This removed the direct dependency on the forked Hive 1.2.1 in the Maven repository.
- SPARK-32981: Apache Spark 3.1.0 (`master` branch) removed Hive 1.2 related artifacts from Apache Spark binary distributions.

This PR (SPARK-20202) aims to completely remove the following usage of the unofficial Apache Hive fork from Apache Spark master for Apache Spark 3.1.0.

```
<hive.group>org.spark-project.hive</hive.group>
<hive.version>1.2.1.spark2</hive.version>
```

For users of the forked Hive 1.2.1.spark2, Apache Spark 2.4 (LTS) and 3.0 (~ 2021.12) will continue to provide it.

### Why are the changes needed?

- First, the Apache Spark community should not use an unofficial forked release of another Apache project.
- Second, Apache Hive 1.2.1 was released on 2015-06-26, and the forked Hive `1.2.1.spark2` exposed many unfixable bugs in Apache Spark because the fork is not maintained at all. Apache Hive 2.3.0 was released on 2017-07-19 and has been used with fewer bugs than `1.2.1.spark2`. Many bugs still exist in the `hive-1.2` profile, and new Apache Spark unit tests have so far been added behind a `HiveUtils.isHive23` condition.

### Does this PR introduce _any_ user-facing change?

No. This is a dev-only change. PRBuilder will not accept `[test-hive1.2]` on master and `branch-3.1`.

### How was this patch tested?

1. SBT/Hadoop 3.2/Hive 2.3 (https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129366)
2. SBT/Hadoop 2.7/Hive 2.3 (https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129382)
3. SBT/Hadoop 3.2/Hive 1.2 (This combination was already unsupported because Hive 1.2 does not work with Hadoop 3.2.)
4. SBT/Hadoop 2.7/Hive 1.2 (https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129383, this is rejected)

Closes #29936 from dongjoon-hyun/SPARK-REMOVE-HIVE1.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
1 parent 14aeab3 commit 008a2ad

320 files changed, +7 -69240 lines changed

dev/run-tests.py

Lines changed: 0 additions & 1 deletion
@@ -325,7 +325,6 @@ def get_hive_profiles(hive_version):
     """
 
     sbt_maven_hive_profiles = {
-        "hive1.2": ["-Phive-1.2"],
         "hive2.3": ["-Phive-2.3"],
     }
 
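The effect of this hunk on profile lookup can be sketched in isolation. The snippet below is a minimal, hypothetical reduction of `get_hive_profiles`: only the mapping comes from the diff, while the error handling (raising `ValueError`) is an assumption made so the sketch is easy to exercise, and may differ from what `dev/run-tests.py` actually does for an unknown key.

```python
def get_hive_profiles(hive_version):
    """Return the SBT/Maven profile flags for a Hive version key."""
    # After this commit, only the Hive 2.3 entry remains in the mapping.
    sbt_maven_hive_profiles = {
        "hive2.3": ["-Phive-2.3"],
    }
    try:
        return sbt_maven_hive_profiles[hive_version]
    except KeyError:
        # Assumption: raise instead of exiting, to keep the sketch testable.
        raise ValueError(
            "Unsupported Hive version %r; expected one of %s"
            % (hive_version, sorted(sbt_maven_hive_profiles))
        ) from None
```

With the `"hive1.2"` entry gone, a request for that key no longer resolves to `-Phive-1.2` and is treated like any other unknown version.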

dev/test-dependencies.sh

Lines changed: 1 addition & 5 deletions
@@ -32,7 +32,6 @@ export LC_ALL=C
 HADOOP_MODULE_PROFILES="-Phive-thriftserver -Pmesos -Pkubernetes -Pyarn -Phive"
 MVN="build/mvn"
 HADOOP_HIVE_PROFILES=(
-    hadoop-2.7-hive-1.2
     hadoop-2.7-hive-2.3
     hadoop-3.2-hive-2.3
 )
@@ -71,12 +70,9 @@ for HADOOP_HIVE_PROFILE in "${HADOOP_HIVE_PROFILES[@]}"; do
   if [[ $HADOOP_HIVE_PROFILE == **hadoop-3.2-hive-2.3** ]]; then
     HADOOP_PROFILE=hadoop-3.2
     HIVE_PROFILE=hive-2.3
-  elif [[ $HADOOP_HIVE_PROFILE == **hadoop-2.7-hive-2.3** ]]; then
-    HADOOP_PROFILE=hadoop-2.7
-    HIVE_PROFILE=hive-2.3
   else
     HADOOP_PROFILE=hadoop-2.7
-    HIVE_PROFILE=hive-1.2
+    HIVE_PROFILE=hive-2.3
   fi
   echo "Performing Maven install for $HADOOP_HIVE_PROFILE"
   $MVN $HADOOP_MODULE_PROFILES -P$HADOOP_PROFILE -P$HIVE_PROFILE jar:jar jar:test-jar install:install clean -q
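The simplified branch logic in this hunk can be restated as a small Python sketch. The function names `resolve_profiles` and `maven_profile_flags` are hypothetical and do not exist in the script; the substring check mirrors the `**...**` glob match in the shell condition.

```python
def resolve_profiles(hadoop_hive_profile):
    """Split a combined profile name into (hadoop_profile, hive_profile)."""
    # The shell pattern **hadoop-3.2-hive-2.3** behaves like a substring test.
    if "hadoop-3.2-hive-2.3" in hadoop_hive_profile:
        return "hadoop-3.2", "hive-2.3"
    # Every remaining entry falls back to Hadoop 2.7, now always with Hive 2.3.
    return "hadoop-2.7", "hive-2.3"


def maven_profile_flags(hadoop_hive_profile):
    """Build the -P flags passed to Maven for one combined profile."""
    hadoop_profile, hive_profile = resolve_profiles(hadoop_hive_profile)
    return ["-P" + hadoop_profile, "-P" + hive_profile]
```

Because the `else` branch now yields `hive-2.3`, the separate `elif` for `hadoop-2.7-hive-2.3` became redundant, which is why the diff drops it.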

docs/sql-migration-guide.md

Lines changed: 2 additions & 0 deletions
@@ -42,6 +42,8 @@ license: |
 
 - In Spark 3.1, incomplete interval literals, e.g. `INTERVAL '1'`, `INTERVAL '1 DAY 2'` will fail with IllegalArgumentException. In Spark 3.0, they result `NULL`s.
 
+- In Spark 3.1, we remove the built-in Hive 1.2. You need to migrate your custom SerDes to Hive 2.3. See [HIVE-15167](https://issues.apache.org/jira/browse/HIVE-15167) for more details.
+
 ## Upgrading from Spark SQL 3.0 to 3.0.1
 
 - In Spark 3.0, JSON datasource and JSON function `schema_of_json` infer TimestampType from string values if they match to the pattern defined by the JSON option `timestampFormat`. Since version 3.0.1, the timestamp type inference is disabled by default. Set the JSON option `inferTimestamp` to `true` to enable such type inference.

pom.xml

Lines changed: 0 additions & 25 deletions
@@ -2970,13 +2970,9 @@
 <sourceDirectories>
   <directory>${basedir}/src/main/java</directory>
   <directory>${basedir}/src/main/scala</directory>
-  <directory>${basedir}/v${hive.version.short}/src/main/java</directory>
-  <directory>${basedir}/v${hive.version.short}/src/main/scala</directory>
 </sourceDirectories>
 <testSourceDirectories>
   <directory>${basedir}/src/test/java</directory>
-  <directory>${basedir}/v${hive.version.short}/src/test/java</directory>
-  <directory>${basedir}/v${hive.version.short}/src/test/scala</directory>
 </testSourceDirectories>
 <configLocation>dev/checkstyle.xml</configLocation>
 <outputFile>${basedir}/target/checkstyle-output.xml</outputFile>
@@ -3148,27 +3144,6 @@
   <!-- Default hadoop profile. Uses global properties. -->
 </profile>
 
-<profile>
-  <id>hive-1.2</id>
-  <properties>
-    <hive.group>org.spark-project.hive</hive.group>
-    <hive.classifier></hive.classifier>
-    <!-- Version used in Maven Hive dependency -->
-    <hive.version>1.2.1.spark2</hive.version>
-    <!-- Version used for internal directory structure -->
-    <hive.version.short>1.2</hive.version.short>
-    <hive.parquet.scope>${hive.deps.scope}</hive.parquet.scope>
-    <hive.storage.version>2.6.0</hive.storage.version>
-    <hive.storage.scope>provided</hive.storage.scope>
-    <hive.common.scope>provided</hive.common.scope>
-    <hive.llap.scope>provided</hive.llap.scope>
-    <hive.serde.scope>provided</hive.serde.scope>
-    <hive.shims.scope>provided</hive.shims.scope>
-    <orc.classifier>nohive</orc.classifier>
-    <datanucleus-core.version>3.2.10</datanucleus-core.version>
-  </properties>
-</profile>
-
 <profile>
   <id>hive-2.3</id>
   <!-- Default hive profile. Uses global properties. -->
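A downstream build that copied the removed profile might want to check whether its own POM still points at the fork. The following is a minimal sketch, not part of this commit: the helper name `profiles_using_forked_hive` is hypothetical, and it assumes the POM declares the standard Maven 4.0.0 namespace and keeps `hive.group` under a profile's `<properties>` element, as in the removed block above.

```python
import xml.etree.ElementTree as ET

# Assumption: the POM uses the standard Maven 4.0.0 namespace.
POM_NS = "http://maven.apache.org/POM/4.0.0"
FORKED_GROUP = "org.spark-project.hive"


def _tag(name):
    """Qualify a local element name with the Maven POM namespace."""
    return "{%s}%s" % (POM_NS, name)


def profiles_using_forked_hive(pom_xml):
    """Return ids of profiles whose hive.group property is the forked group."""
    root = ET.fromstring(pom_xml)
    hits = []
    for profile in root.iter(_tag("profile")):
        props = profile.find(_tag("properties"))
        if props is None:
            continue  # profile relies on global properties only
        for child in props:
            is_group = child.tag == _tag("hive.group")
            if is_group and (child.text or "").strip() == FORKED_GROUP:
                hits.append(profile.findtext(_tag("id"), default=""))
    return hits
```

Run against the pre-commit `pom.xml`, a check like this would be expected to report the `hive-1.2` profile; after this commit, no profile should match.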

sql/core/pom.xml

Lines changed: 0 additions & 3 deletions
@@ -221,8 +221,6 @@
 </goals>
 <configuration>
   <sources>
-    <source>v${hive.version.short}/src/main/scala</source>
-    <source>v${hive.version.short}/src/main/java</source>
     <source>src/main/scala-${scala.binary.version}</source>
   </sources>
 </configuration>
@@ -235,7 +233,6 @@
 </goals>
 <configuration>
   <sources>
-    <source>v${hive.version.short}/src/test/scala</source>
     <source>src/test/gen-java</source>
   </sources>
 </configuration>

sql/core/v1.2/src/main/java/org/apache/spark/sql/execution/datasources/orc/OrcColumnVector.java

Lines changed: 0 additions & 208 deletions
This file was deleted.
