@dongjoon-hyun
Member

@dongjoon-hyun dongjoon-hyun commented Oct 2, 2020

What changes were proposed in this pull request?

As of today,

  • SPARK-30034: Apache Spark 3.0.0 switched its default Hive execution engine from Hive 1.2 to Hive 2.3. This removed the direct dependency on the forked Hive 1.2.1 in the Maven repository.
  • SPARK-32981: Apache Spark 3.1.0 (master branch) removed Hive 1.2 related artifacts from the Apache Spark binary distributions.

This PR (SPARK-20202) aims to completely remove the following usage of the unofficial Apache Hive fork from the Apache Spark master branch for Apache Spark 3.1.0.

<hive.group>org.spark-project.hive</hive.group>
<hive.version>1.2.1.spark2</hive.version>

For users of the forked Hive 1.2.1.spark2, Apache Spark 2.4 (LTS) and 3.0 (~ 2021.12) will continue to provide it.

Why are the changes needed?

  • First, the Apache Spark community should not use an unofficial forked release of another Apache project.
  • Second, Apache Hive 1.2.1 was released on 2015-06-26, and the forked Hive 1.2.1.spark2 has exposed many bugs that are unfixable in Apache because the fork is not maintained at all. Apache Hive 2.3.0 was released on 2017-07-19 and has shown fewer bugs than 1.2.1.spark2. Many bugs still exist in the hive-1.2 profile, and new Apache Spark unit tests have so far been added under a HiveUtils.isHive23 condition.

Does this PR introduce any user-facing change?

No. This is a dev-only change. PRBuilder will not accept [test-hive1.2] on master and branch-3.1.
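
As an illustration of the title-tag convention mentioned above, here is a minimal sketch of how bracketed tags such as [test-hive1.2] could be read from a PR title and checked against a reject list. This is not the actual PRBuilder code; `REJECTED_TAGS`, `extract_tags`, and `rejected_tags` are hypothetical names chosen for the example.

```python
import re

# Hypothetical reject list: tags no longer accepted on master and branch-3.1.
REJECTED_TAGS = {"test-hive1.2"}


def extract_tags(title):
    """Return every bracketed tag found in a PR title, in order of appearance."""
    return re.findall(r"\[([^\]]+)\]", title)


def rejected_tags(title):
    """Return the tags in the title that appear in the reject list."""
    return [tag for tag in extract_tags(title) if tag in REJECTED_TAGS]


if __name__ == "__main__":
    title = "[WIP][BUILD][SQL][test-hadoop2.7][test-hive1.2] Remove Hive 1.2"
    print(extract_tags(title))
    print(rejected_tags(title))
```

A builder following this sketch would stop early when `rejected_tags` returns a non-empty list, rather than starting a build that cannot succeed.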

How was this patch tested?

  1. SBT/Hadoop 3.2/Hive 2.3 (https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129366)
  2. SBT/Hadoop 2.7/Hive 2.3 (https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129382)
  3. SBT/Hadoop 3.2/Hive 1.2 (This combination was already unsupported because Hive 1.2 doesn't work with Hadoop 3.2.)
  4. SBT/Hadoop 2.7/Hive 1.2 (https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129383, This is rejected)

@SparkQA

SparkQA commented Oct 2, 2020

Test build #129353 has started for PR 29936 at commit 032499e.

@SparkQA

SparkQA commented Oct 2, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33964/

@SparkQA

SparkQA commented Oct 2, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33964/

@dongjoon-hyun
Member Author

Retest this please

@SparkQA

SparkQA commented Oct 2, 2020

Test build #129363 has finished for PR 29936 at commit 621582f.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 2, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33972/

@SparkQA

SparkQA commented Oct 2, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33972/

@SparkQA

SparkQA commented Oct 2, 2020

Test build #129362 has finished for PR 29936 at commit 032499e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 2, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33976/

@SparkQA

SparkQA commented Oct 2, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33976/

@SparkQA

SparkQA commented Oct 2, 2020

Test build #129366 has finished for PR 29936 at commit 16b3452.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun dongjoon-hyun changed the title from "[WIP][BUILD][SQL] Remove Hive 1.2" to "[WIP][BUILD][SQL][test-hadoop2.7] Remove Hive 1.2" Oct 3, 2020
@dongjoon-hyun
Member Author

Retest this please

@dongjoon-hyun dongjoon-hyun changed the title from "[WIP][BUILD][SQL][test-hadoop2.7] Remove Hive 1.2" to "[WIP][BUILD][SQL][test-hadoop2.7][test-hive1.2] Remove Hive 1.2" Oct 3, 2020
@dongjoon-hyun
Member Author

Retest this please.

@SparkQA

SparkQA commented Oct 3, 2020

Test build #129383 has finished for PR 29936 at commit 16b3452.

  • This patch fails some tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 3, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33990/

@dongjoon-hyun dongjoon-hyun changed the title from "[WIP][BUILD][SQL][test-hadoop2.7][test-hive1.2] Remove Hive 1.2" to "[WIP][BUILD][SQL][test-hadoop2.7] Remove Hive 1.2" Oct 3, 2020
@SparkQA

SparkQA commented Oct 3, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33990/

@SparkQA

SparkQA commented Oct 3, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33991/

@SparkQA

SparkQA commented Oct 3, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33991/

@dongjoon-hyun dongjoon-hyun changed the title from "[WIP][BUILD][SQL][test-hadoop2.7] Remove Hive 1.2" to "[SPARK-20202][BUILD][SQL] Remove references to org.spark-project.hive" Oct 3, 2020
@dongjoon-hyun dongjoon-hyun changed the title from "[SPARK-20202][BUILD][SQL] Remove references to org.spark-project.hive" to "[SPARK-20202][BUILD][SQL] Remove references to org.spark-project.hive (Hive 1.2.1)" Oct 3, 2020
@SparkQA

SparkQA commented Oct 3, 2020

Test build #129382 has finished for PR 29936 at commit 16b3452.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@srowen srowen left a comment

I'm OK with it if Hive 1.2 is gone.

@dongjoon-hyun
Member Author

Thank you, @srowen .

@HyukjinKwon
Member

@wangyum can you review this? I think it's important to get your review here.

@@ -296,8 +296,7 @@ private[hive] class HiveClientImpl(
case e: NoClassDefFoundError
if HiveUtils.isHive23 && e.getMessage.contains("org/apache/hadoop/hive/serde2/SerDe") =>
Member

@dongjoon-hyun should we remove HiveUtils.isHive23 and related changes? I see it was added when we added Hive 2.3 support at 33f3c48#diff-842e3447fc453de26c706db1cac8f2c4R59. cc @wangyum FYI

Member

@AngersZhuuuu should we remove

override def isHive23OrSpark: Boolean = HiveUtils.isHive23
too?

Member

+1, we can remove HiveUtils.isHive23 and related changes.

@@ -28,14 +28,11 @@ import org.apache.spark.sql.hive.test.TestHive
* {{{
Member

I think we can now remove this comment d7755cf#diff-10e604a9a9d9c4bcc9cdc01049851095R170

and enable this test d7755cf#diff-10e604a9a9d9c4bcc9cdc01049851095R201. Feel free to mark it as ignored again if it fails for whatever reason.

Member

Looks like we can just remove this comment too:

// Since Hive 1.2.1 library code path still has this problem, users may hit this
// when spark.sql.hive.convertMetastoreOrc=false. However, after SPARK-22279,
// Apache Spark with the default configuration doesn't hit this bug.

@dongjoon-hyun
Member Author

dongjoon-hyun commented Oct 5, 2020

Thank you so much for the review, @HyukjinKwon and @wangyum . It's difficult to remove all the conditional logic, comments, and website content in this PR. We have more instances in the test cases, too. I want to clean them up carefully in another PR or a follow-up. Is that okay with you? This PR focuses on removing the org.spark-project.hive reference, mainly in the pom and the source code structure.

@dongjoon-hyun
Member Author

cc @dbtsai , @holdenk , @viirya , @sunchao

@dbtsai dbtsai self-requested a review October 5, 2020 20:57
Member

@dbtsai dbtsai left a comment

LGTM. +1 on cleaning up the rest of the Hive 1.2 code in a follow-up PR, as it's everywhere and takes time to remove carefully.

pom.xml Outdated
Comment on lines 3148 to 3149
<id>hive-1.2</id>
<properties>
<hive.group>org.spark-project.hive</hive.group>
<hive.classifier></hive.classifier>
<!-- Version used in Maven Hive dependency -->
<hive.version>1.2.1.spark2</hive.version>
<!-- Version used for internal directory structure -->
<hive.version.short>1.2</hive.version.short>
<hive.parquet.scope>${hive.deps.scope}</hive.parquet.scope>
<hive.storage.version>2.6.0</hive.storage.version>
<hive.storage.scope>provided</hive.storage.scope>
<hive.common.scope>provided</hive.common.scope>
<hive.llap.scope>provided</hive.llap.scope>
<hive.serde.scope>provided</hive.serde.scope>
<hive.shims.scope>provided</hive.shims.scope>
<orc.classifier>nohive</orc.classifier>
<datanucleus-core.version>3.2.10</datanucleus-core.version>
</properties>
<!-- Exists only for backward compatibility. No-op. -->
Member

@viirya viirya Oct 5, 2020

So is this profile still useful after we remove all the hive-1.2 stuff?

Member

No, it's just that you would get an error when specifying -Phive-1.2 if the profile were removed entirely. Now, maybe that's a good thing? Previously, for the Scala profiles, I had left them in because, for example, -Pscala-2.12 was not the default before. When it became the default, removing the profile would have caused an error even though the build was already set up for 2.12. But this case is probably different: someone selecting Hive 1.x support should probably see an error, in which case this should be removed.

Member

Yeah, this sounds more correct.

Member Author

Got it. I'll remove this and raise an error explicitly, @viirya and @srowen . :)
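
For readers following this thread, a rough sketch of what failing fast on a removed profile could look like in a build wrapper script. It is only an illustration under stated assumptions, not the change made in this PR: `REMOVED_PROFILES`, `requested_profiles`, and `check_profiles` are hypothetical names, and the sketch only handles the attached `-Pa,b` form of the profile flag.

```python
# Hypothetical list of build profiles that no longer exist.
REMOVED_PROFILES = {"hive-1.2"}


def requested_profiles(args):
    """Collect profile ids from arguments of the form -Pid or -Pid1,id2."""
    profiles = []
    for arg in args:
        if arg.startswith("-P") and len(arg) > 2:
            profiles.extend(p for p in arg[2:].split(",") if p)
    return profiles


def check_profiles(args):
    """Raise ValueError if any requested profile has been removed."""
    removed = [p for p in requested_profiles(args) if p in REMOVED_PROFILES]
    if removed:
        raise ValueError(
            "Removed build profile(s) requested: " + ", ".join(removed)
        )


if __name__ == "__main__":
    check_profiles(["-Phive-2.3", "-Phadoop-3.2"])  # passes silently
    try:
        check_profiles(["-Phive-1.2"])
    except ValueError as err:
        print(err)
```

The point of the explicit check is the one made in the comments above: someone who asks for Hive 1.x support gets a clear failure instead of a build that silently ignores the request.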

@viirya
Member

viirya commented Oct 5, 2020

Looks OK, and the tests/GitHub Actions passed. It is okay to clean up other Hive 1.2 related code in follow-up PRs.

@dongjoon-hyun
Member Author

Thank you all. Merged to master for Apache Spark 3.1.0.

@dongjoon-hyun dongjoon-hyun deleted the SPARK-REMOVE-HIVE1 branch October 5, 2020 22:30
@SparkQA

SparkQA commented Oct 5, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34031/

@SparkQA

SparkQA commented Oct 5, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34031/

@HyukjinKwon
Member

Yeah, followup is fine. LGTM.

@SparkQA

SparkQA commented Oct 6, 2020

Test build #129424 has finished for PR 29936 at commit c855260.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

7 participants