[SPARK-20202][BUILD][SQL] Remove references to org.spark-project.hive (Hive 1.2.1) #29936
Conversation
Test build #129353 has started for PR 29936 at commit
Kubernetes integration test starting
Kubernetes integration test status success
Retest this please
Test build #129363 has finished for PR 29936 at commit
Kubernetes integration test starting
Kubernetes integration test status success
Test build #129362 has finished for PR 29936 at commit
Kubernetes integration test starting
Kubernetes integration test status success
Test build #129366 has finished for PR 29936 at commit
Retest this please
Retest this please.
Test build #129383 has finished for PR 29936 at commit
Kubernetes integration test starting
Kubernetes integration test status success
Kubernetes integration test starting
Kubernetes integration test status success
Test build #129382 has finished for PR 29936 at commit
srowen left a comment
I'm OK with it if Hive 1.2 is gone.
Thank you, @srowen.

@wangyum can you review this? I think it's important to get your review here.
@@ -296,8 +296,7 @@ private[hive] class HiveClientImpl(
      case e: NoClassDefFoundError
        if HiveUtils.isHive23 && e.getMessage.contains("org/apache/hadoop/hive/serde2/SerDe") =>
@dongjoon-hyun should we remove HiveUtils.isHive23 and related changes? I see it was added when we added Hive 2.3 support at 33f3c48#diff-842e3447fc453de26c706db1cac8f2c4R59. cc @wangyum FYI
@AngersZhuuuu should we remove
Line 41 in 55ce49e
    override def isHive23OrSpark: Boolean = HiveUtils.isHive23
+1, we can remove HiveUtils.isHive23 and related changes.
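For illustration, here is a minimal, self-contained sketch (hypothetical object and method names, not the actual HiveClientImpl code) of the kind of HiveUtils.isHive23 gate that becomes redundant once only the built-in Hive 2.3 client is supported:

    // Stand-in flag: in Spark this is HiveUtils.isHive23.
    object Hive23GateExample {
      val isHive23: Boolean = true

      // Mirrors the guard quoted above: the NoClassDefFoundError is treated as a known,
      // benign case only when running against the built-in Hive 2.3 execution client.
      def isKnownSerdeError(e: NoClassDefFoundError): Boolean =
        isHive23 && e.getMessage != null &&
          e.getMessage.contains("org/apache/hadoop/hive/serde2/SerDe")
    }

Once the hive-1.2 profile is gone, the isHive23 factor can simply be dropped from such conditions.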
@@ -28,14 +28,11 @@ import org.apache.spark.sql.hive.test.TestHive
 * {{{
I think we can now remove this comment d7755cf#diff-10e604a9a9d9c4bcc9cdc01049851095R170 and enable this test d7755cf#diff-10e604a9a9d9c4bcc9cdc01049851095R201. Feel free to ignore it again if it fails for whatever reason.
Looks like we can just remove this comment too:
spark/sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/HiveOrcQuerySuite.scala
Lines 171 to 173 in 6be8b93
// Since Hive 1.2.1 library code path still has this problem, users may hit this
// when spark.sql.hive.convertMetastoreOrc=false. However, after SPARK-22279,
// Apache Spark with the default configuration doesn't hit this bug.
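For context, a minimal sketch of the setting the quoted comment refers to (spark.sql.hive.convertMetastoreOrc is the actual configuration key; the session setup below is only illustrative and assumes a Spark build with Hive support on the classpath):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("orc-read-path-example")
      .enableHiveSupport()
      // false forces the Hive SerDe read path, where the old Hive 1.2.1 ORC issue could appear;
      // the default (true) uses Spark's native ORC reader, which avoids it after SPARK-22279.
      .config("spark.sql.hive.convertMetastoreOrc", "false")
      .getOrCreate()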
Thank you so much for the review, @HyukjinKwon and @wangyum. It's difficult to remove all the conditional logic, comments, and website references in this PR. We have more instances in the test cases, too. I want to clean them up carefully in another PR or a follow-up. Is that okay for you guys? This PR focuses on removing the org.spark-project.hive references.
dbtsai left a comment
LGTM. +1 on cleaning up the rest of the Hive 1.2 code in a follow-up PR, as it's everywhere and takes time to remove carefully.
pom.xml
Outdated
      <id>hive-1.2</id>
      <properties>
        <hive.group>org.spark-project.hive</hive.group>
        <hive.classifier></hive.classifier>
        <!-- Version used in Maven Hive dependency -->
        <hive.version>1.2.1.spark2</hive.version>
        <!-- Version used for internal directory structure -->
        <hive.version.short>1.2</hive.version.short>
        <hive.parquet.scope>${hive.deps.scope}</hive.parquet.scope>
        <hive.storage.version>2.6.0</hive.storage.version>
        <hive.storage.scope>provided</hive.storage.scope>
        <hive.common.scope>provided</hive.common.scope>
        <hive.llap.scope>provided</hive.llap.scope>
        <hive.serde.scope>provided</hive.serde.scope>
        <hive.shims.scope>provided</hive.shims.scope>
        <orc.classifier>nohive</orc.classifier>
        <datanucleus-core.version>3.2.10</datanucleus-core.version>
      </properties>
      <!-- Exists only for backward compatibility. No-op. -->
So is this profile still useful after we remove all the hive-1.2 stuff?
No, it's just that you get an error if you specify -Phive-1.2 when it's removed entirely. Now, maybe that's a good thing? Previously, for the Scala profiles, I had left them in because, for example, -Pscala-2.12 was not the default before. When it became the default, removing the profile would have caused an error even though the build was already set up for 2.12. But this case is probably different: someone selecting Hive 1.x support should probably see an error, in which case this profile should be removed.
Yeah, this sounds more correct.
Looks OK, and the tests/GitHub Actions passed. It is okay to clean up other Hive 1.2 related code in the next PRs.

Thank you all. Merged to master for Apache Spark 3.1.0.
Kubernetes integration test starting
Kubernetes integration test status success

Yeah, followup is fine. LGTM.

Test build #129424 has finished for PR 29936 at commit
What changes were proposed in this pull request?
As of today, the master branch has already removed the Hive 1.2 related artifacts from the Apache Spark binary distributions. This PR (SPARK-20202) aims to remove the usage of the unofficial Apache Hive fork (org.spark-project.hive, Hive 1.2.1) completely from the Apache Spark master branch for Apache Spark 3.1.0.

For users of the forked Hive 1.2.1.spark2, Apache Spark 2.4 (LTS) and 3.0 (supported until ~2021.12) will continue to provide it.
Why are the changes needed?
1.2.1.spark2 exposed many unfixable bugs in Apache Spark because the forked 1.2.1.spark2 is not maintained at all. Apache Hive 2.3.0 was released on 2017-07-19 and has been used with fewer bugs compared with 1.2.1.spark2. Many bugs still exist in the hive-1.2 profile, and new Apache Spark unit tests have so far been added with a HiveUtils.isHive23 condition.
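As a hypothetical illustration (not an actual Spark suite) of that test-gating pattern, a guard like the one below becomes unconditional, and can be deleted, once only Hive 2.3 is supported:

    import org.scalatest.funsuite.AnyFunSuite

    class Hive23GatedSuiteExample extends AnyFunSuite {
      // Stand-in for HiveUtils.isHive23; always true after the hive-1.2 profile is removed.
      private val isHive23: Boolean = true

      test("behaviour that only holds with the built-in Hive 2.3 client") {
        assume(isHive23) // previously skipped this test under the hive-1.2 profile
        assert("2.3.7".startsWith("2.3"))
      }
    }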
Does this PR introduce any user-facing change?
No. This is a dev-only change. PRBuilder will not accept [test-hive1.2] on master and branch-3.1.

How was this patch tested?