
Conversation

@zzccctv commented Jan 23, 2021

HiveHBaseTableInputFormat relies on two versions of InputFormat: org.apache.hadoop.mapred.InputFormat and org.apache.hadoop.mapreduce.InputFormat. This causes both conditions to be true:

  1. classOf[oldInputClass[_, _]].isAssignableFrom(inputFormatClazz) is true
  2. classOf[newInputClass[_, _]].isAssignableFrom(inputFormatClazz) is true

In view of this situation, it is expected to be compatible with the old version first (see the sketch below).
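
For clarity, a minimal sketch of the ordering this description asks for; this is not the actual Spark code. The helper names createOldHadoopRDD/createNewHadoopRDD are the ones that appear in the review snippet further down, while chooseRDDPath and its return values are illustrative only:

    import org.apache.hadoop.mapred
    import org.apache.hadoop.mapreduce

    object InputFormatDispatchSketch {
      // When a format class is assignable to both APIs, checking the old mapred API
      // first makes the old-RDD path win, which is the ordering this PR argues for.
      def chooseRDDPath(inputFormatClazz: Class[_]): String = {
        if (classOf[mapred.InputFormat[_, _]].isAssignableFrom(inputFormatClazz)) {
          "createOldHadoopRDD"   // old mapred API preferred when both checks match
        } else if (classOf[mapreduce.InputFormat[_, _]].isAssignableFrom(inputFormatClazz)) {
          "createNewHadoopRDD"   // fall back to the new mapreduce API
        } else {
          throw new IllegalArgumentException(s"Unsupported input format: $inputFormatClazz")
        }
      }
    }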

github-actions bot added the SQL label Jan 23, 2021
@AmplabJenkins

Can one of the admins verify this patch?

  createNewHadoopRDD(localTableDesc, inputPathStr)
} else {
  if (classOf[oldInputClass[_, _]].isAssignableFrom(inputFormatClazz)) {
    createOldHadoopRDD(localTableDesc, inputPathStr)

@yikf (Contributor) commented Jan 23, 2021

Can you add a test to the suite for this change and explain the error?

@zzccctv (Author) replied

OK, let me see how to write it

@dongjoon-hyun (Member) left a comment

Hi, if it's assignable to both, why does Apache Spark need to use the old one? Instead, it sounds like HiveHBaseTableInputFormat is missing a correct implementation for mapreduce.InputFormat. This doesn't look like a Spark issue to me. Is there a Hive JIRA issue for that?

HiveHBaseTableInputFormat relies on two versions of InputFormat: org.apache.hadoop.mapred.InputFormat and org.apache.hadoop.mapreduce.InputFormat. This causes both conditions to be true:
classOf[oldInputClass[_, _]].isAssignableFrom(inputFormatClazz) is true
classOf[newInputClass[_, _]].isAssignableFrom(inputFormatClazz) is true
In view of this situation, it is expected to be compatible with the old version first.

@HyukjinKwon (Member) commented

Is this a duplicate of #29178 and #31147? I have the same question (#29178 (comment)), and I agree with @dongjoon-hyun's point.

@yangBottle commented

Hi, if it's assignable to both, why does Apache Spark need to use the old one? Instead, it sounds like HiveHBaseTableInputFormat is missing a correct implementation for mapreduce.InputFormat. This doesn't look like a Spark issue to me. Is there a Hive JIRA issue for that?

HiveHBaseTableInputFormat relies on two versions of InputFormat: org.apache.hadoop.mapred.InputFormat and org.apache.hadoop.mapreduce.InputFormat. This causes both conditions to be true:
classOf[oldInputClass[_, _]].isAssignableFrom(inputFormatClazz) is true
classOf[newInputClass[_, _]].isAssignableFrom(inputFormatClazz) is true
In view of this situation, it is expected to be compatible with the old version first.

I think that, in order to be compatible with implementation classes like 'org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat' that use both the old API and the new API, creating the old one should take priority.
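
As a small illustration of why both conditions hold at the same time for such a class: the toy DualApiInputFormat below is only a stand-in, not the real Hive class, but like HiveHBaseTableInputFormat it is declared against both the old and the new API, so both assignability checks succeed.

    import org.apache.hadoop.mapred
    import org.apache.hadoop.mapreduce

    // Toy stand-in (not the real Hive class): declared against both Hadoop APIs at once.
    abstract class DualApiInputFormat[K, V]
      extends mapreduce.InputFormat[K, V]
      with mapred.InputFormat[K, V]

    object DualApiCheck extends App {
      val clazz = classOf[DualApiInputFormat[_, _]]
      // Both lines print "true", which is exactly the ambiguity discussed in this thread.
      println(classOf[mapred.InputFormat[_, _]].isAssignableFrom(clazz))
      println(classOf[mapreduce.InputFormat[_, _]].isAssignableFrom(clazz))
    }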

@HyukjinKwon (Member) commented

@yangBottle do you have an official answer from the Apache Hadoop community or the HBase community that we should look up the old ones first?

@HyukjinKwon (Member) commented

And, you're basically saying that HiveHBaseTableInputFormat's mapreduce implementation is unusable, and that it is an issue in that code.

@yangBottle commented

@yangBottle do you have an official answer from the Apache Hadoop community or the HBase community that we should look up the old ones first?

And, you're basically saying that HiveHBaseTableInputFormat's mapreduce implementation is unusable, and that it is an issue in that code.

No, this is just my personal opinion. But when I debugged the source code, I found that some of HBase's initialization operations are done in the old API's interface implementation, so creating a NewHadoopRDD gets an empty HBase table instance. That is why I think it should look up the old one first.
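
A purely hypothetical sketch of the failure mode described above (toy names only, nothing taken from the real Hive or HBase code): setup happens only on the old-API entry point, so a reader that goes through the new-API entry point sees an unconfigured, effectively empty table.

    // Hypothetical toy, not HiveHBaseTableInputFormat: only the shape of the problem.
    class ToyDualApiFormat {
      private var table: Option[String] = None   // stands in for the HBase table handle

      // Old-API style entry point: the table handle is set up here.
      def getSplitsOldApi(tableName: String): Seq[String] = {
        table = Some(tableName)
        Seq(s"$tableName-split-0")
      }

      // New-API style entry point: assumes setup already happened, so it returns
      // nothing when the old-API path was never exercised.
      def getSplitsNewApi(): Seq[String] =
        table.map(t => Seq(s"$t-split-0")).getOrElse(Seq.empty)
    }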

@zzccctv (Author) commented Jan 25, 2021

@HyukjinKwon I still think that priority should be given to finding the old ones. This problem involves changes across Hadoop, HBase, and Hive, so the upgrade cost for a user's cluster environment is higher; by contrast, Spark is more lightweight.

@xza-m commented Sep 5, 2022

How should I apply this change?

@zzccctv (Author) commented Sep 15, 2022

How should I apply this change?

Change the code to match this PR and recompile it.
