[SPARK-34210][SQL] After upgrading 3.0.1, Spark SQL access hive on HBase table access exception #31302
Conversation
Can one of the admins verify this patch?
      createNewHadoopRDD(localTableDesc, inputPathStr)
    } else {
      if (classOf[oldInputClass[_, _]].isAssignableFrom(inputFormatClazz)) {
        createOldHadoopRDD(localTableDesc, inputPathStr)
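For context, here is a minimal sketch of the dispatch this diff touches, with the old mapred API checked before the new mapreduce one, as the thread proposes. The names (createOldHadoopRDD, createNewHadoopRDD, inputFormatClazz, localTableDesc, inputPathStr) come from the diff above; the surrounding code in HadoopTableReader is reconstructed and may differ from the actual source.

    // Sketch only: oldInputClass is org.apache.hadoop.mapred.InputFormat,
    // newInputClass is org.apache.hadoop.mapreduce.InputFormat.
    // Formats such as HiveHBaseTableInputFormat implement both APIs, so
    // the order of these checks decides which RDD type Spark builds.
    if (classOf[oldInputClass[_, _]].isAssignableFrom(inputFormatClazz)) {
      // Old mapred API takes priority: HiveHBaseTableInputFormat does its
      // HBase setup in the old-API code path (see the discussion below).
      createOldHadoopRDD(localTableDesc, inputPathStr)
    } else {
      createNewHadoopRDD(localTableDesc, inputPathStr)
    }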
Can you add a test to the suite for this change and explain the error?
OK, let me see how to write it
dongjoon-hyun left a comment
Hi, if it's assignable to both, why does Apache Spark need to use the old one? Instead, it sounds like HiveHBaseTableInputFormat is missing the correct implementation of mapreduce.InputFormat. This doesn't look like a Spark issue to me. Is there a Hive JIRA issue for that?
HiveHBaseTableInputFormat relies on two versions of InputFormat: one is org.apache.hadoop.mapred.InputFormat, the other is org.apache.hadoop.mapreduce.InputFormat. This causes both conditions to be true, as the sketch below shows:
classOf[oldInputClass[_, _]].isAssignableFrom(inputFormatClazz) is true
classOf[newInputClass[_, _]].isAssignableFrom(inputFormatClazz) is true
Given this situation, the expectation is to stay compatible with the old version first.
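A minimal, self-contained sketch of why both checks succeed; DualApiInputFormat is a hypothetical stand-in for HiveHBaseTableInputFormat, which likewise extends the new mapreduce.InputFormat abstract class while implementing the old mapred.InputFormat interface:

    import org.apache.hadoop.mapred.{InputFormat => OldInputFormat}
    import org.apache.hadoop.mapreduce.{InputFormat => NewInputFormat}

    // Hypothetical stand-in: extends the new abstract class and mixes in
    // the old interface, just like HiveHBaseTableInputFormat does.
    abstract class DualApiInputFormat[K, V]
      extends NewInputFormat[K, V] with OldInputFormat[K, V]

    val inputFormatClazz: Class[_] = classOf[DualApiInputFormat[_, _]]

    // Both conditions hold, so branch order alone decides whether Spark
    // creates an old HadoopRDD or a NewHadoopRDD for the table scan.
    assert(classOf[OldInputFormat[_, _]].isAssignableFrom(inputFormatClazz))
    assert(classOf[NewInputFormat[_, _]].isAssignableFrom(inputFormatClazz))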
Is this a duplicate of #29178 and #31147? I have the same question (#29178 (comment)), and I agree with @dongjoon-hyun.
I think, in order to be compatible with implementation classes like org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat that use both the old API and the new API, the check for creating the old one should take priority.
@yangBottle do you have an official answer from the Apache Hadoop community or the HBase community that we should look up the old ones first?
And, you're basically saying that it should look up the old ones first?
No, this is just a personal opinion. When I debugged the source code, I found that some of HBase's initialization is done in the interface implementation of the old API, so creating a NewHadoopRDD will get an empty HBase table instance. That is why I think it should look up the old ones first.
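To illustrate the failure mode being described (a hypothetical sketch, not the actual HiveHBaseTableInputFormat code): if table setup happens only inside the old-API entry point, a caller that goes through the new-API entry point sees uninitialized state.

    import org.apache.hadoop.mapred.JobConf

    // Hypothetical format whose HBase-style setup lives only in the
    // old-API method, mirroring the behavior described above.
    class OldApiInitFormat {
      // Stands in for the lazily initialized HBase table handle.
      private var table: Option[String] = None

      // Old mapred-style entry point: initializes the table handle first.
      def getSplitsOldApi(conf: JobConf): Unit = {
        table = Some(conf.get("example.table.name")) // illustrative key only
        // ... compute splits using `table` ...
      }

      // New mapreduce-style entry point: the setup never runs here, so the
      // table handle is still empty when Spark takes this path.
      def getSplitsNewApi(): Unit = {
        require(table.nonEmpty, "table instance is empty")
        // ... compute splits using `table` ...
      }
    }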
@HyukjinKwon I still think that priority should be given to finding the old ones. This problem involves changes across Hadoop, HBase, and Hive, so the upgrade cost for a user's cluster environment is higher. By contrast, changing Spark is more lightweight.
How should I apply this change?
Change it to the code in this PR and recompile.