[SPARK-26929][SQL]fix table owner use user instead of principal when create table through spark-sql or beeline #23952
…create table through spark-sql or beeline
|
link to #23837 |
|
@HyukjinKwon @felixcheung @vanzin , could you help to review this? |
|
I think it’s best you find someone on the SQL side to review this.
|
|
cc @dongjoon-hyun FYI. |
|
@dongjoon-hyun could you help to review this? |
|
@rxin , @cloud-fan could you help to review? |
|
ok to test |
- private val userName = conf.getUser
+ private val userName: String = try {
+   val ugi = HiveUtils.getUGI
+   ugi.getShortUserName
How do we prove this is the right fix? What if there is no UGI?
@cloud-fan, it returns the system user's information if there is no UGI. The source code of conf.getUser is:
public String getUser() throws IOException {
  try {
    UserGroupInformation ugi = Utils.getUGI();
    return ugi.getUserName();
  } catch (LoginException var2) {
    throw new IOException(var2);
  }
}
This change just uses ugi.getShortUserName instead of ugi.getUserName().
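To make the difference between the two getters concrete: on a kerberized cluster the full user name is a principal such as user@REALM, while the short name is only the local user part. The sketch below is a simplified illustration, not Hadoop code. ShortNameSketch and shortName are hypothetical names, and the example principals are made up. Real Hadoop derives the short name through configurable auth_to_local mapping rules, which this sketch does not model; it only strips the host and realm components.

```java
public class ShortNameSketch {
    // Hypothetical helper (not a Hadoop API): keeps only the first component
    // of a Kerberos-style principal "primary/instance@REALM".
    public static String shortName(String principal) {
        int end = principal.length();
        int at = principal.indexOf('@');
        if (at >= 0) {
            end = at; // drop "@REALM"
        }
        int slash = principal.indexOf('/');
        if (slash >= 0 && slash < end) {
            end = slash; // drop "/instance"
        }
        return principal.substring(0, end);
    }

    public static void main(String[] args) {
        // Made-up principals for illustration only.
        System.out.println(shortName("alice@EXAMPLE.COM"));                  // prints "alice"
        System.out.println(shortName("hive/host1.example.com@EXAMPLE.COM")); // prints "hive"
        System.out.println(shortName("bob"));                                // prints "bob"
    }
}
```

With a mapping like this, a table created by alice@EXAMPLE.COM would show alice as its owner, which is the behavior the change is after.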
Looks like Utils is accessible from here, so if we just want to leverage the Hive code and call getShortUserName instead, we can do it with a one-liner:
private val userName = org.apache.hadoop.hive.shims.Utils.getUGI.getShortUserName
I intentionally don't catch LoginException here, since we would do nothing but rethrow it.
org.apache.hadoop.hive.shims.Utils.getUGI.getShortUserName
Yep, but shims.Utils is not compatible with Hive 0.* and does not pass all tests. That file was only added starting with Hive 1.*, so I copied the source code here.
Ah OK. I didn't realize it isn't compatible with some versions. Makes sense to me.
f1eb8cf was for this, sorry I had to track down commits as well.
|
Test build #103790 has finished for PR 23952 at commit
|
|
@cloud-fan, how should I fix the 'fails Scala style tests' error? Should I move the code into a function? Could you give me some suggestions? |
|
cc @rxin , @cloud-fan, @dongjoon-hyun |
Modify exception
|
Test build #108151 has finished for PR 23952 at commit
|
|
Fixing only the table owner is not enough. We should create the table with the right user, not just change the table owner. |
Only the owner is displayed incorrectly; everything else is correct. |
|
Test build #108153 has finished for PR 23952 at commit
|
|
Test build #108154 has finished for PR 23952 at commit
|
|
Test build #108158 has finished for PR 23952 at commit
|
conflict of hiveUtils
|
Test build #108161 has finished for PR 23952 at commit
|
|
Test build #108189 has finished for PR 23952 at commit
|
|
cc @rxin , @cloud-fan, @dongjoon-hyun |
|
@rxin , @cloud-fan, @dongjoon-hyun could you help to review? |
|
FYI, #17311 explained why this should be applied.
Let me try to ping again cc. @cloud-fan @gatorsmile @dongjoon-hyun |
  }
  ugi.getShortUserName
} catch {
  case _: LoginException => throw new LoginException("Can not get login user.")
What's the purpose of this? Just let the original exception propagate.
According to the previous comment, I guess the original code wrapped LoginException in IOException:
https://github.com/apache/spark/pull/23952/files#r268466632
So if we want to follow the original code, use IOException; otherwise, just remove the try/catch.
@vanzin @HeartSaVioR
The Hive source code throws two exceptions: LoginException and IOException. I wonder if catching Exception is OK here.
I guess things will fail at the caller site, where the exception gets logged, so the recommendation is to remove the try/catch altogether.
Hive source code has to do something (like wrapping as you've seen) because both LoginException and IOException are checked exceptions - given you're coding in Scala you can forget about that.
Yes, you are right, it will fail and the exception will be logged. Changed it.
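As a side note on the checked-exception point in this thread, here is a minimal Java sketch of why Hive-style code has to wrap the exception. CheckedWrapSketch, lookupUser and getUser are hypothetical stand-ins, not Hive or Hadoop APIs. Because the outer method only declares IOException, the Java compiler forces the checked LoginException to be caught and wrapped; Scala has no such requirement, so the equivalent Scala code can let the original exception propagate.

```java
import java.io.IOException;
import javax.security.auth.login.LoginException;

public class CheckedWrapSketch {
    // Hypothetical stand-in for a login lookup that may fail.
    public static String lookupUser(boolean fail) throws LoginException {
        if (fail) {
            throw new LoginException("no login context");
        }
        return "alice";
    }

    // This method only declares IOException, so Java requires the checked
    // LoginException to be handled here; wrapping keeps the original cause.
    public static String getUser(boolean fail) throws IOException {
        try {
            return lookupUser(fail);
        } catch (LoginException e) {
            throw new IOException(e);
        }
    }

    public static void main(String[] args) {
        try {
            System.out.println(getUser(false));
            getUser(true); // throws IOException wrapping the LoginException
        } catch (IOException e) {
            System.out.println("cause: " + e.getCause().getClass().getSimpleName());
        }
    }
}
```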
|
Test build #110401 has finished for PR 23952 at commit
|
  } catch {
    case e: Exception =>
-     logWarning("Can not get login user.")
+     logError("Can not get login user.")
Same comment here: I'm not sure we really want to log an error message here. I expect the thrown exception will make the query fail somewhere on the caller side, where the exception will be logged correctly, and its stack trace will tell us what happened.
|
Test build #110412 has finished for PR 23952 at commit
|
|
retest this, please |
|
Test build #110416 has finished for PR 23952 at commit
|
LGTM
|
@HeartSaVioR |
|
The Hive tests seem to be a bit flaky. The Spark community tries to fix as many flaky tests as possible, but there are many cases where the community can't fix the flakiness. Rerunning the build is fine; what matters is whether a test fails consistently. |
Thanks. |
|
Test build #110451 has finished for PR 23952 at commit
|
- private val userName = conf.getUser
+ private val userName = {
+   val doAs = sys.env.get("HADOOP_USER_NAME").orNull
+   val ugi = if (doAs != null && doAs.length() > 0) {
This is wrong or, at the very least, backwards: the current UGI should be preferred.
What's the goal of this code? Why can't you just get the current UGI's name?
Uh, at least that is not a regression, as conf.getUser calls into Hive, and Hive does this in Utils.getUGI, at least as of Hive 1.2.1.spark2.
So the code was copied and pasted in order to call ugi.getShortUserName() instead of ugi.getUserName(). @hddong left a comment earlier explaining why the code had to be copied and pasted: #23952 (comment)
Right, and there's a huge comment in the Hive method (at least in the branch I'm looking at) that explains why that is done. And if you read that comment you'll see that it does not apply here.
In a way, the HMS API is broken in that it lets the caller set the owner (instead of getting it from the auth info).
But we really should at least try to get the correct information through, and that comes from the UGI, not from env variables.
Yeah, actually I only read the decompiled code of Utils.getUGI() (and missed the comment in the source), which didn't show the intention: letting other applications pass the user in. Yes, I agree it's not needed on the Spark side, and it would be weird for HADOOP_USER_NAME to be used only here and be undocumented. Thanks for explaining!
Thanks to you both; I missed the comment in the source too. Yes, the current UGI is fine here. SPARK-22846 (Fix table owner is null when creating table through spark sql or thriftserver) fixed the null owner but led to the issue seen here: the owner is the principal in a kerberized cluster.
|
Test build #110496 has finished for PR 23952 at commit
|
LGTM
|
Merging to master. |
|
Can one of the admins verify this patch? |
|
Thanks again @hddong for making this patch - pretty useful. |
…create table through spark-sql or beeline
What changes were proposed in this pull request?
Fix the table owner to use the user instead of the principal when creating a table through spark-sql.
private val userName = conf.getUser returns the UGI's userName, which is the principal. I copied the source code into HiveClientImpl and used ugi.getShortUserName() instead of ugi.getUserName(), so the owner is now displayed correctly.
How was this patch tested?
Please review http://spark.apache.org/contributing.html before opening a pull request.