Skip to content

Conversation

@hddong
Copy link
Contributor

@hddong hddong commented Mar 4, 2019

…create table through spark-sql or beeline

What changes were proposed in this pull request?

fix table owner use user instead of principal when create table through spark-sql
private val userName = conf.getUser will get ugi's userName which is principal info, and i copy the source code into HiveClientImpl, and use ugi.getShortUserName() instead of ugi.getUserName(). The owner display correctly.

How was this patch tested?

  1. create a table in kerberos cluster
  2. use "desc formatted tbName" check owner

Please review http://spark.apache.org/contributing.html before opening a pull request.

@hddong
Copy link
Contributor Author

hddong commented Mar 4, 2019

link to #23837
@HyukjinKwon , i created a new PR, please review here, thanks.

@hddong
Copy link
Contributor Author

hddong commented Mar 8, 2019

@HyukjinKwon @felixcheung @vanzin , could you help to review this?

@felixcheung
Copy link
Member

felixcheung commented Mar 10, 2019 via email

@HyukjinKwon
Copy link
Member

cc @dongjoon-hyun FYI.

@hddong
Copy link
Contributor Author

hddong commented Mar 17, 2019

@dongjoon-hyun could you help to review this?

@hddong
Copy link
Contributor Author

hddong commented Mar 20, 2019

@rxin , @cloud-fan could you help to review?

@cloud-fan
Copy link
Contributor

ok to test

private val userName = conf.getUser
private val userName: String = try {
val ugi = HiveUtils.getUGI
ugi.getShortUserName
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do we prove this is the right fix? What if there is no UGI?

Copy link
Contributor Author

@hddong hddong Mar 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cloud-fan , It will return system user information if there is no ugi. Actually, the source code of conf.getUser is:

public String getUser() throws IOException {
        try {
             UserGroupInformation ugi = Utils.getUGI(); 
             return ugi.getUserName();
         } catch (LoginException var2) { 
             throw new IOException(var2); 
         } 
     }

this change just use ugi.getShortUserName instead of ugi.getUserName()

Copy link
Contributor

@HeartSaVioR HeartSaVioR Aug 23, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like Utils is accessible from here, so if we just want to leverage Hive code and call getShortUserName instead, we can just do it with one-liner.

private val userName = org.apache.hadoop.hive.shims.Utils.getUGI.getShortUserName

Here I intended to not catch LoginException as we don't do anything but throw it.

Copy link
Contributor Author

@hddong hddong Aug 26, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

org.apache.hadoop.hive.shims.Utils.getUGI.getShortUserName

Yep, but shims.Utils is not compatible with hive0.*, and cannot pass all tests. This file is added after hive1.*, so, copy the source code here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah OK. Didn't recognize it doesn't match with some versions. Makes sense to me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

f1eb8cf was for this, sorry I had to track down commits as well.

@SparkQA
Copy link

SparkQA commented Mar 21, 2019

Test build #103790 has finished for PR 23952 at commit 2609c9c.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@hddong
Copy link
Contributor Author

hddong commented Mar 26, 2019

@cloud-fan, how to fix error 'fails Scala style tests ', should i move code to function? Could you give some suggest.

@hddong
Copy link
Contributor Author

hddong commented Apr 17, 2019

cc @rxin , @cloud-fan, @dongjoon-hyun

@SparkQA
Copy link

SparkQA commented Jul 25, 2019

Test build #108151 has finished for PR 23952 at commit 05e54fe.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AngersZhuuuu
Copy link
Contributor

Only fix table owner is not enough. we should create table will the right user not just change table owner

@hddong
Copy link
Contributor Author

hddong commented Jul 25, 2019

Only fix table owner is not enough. we should create table will the right user not just change table owner

Just owner display error information, other is correct.

@SparkQA
Copy link

SparkQA commented Jul 25, 2019

Test build #108153 has finished for PR 23952 at commit bdcc9fa.

  • This patch fails Scala style tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 25, 2019

Test build #108154 has finished for PR 23952 at commit c4e3468.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 25, 2019

Test build #108158 has finished for PR 23952 at commit e8c0571.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

conflict of hiveUtils
@SparkQA
Copy link

SparkQA commented Jul 25, 2019

Test build #108161 has finished for PR 23952 at commit c27d4af.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 26, 2019

Test build #108189 has finished for PR 23952 at commit f1eb8cf.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@hddong
Copy link
Contributor Author

hddong commented Jul 26, 2019

cc @rxin , @cloud-fan, @dongjoon-hyun

@hddong
Copy link
Contributor Author

hddong commented Aug 1, 2019

@rxin , @cloud-fan, @dongjoon-hyun could you help to review?

@HeartSaVioR
Copy link
Contributor

I'm addressing test part as #25696 - I didn't raise a PR against this PR as the change was more than what I expected. This patch can reflect the change (change to shortUserName) once #25696 is merged.

@HeartSaVioR
Copy link
Contributor

HeartSaVioR commented Sep 6, 2019

FYI, #17311 explained why this should be applied.

In the kerberized hadoop cluster, when Spark creates tables, the owner of tables are filled with PRINCIPAL strings instead of USER names. This is inconsistent with Hive and causes problems when using ROLE in Hive. We had better to fix this.

Let me try to ping again cc. @cloud-fan @gatorsmile @dongjoon-hyun

}
ugi.getShortUserName
} catch {
case _: LoginException => throw new LoginException("Can not get login user.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the purpose of this? Just let the original exception propagate.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the previous comment, I guess the origin code was IOException wrapped LoginException:
https://github.com/apache/spark/pull/23952/files#r268466632

so if we want to follow the origin code, use IOException, otherwise, just remove try/catch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vanzin @HeartSaVioR
In hive source code, it throw two exception: LoginException, IOException. I wonder if Exception is ok here.

Copy link
Contributor

@HeartSaVioR HeartSaVioR Sep 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess things will fail with logging exception in caller site, so I guess the recommendation is just removing try/catch altogether.

Hive source code has to do something (like wrapping as you've seen) because both LoginException and IOException are checked exceptions - given you're coding in Scala you can forget about that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess things will fail with logging exception in caller site, so I guess the recommendation is just removing try/catch altogether.

Hive source code has to do something (like wrapping as you've seen) because both LoginException and IOException are checked exceptions - given you're coding in Scala you can forget about that.

Yes, you are right, it will fail with logging exception. Changed it.

@SparkQA
Copy link

SparkQA commented Sep 10, 2019

Test build #110401 has finished for PR 23952 at commit 6011598.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

} catch {
case e: Exception =>
logWarning("Can not get login user.")
logError("Can not get login user.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment here: I'm not sure we really want to log error message here, as I expect thrown exception will make query fail somewhere in caller side, and it will log the exception correctly - stack trace in exception will tell us.

@SparkQA
Copy link

SparkQA commented Sep 10, 2019

Test build #110412 has finished for PR 23952 at commit 76192b2.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HeartSaVioR
Copy link
Contributor

retest this, please

@SparkQA
Copy link

SparkQA commented Sep 10, 2019

Test build #110416 has finished for PR 23952 at commit 76192b2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

@HeartSaVioR HeartSaVioR left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@hddong
Copy link
Contributor Author

hddong commented Sep 11, 2019

@HeartSaVioR
An unrelated question, why most tests failed, and passed all after your reset.

@HeartSaVioR
Copy link
Contributor

Hive test seems to be a bit flaky. So the Spark community tries to fix the flay tests as many as possible, but there're many cases Spark community can't fix the flakiness. Rerunning build doesn't matter - if the test is failing consistently that's the matter.

@hddong
Copy link
Contributor Author

hddong commented Sep 11, 2019

Hive test seems to be a bit flaky. So the Spark community tries to fix the flay tests as many as possible, but there're many cases Spark community can't fix the flakiness. Rerunning build doesn't matter - if the test is failing consistently that's the matter.

Thanks.

@SparkQA
Copy link

SparkQA commented Sep 11, 2019

Test build #110451 has finished for PR 23952 at commit a8d6b1d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

private val userName = conf.getUser
private val userName = {
val doAs = sys.env.get("HADOOP_USER_NAME").orNull
val ugi = if (doAs != null && doAs.length() > 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is wrong or, at the very least, backwards: the current UGI should be preferred.

What's the goal of this code? Why can't you just get the current UGI's name?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uh, at least that is not a regression, as conf.getUser calls Hive side and Hive does it in Utils.getUGI, at least from Hive 1.2.1.spark2.
So the code is copied and pasted to modify calling ugi.getShortUserName() instead of ugi.getUserName() - @hddong left a comment before to explain why the code had to be copied and pasted - #23952 (comment)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, and there's a huge comment in the Hive method (at least in the branch I'm looking at) that explains why that is done. And if you read that comment you'll see that it does not apply here.

In a way, the HMS API is broken in that it lets the caller set the owner (instead of getting it from the auth info).

But we really should at least try to get the correct information through, and that comes from the UGI, not from env variables.

Copy link
Contributor

@HeartSaVioR HeartSaVioR Sep 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah actually I just read decompiled code of Utils.getUGI() (missed to read comment in source) and didn't indicate the intention - let other application be able to pass it. Yes I agree it's not needed in Spark side, and it would be weird HADOOP_USER_NAME is only used here and undocumented. Thanks for explaining!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for you two, I missed to read comment in source too. Yes, current UGI is ok here, SPARK-22846 Fix table owner is null when creating table through spark sql or thriftserver, but lead to an issue that have occurred(owner is principal in kerberized cluster).

@SparkQA
Copy link

SparkQA commented Sep 12, 2019

Test build #110496 has finished for PR 23952 at commit 2f37513.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

@HeartSaVioR HeartSaVioR left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@vanzin
Copy link
Contributor

vanzin commented Sep 16, 2019

Merging to master.

@vanzin vanzin closed this in 5881871 Sep 16, 2019
@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@HeartSaVioR
Copy link
Contributor

Thanks again @hddong for making this patch - pretty useful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants