[SPARK-23209][core] Allow credential manager to work when Hive not available. #20399
…ailable. The JVM seems to be doing early binding of classes that the Hive provider depends on, causing an error to be thrown before it was caught by the code in the class. The fix wraps the creation of the provider in a try..catch so that the provider can be ignored when dependencies are missing. Added a unit test (which fails without the fix), and also tested that getting tokens still works in a real cluster.
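The approach described above can be sketched roughly as follows. This is a minimal illustration and not the code from the patch: `TokenProvider`, `BrokenProvider`, `SimpleProvider`, and `safeCreate` are hypothetical names, and the broken provider simulates a linkage failure by throwing `NoClassDefFoundError` from its constructor.

```scala
object SafeProviders {
  // Hypothetical stand-in for a delegation token provider interface.
  trait TokenProvider { def serviceName: String }

  // Simulates a provider whose dependencies are missing: creating it fails
  // with a linkage error instead of an ordinary exception.
  class BrokenProvider extends TokenProvider {
    throw new NoClassDefFoundError("org/example/MissingDependency")
    def serviceName: String = "broken"
  }

  class SimpleProvider extends TokenProvider {
    def serviceName: String = "simple"
  }

  // Runs each factory and keeps only the providers that could be created.
  def safeCreate(factories: Seq[() => TokenProvider]): Seq[TokenProvider] =
    factories.flatMap { create =>
      try {
        Some(create())
      } catch {
        // NoClassDefFoundError is an Error, not an Exception, so the
        // handler has to match Throwable to see it.
        case t: Throwable =>
          Console.err.println(s"Skipping provider that failed to load: $t")
          None
      }
    }
}
```

The point of the sketch is that the failure is handled at the place where the provider is created, so a provider with missing dependencies is skipped while the others remain usable.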
squito
left a comment
minor comments, otherwise lgtm
test("SPARK-23209: obtain tokens when Hive classes are not available") {
  // This test needs a custom class loader to hide Hive classes which are in the classpath.
  // Because the manager code loads the Hive provider directly instead of using reflection, we
Just for my understanding, is there any reason the manager should load the code directly, rather than using reflection to guard against this? I guess either way is fine, I had just seen us use reflection more to guard against this.
Non-reflection code is easier to follow and change if needed. It makes error handling a little more complicated when classes are missing (this PR being that example), but overall I prefer that to reflection.
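For comparison, the reflection-based alternative raised in the question could look roughly like the sketch below. It is illustrative only: `ReflectiveLoad` and `loadByName` are hypothetical names and do not come from the Spark code base.

```scala
object ReflectiveLoad {
  // Looks the class up by name at runtime, so a missing class surfaces here
  // as a catchable failure instead of a linkage error in the caller.
  def loadByName(className: String): Option[AnyRef] =
    try {
      val cls = Class.forName(className)
      // Assumes the class has a public no-argument constructor.
      Some(cls.getDeclaredConstructor().newInstance().asInstanceOf[AnyRef])
    } catch {
      case _: ClassNotFoundException | _: NoClassDefFoundError => None
    }
}
```

The trade-off discussed in this thread is visible here: the reflective version isolates the missing-class case cleanly, but the class name is only a string and the result has to be cast, which makes the code harder to follow and refactor.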
/** Test code for SPARK-23209 to avoid using too much reflection above. */
private object NoHiveTest extends Matchers {

  def main(args: Array[String]): Unit = {
Super minor: can you name this something other than "main"? It makes it seem like you're launching it as a separate process (maybe left over from an earlier attempt?).
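The idea of hiding classes behind a custom class loader, mentioned in the test comment above, can be sketched as follows. This is a simplified illustration under stated assumptions and not the loader used in the actual test: `HidingClassLoader` and its `hiddenPrefixes` parameter are hypothetical.

```scala
// Hypothetical helper: a class loader that reports classes under the given
// name prefixes as missing, even though the parent loader could find them.
class HidingClassLoader(parent: ClassLoader, hiddenPrefixes: Seq[String])
    extends ClassLoader(parent) {

  override def loadClass(name: String, resolve: Boolean): Class[_] = {
    if (hiddenPrefixes.exists(prefix => name.startsWith(prefix))) {
      // Behave as if the class were not on the classpath at all.
      throw new ClassNotFoundException(name)
    }
    super.loadClass(name, resolve)
  }
}
```

Note that a loader like this only affects lookups that go through it. Classes already defined by the parent loader resolve their own dependencies through the parent, so a real test presumably needs to run the code under test from inside the custom loader, which would explain why the test logic lives in a separate object.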
So seems

Some of that could be cleaned up, but more exceptions are being caught there in different method calls, so it still can help.

Originally we were using reflection for this

Because the code is cleaner that way.

Test build #86661 has finished for PR 20399 at commit

Test build #86666 has finished for PR 20399 at commit

LGTM.

  Some(createFn)
} catch {
  case t: Throwable =>
    logDebug(s"Failed to load built in provider.", t)
perhaps this should be a warn?
Probably info: it is actually OK for this to fail if the provider is not relevant (and in classpath).
I think debug is right, actually -- we have no idea at this point if the user wants these credential providers, and it could be totally fine if they're missing eg. if they never want to talk to hive.
(also don't really care that much and don't want to bike-shed on this ...)
I actually think this really should be debug. This only covers exceptions thrown by the constructor, which really shouldn't be doing anything (and this code is just needed because of the class linkage issue).
Individual methods like obtainDelegationTokens should be the ones reporting user-actionable issues, and even those kinda log most things at debug level today...

@vanzin and reviewers -- is this ready to go? We're waiting on RC3 for this. Thanks!

test this please

I was hoping that one of the other committers who +1'ed the patch would push it instead of me. (Ignoring the info vs. debug discussion.)

I am merging this now to master & 2.3

…ailable. The JVM seems to be doing early binding of classes that the Hive provider depends on, causing an error to be thrown before it was caught by the code in the class. The fix wraps the creation of the provider in a try..catch so that the provider can be ignored when dependencies are missing. Added a unit test (which fails without the fix), and also tested that getting tokens still works in a real cluster. Author: Marcelo Vanzin <[email protected]> Closes #20399 from vanzin/SPARK-23209. (cherry picked from commit b834446) Signed-off-by: Imran Rashid <[email protected]>

Test build #86780 has finished for PR 20399 at commit