
Conversation

@gaborgsomogyi
Contributor

This PR is an enhanced version of #25805, so I've kept the original text. The problem with the original PR can be found in the comments below.

This situation can happen when an external system (e.g. Oozie) generates
delegation tokens for a Spark application. The Spark driver will then run
against secured services and have proper credentials (the tokens), but no
kerberos credentials, so trying to do anything that requires a kerberos
credential fails.

Instead, if no kerberos credentials are detected, just skip the whole
delegation token code.
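The skip described above can be sketched roughly as follows. This is a hedged, self-contained illustration with hypothetical names (`maybeObtainTokens`, `obtain`), not the actual Spark API; the real change checks whether the current user has Kerberos credentials before running any delegation token code.

```scala
// Hypothetical sketch of the guard: only run the delegation token code
// when the current user actually has Kerberos credentials.
object TokenGuardSketch {
  def maybeObtainTokens(hasKerberosCredentials: Boolean)(obtain: () => String): Option[String] =
    if (hasKerberosCredentials) Some(obtain()) // normal kerberized path
    else None                                  // skip the whole DT code
}
```

In the Oozie case the tokens already exist, `hasKerberosCredentials` is false, and `obtain()` is never invoked, so no Kerberos operation can fail.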

Tested with an application that simulates Oozie; it fails before the fix and
passes with the fix. Also ran other DT-related tests to make sure the rest of
the functionality keeps working.

@gaborgsomogyi
Contributor Author

The problem with the original code was the following:

UserGroupInformation.setConfiguration expects the default kerberos realm to be set, which is done in krb5.conf. This file can be passed to the JVM in two ways:

  • Create /etc/krb5.conf, which is loaded by default
  • Set java.security.krb5.conf on the JVM to point to the krb5.conf file

If neither of them applies on the machine where this test executes, it will fail consistently.
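The second option amounts to a one-liner. This is a minimal sketch, and the path used is an assumption for illustration; the property must be set before the JVM reads the Kerberos configuration for the first time.

```scala
// Sketch: pointing the JVM at a krb5.conf file programmatically, which is
// equivalent to passing -Djava.security.krb5.conf=<path> on the command line.
object Krb5ConfProperty {
  def pointJvmAt(confPath: String): Unit =
    System.setProperty("java.security.krb5.conf", confPath)
}
```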

The reason why the later tests failed:

UserGroupInformation.setConfiguration set kerberos authentication but threw the following exception halfway through:

Can't get Kerberos realm
java.lang.IllegalArgumentException: Can't get Kerberos realm

Because UserGroupInformation.setConfiguration was outside of the try block, UserGroupInformation.reset() was not executed in the finally block, so kerberos authentication remained in effect for the upcoming tests and made several of them fail.
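The failure mode can be reduced to a small sketch. All names here are stand-ins (a mutable flag instead of UGI's real global state); it only illustrates why the throwing setup call must sit inside the try block so the finally-based cleanup always runs.

```scala
// Sketch of the bug: global state leaks when a throwing setup call is
// placed outside the try block that owns the cleanup.
object TryFinallySketch {
  var authMode: String = "simple" // stands in for UGI's global auth state

  def setConfiguration(mode: String): Unit = {
    authMode = mode // state is mutated before the failure...
    throw new IllegalArgumentException("Can't get Kerberos realm")
  }

  def reset(): Unit = { authMode = "simple" }

  // Buggy shape: setup outside try, so reset() is skipped on failure.
  def buggy(): Unit = {
    setConfiguration("kerberos")
    try { /* test body */ } finally { reset() }
  }

  // Fixed shape: setup inside try, so reset() always runs.
  def fixed(): Unit = {
    try {
      setConfiguration("kerberos")
      /* test body */
    } finally { reset() }
  }
}
```

With `buggy()` the "kerberos" state survives the exception and poisons later tests; with `fixed()` the finally block restores it even though the same exception is thrown.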

@gaborgsomogyi
Contributor Author

There are basically three ways to set krb5.conf:

  • Start a MiniKdc instance, which creates a krb5.conf file, and point java.security.krb5.conf at it => I've chosen this one because MiniKdc follows the krb5.conf file format
  • Create a krb5.conf file manually from code and point java.security.krb5.conf at it => I think it's error prone to create the file content manually. I'm pretty sure the krb5.conf file format is stable enough, but in case it changes...
  • Create /etc/krb5.conf => This is a bad idea since test execution should be independent of the environment

// krb5.conf. MiniKdc sets "java.security.krb5.conf" in start() and removes it when stop() is called.
val kdcDir = Utils.createTempDir()
val kdcConf = MiniKdc.createConf()
kdc = new MiniKdc(kdcConf, kdcDir)
Contributor Author


Change1: I've added MiniKdc here to set krb5.conf.

val krbConf = new Configuration()
krbConf.set(HADOOP_SECURITY_AUTHENTICATION, "kerberos")

UserGroupInformation.setConfiguration(krbConf)
Contributor Author


Change2: I've pulled UserGroupInformation.setConfiguration inside the try block.

Array.empty)
proxyUser.doAs(testImpl)
} finally {
if (kdc != null) {
Contributor Author


Change3: Stop MiniKdc.

@gaborgsomogyi
Contributor Author

Once the tests pass I'm going to trigger it with [test-hadoop3.2][test-java11].

@gaborgsomogyi
Contributor Author

cc @dongjoon-hyun @srowen @squito

@SparkQA

SparkQA commented Sep 23, 2019

Test build #111205 has finished for PR 25901 at commit 330b3b5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gaborgsomogyi gaborgsomogyi changed the title [SPARK-29082][core] Skip delegation token generation if no credentials are available. [SPARK-29082][core][test-hadoop3.2][test-java11] Skip delegation token generation if no credentials are available. Sep 23, 2019
@gaborgsomogyi
Contributor Author

retest this please

@gaborgsomogyi
Contributor Author

The previously problematic execution now passed:

...
[info] - SPARK-29082: do not fail if current user does not have credentials (1 second, 116 milliseconds)
...

@SparkQA

SparkQA commented Sep 23, 2019

Test build #111220 has finished for PR 25901 at commit 330b3b5.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@squito
Contributor

squito commented Sep 23, 2019

LGTM, confirmed this version passes for me locally (the old PR did not).

@SparkQA

SparkQA commented Sep 23, 2019

Test build #111229 has finished for PR 25901 at commit 2566ddf.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member

dongjoon-hyun commented Sep 23, 2019

The failure looks irrelevant to this PR.

[info] - handle large number of containers and tasks (SPARK-18750) *** FAILED *** (182 milliseconds)
[info]   java.lang.StackOverflowError did not equal null (LocalityPlacementStrategySuite.scala:48)

@dongjoon-hyun
Member

Retest this please

@SparkQA

SparkQA commented Sep 23, 2019

Test build #111236 has finished for PR 25901 at commit 2566ddf.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 23, 2019

Test build #111237 has finished for PR 25901 at commit 2566ddf.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gaborgsomogyi
Contributor Author

Flakiness hell.

@gaborgsomogyi
Contributor Author

retest this please

@SparkQA

SparkQA commented Sep 23, 2019

Test build #111242 has finished for PR 25901 at commit 2566ddf.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member

cc @vanzin since he is the original author.

@HeartSaVioR
Contributor

Let's try to track the test failures; otherwise we will be stuck with the combined probability of all the flaky tests.

@HeartSaVioR
Contributor

HeartSaVioR commented Sep 24, 2019

handle large number of containers and tasks (SPARK-18750) *** FAILED *** (182 milliseconds)

https://issues.apache.org/jira/browse/SPARK-29220

SQLQueryTestSuite.sql (subquery/scalar-subquery/scalar-subquery-select.sql)

https://issues.apache.org/jira/browse/SPARK-29221

pyspark.mllib.tests.test_streaming_algorithms.StreamingLinearRegressionWithTests.test_parameter_convergence

https://issues.apache.org/jira/browse/SPARK-29222

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-29082][core][test-hadoop3.2][test-java11] Skip delegation token generation if no credentials are available. [SPARK-29082][CORE] Skip delegation token generation if no credentials are available Sep 24, 2019

@dongjoon-hyun dongjoon-hyun left a comment


+1, LGTM. Thank you, @gaborgsomogyi, @srowen, @squito, @HeartSaVioR.
Merged to master.

@vanzin
Contributor

vanzin commented Sep 25, 2019

Hey guys, I was out for the last few days, thanks for taking care of it.

(The original PR seemed to pass tests here, but I did notice a failure in one of our internal branches, and asked people to pay attention to it...)

@gaborgsomogyi
Contributor Author

Thanks guys for the help and taking care of the fix!

@koertkuipers
Contributor

koertkuipers commented Sep 27, 2019

I ran into this PR when building master for hadoop 2.7:

$ dev/make-distribution.sh --name blah --tgz -Phadoop-2.7 -Dhadoop.version=2.7.0 -Pyarn -Phadoop-provided

[INFO] --- scala-maven-plugin:4.2.0:testCompile (scala-test-compile-first) @ spark-core_2.12 ---
[INFO] Using incremental compilation using Mixed compile order                                           
[INFO] Compiling 262 Scala sources and 27 Java sources to /home/koert/src/spark/core/target/scala-2.12/test-classes ...
[ERROR] [Error] /home/koert/src/spark/core/src/test/scala/org/apache/spark/deploy/security/HadoopDelegationTokenManagerSuite.scala:119: method reset in class UserGroupInformation cannot be accessed in object org.apache.hadoop.security.UserGroupInformation                     
[ERROR] one error found               

@koertkuipers
Contributor

koertkuipers commented Sep 27, 2019

Oh sorry, never mind. I think it's because I was using hadoop 2.7.0 instead of 2.7.4. Disregard... I just needed to update an old automated script of ours.
