@gaborgsomogyi
Contributor

What changes were proposed in this pull request?

KafkaDelegationTokenSuite fails on different platforms with the following problem:

19/09/11 11:07:42.690 pool-1-thread-1-SendThread(localhost:44965) DEBUG ZooKeeperSaslClient: creating sasl client: Client=zkclient/[email protected];service=zookeeper;serviceHostname=localhost.localdomain
...
NIOServerCxn.Factory:localhost/127.0.0.1:0: Zookeeper Server failed to create a SaslServer to interact with a client during session initiation:
javax.security.sasl.SaslException: Failure to initialize security context [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos credentails)]
	at com.sun.security.sasl.gsskerb.GssKrb5Server.<init>(GssKrb5Server.java:125)
	at com.sun.security.sasl.gsskerb.FactoryImpl.createSaslServer(FactoryImpl.java:85)
	at javax.security.sasl.Sasl.createSaslServer(Sasl.java:524)
	at org.apache.zookeeper.util.SecurityUtils$2.run(SecurityUtils.java:233)
	at org.apache.zookeeper.util.SecurityUtils$2.run(SecurityUtils.java:229)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.zookeeper.util.SecurityUtils.createSaslServer(SecurityUtils.java:228)
	at org.apache.zookeeper.server.ZooKeeperSaslServer.createSaslServer(ZooKeeperSaslServer.java:44)
	at org.apache.zookeeper.server.ZooKeeperSaslServer.<init>(ZooKeeperSaslServer.java:38)
	at org.apache.zookeeper.server.NIOServerCnxn.<init>(NIOServerCnxn.java:100)
	at org.apache.zookeeper.server.NIOServerCnxnFactory.createConnection(NIOServerCnxnFactory.java:186)
	at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:227)
	at java.lang.Thread.run(Thread.java:748)
Caused by: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos credentails)
	at sun.security.jgss.krb5.Krb5AcceptCredential.getInstance(Krb5AcceptCredential.java:87)
	at sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:127)
	at sun.security.jgss.GSSManagerImpl.getCredentialElement(GSSManagerImpl.java:193)
	at sun.security.jgss.GSSCredentialImpl.add(GSSCredentialImpl.java:427)
	at sun.security.jgss.GSSCredentialImpl.<init>(GSSCredentialImpl.java:62)
	at sun.security.jgss.GSSManagerImpl.createCredential(GSSManagerImpl.java:154)
	at com.sun.security.sasl.gsskerb.GssKrb5Server.<init>(GssKrb5Server.java:108)
	... 13 more
NIOServerCxn.Factory:localhost/127.0.0.1:0: Client attempting to establish new session at /127.0.0.1:33742
SyncThread:0: Creating new log file: log.1
SyncThread:0: Established session 0x100003736ae0000 with negotiated timeout 10000 for client /127.0.0.1:33742
pool-1-thread-1-SendThread(localhost:35625): Session establishment complete on server localhost/127.0.0.1:35625, sessionid = 0x100003736ae0000, negotiated timeout = 10000
pool-1-thread-1-SendThread(localhost:35625): ClientCnxn:sendSaslPacket:length=0
pool-1-thread-1-SendThread(localhost:35625): saslClient.evaluateChallenge(len=0)
pool-1-thread-1-EventThread: zookeeper state changed (SyncConnected)
NioProcessor-1: No server entry found for kerberos principal name zookeeper/[email protected]
NioProcessor-1: No server entry found for kerberos principal name zookeeper/[email protected]
NioProcessor-1: Server not found in Kerberos database (7)
NioProcessor-1: Server not found in Kerberos database (7)

The problem is reproducible if the order of localhost and localhost.localdomain is exchanged in /etc/hosts:

[systest@gsomogyi-build spark]$ cat /etc/hosts
127.0.0.1   localhost.localdomain localhost localhost4 localhost4.localdomain4
::1         localhost.localdomain localhost localhost6 localhost6.localdomain6

The main problem is that ZkClient connects to the canonical loopback address (which is not necessarily localhost).
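
To illustrate the mismatch, here is a minimal sketch (not the code changed in this PR) of how the JVM can report a canonical name for the loopback address that differs from localhost. The class name, the helper methods, and the EXAMPLE.COM realm are placeholders chosen for illustration only. With the /etc/hosts ordering shown above, the canonical name would be expected to resolve to localhost.localdomain, so a service principal built from it would not match one registered for localhost.

```java
import java.net.InetAddress;

// Hypothetical helper, for illustration only; not part of Spark, Kafka or ZooKeeper.
final class LoopbackCanonicalName {

    // Returns the canonical host name the JVM resolves for the loopback address.
    // The result depends on the local resolver configuration (e.g. /etc/hosts).
    static String canonicalLoopbackName() {
        return InetAddress.getLoopbackAddress().getCanonicalHostName();
    }

    // Builds a Kerberos-style service principal string: service/host@REALM.
    static String servicePrincipal(String service, String host, String realm) {
        return service + "/" + host + "@" + realm;
    }

    public static void main(String[] args) {
        String canonical = canonicalLoopbackName();
        // EXAMPLE.COM is a placeholder realm, assumed for this sketch.
        String registered = servicePrincipal("zookeeper", "localhost", "EXAMPLE.COM");
        String requested = servicePrincipal("zookeeper", canonical, "EXAMPLE.COM");

        System.out.println("canonical loopback name: " + canonical);
        System.out.println("principal registered:    " + registered);
        System.out.println("principal requested:     " + requested);
        System.out.println("match: " + registered.equals(requested));
    }
}
```

Running this on an affected machine and on an unaffected one should show whether the two principals line up, which is a quick way to check if an environment is exposed to this failure.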

Why are the changes needed?

KafkaDelegationTokenSuite failed in some environments.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing unit tests on different platforms.

@gaborgsomogyi
Contributor Author

@koertkuipers may I ask you to test this PR in your environment?

@HeartSaVioR
Contributor

The code change looks good. I would like to see the result of the CI build to ensure the change doesn't break existing tests. (The CI build seems to be down right now.)

@gaborgsomogyi
Contributor Author

Yeah, my intention is to test 3 ways (apart from my local clusters):

@koertkuipers
Contributor

ok i will test

@SparkQA

SparkQA commented Sep 16, 2019

Test build #110631 has finished for PR 25803 at commit 2d9b8df.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gaborgsomogyi
Contributor Author

Kafka flakiness:
Caused by: sbt.ForkMain$ForkError: java.lang.AssertionError: assertion failed: Partition [topic-2, 0] metadata not propagated after timeout

@gaborgsomogyi
Contributor Author

retest this please

@SparkQA

SparkQA commented Sep 16, 2019

Test build #110664 has finished for PR 25803 at commit 2d9b8df.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HeartSaVioR
Contributor

Same failure, so it might be related to the change. When the build fails repeatedly for the same reason, it is highly likely to be due to the change.

@HeartSaVioR
Contributor

retest this, please

@SparkQA

SparkQA commented Sep 16, 2019

Test build #110668 has finished for PR 25803 at commit 2d9b8df.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@koertkuipers
Contributor

this fix works for my environment. thanks!

@HeartSaVioR
Contributor

Glad to hear! If we run some more CI builds and verify they work without high flakiness, it should be good.

@HeartSaVioR
Contributor

retest this, please

@SparkQA

SparkQA commented Sep 17, 2019

Test build #110677 has finished for PR 25803 at commit 2d9b8df.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HeartSaVioR
Contributor

retest this, please

@SparkQA

SparkQA commented Sep 17, 2019

Test build #110707 has finished for PR 25803 at commit 2d9b8df.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

@HeartSaVioR HeartSaVioR left a comment

LGTM

@gaborgsomogyi changed the title from "[SPARK-29027][TESTS] KafkaDelegationTokenSuite fix when loopback canonical host name differs from localhost" to "[SPARK-29027][TESTS][maven] KafkaDelegationTokenSuite fix when loopback canonical host name differs from localhost" on Sep 17, 2019
@gaborgsomogyi
Contributor Author

Just to be on the safe side, I'm testing with Maven as well.

@gaborgsomogyi
Contributor Author

retest this please

@SparkQA

SparkQA commented Sep 17, 2019

Test build #110754 has finished for PR 25803 at commit 2d9b8df.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gaborgsomogyi changed the title from "[SPARK-29027][TESTS][maven] KafkaDelegationTokenSuite fix when loopback canonical host name differs from localhost" to "[SPARK-29027][TESTS] KafkaDelegationTokenSuite fix when loopback canonical host name differs from localhost" on Sep 17, 2019
@gaborgsomogyi
Contributor Author

So all the tests I wanted to run have passed. Thanks @koertkuipers and @HeartSaVioR

@vanzin
Contributor

vanzin commented Sep 17, 2019

Merging to master.

@vanzin vanzin closed this in 71e7516 Sep 17, 2019