@bizybot
Contributor

@bizybot bizybot commented Aug 13, 2018

This commit adds a troubleshooting section for Kerberos.
Most of the time, problems are caused by invalid
configurations, such as a keytab that is missing principals or
credentials that are not up to date. Time synchronization is an
important part of Kerberos infrastructure, and time skew can cause
problems. For further debugging, the documentation explains how to
enable JAAS Kerberos login module debugging and Kerberos/SPNEGO
debugging by setting JVM system properties.

@bizybot bizybot added review v7.0.0 :Security/Authentication Logging in, Usernames/passwords, Realms (Native/LDAP/AD/SAML/PKI/etc) v6.4.0 v6.5.0 labels Aug 13, 2018
@bizybot bizybot requested review from jaymode, lcawl and tvernum August 13, 2018 07:59
@elasticmachine
Collaborator

Pinging @elastic/es-security

@elasticmachine
Collaborator

Pinging @elastic/es-docs

@bizybot bizybot changed the title [Kerberos] Add troubleshooting documentation for Kerberos [Kerberos] Add troubleshooting documentation Aug 13, 2018

* User authentication fails due to either GSS negotiation failure
or a service login failure on the server side or in the {es} HTTP client. Some of
the common exceptions are listed below with some tips to resolve them.
Contributor

If I'm understanding the first sentence correctly, I think it should be changed to something like this: "User authentication fails due to either a GSS negotiation failure or a service login failure (on the server or the {es} HTTP client)".

Contributor Author

Yes, you are right. Thank you.


*Resolution:*

`Failure unspecified at GSS-API level (Mechanism level: Checksum failed)`::
Contributor

Are these error messages in specific logs or on specific machines in the deployment?

Contributor Author

I have updated this with more information, since it can appear as a log message on the server side and as an error message on the client. Thank you.

--

When this occurs on HTTP client side, it may be related to an incorrect password.

Contributor

Are the subsequent paragraphs related to when this occurs on the server side?

Contributor Author

I have separated this into client and server cases. Yes, the subsequent paragraphs relate to the server side, except the last one, which covers hostname resolution and applies to both. Thank you.
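One common server-side cause of `Checksum failed` is a keytab on the {es} node whose keys no longer match what the KDC issues for the service principal (for example, the principal is missing or its key version is outdated). A minimal Java sketch for listing what a keytab actually holds, assuming the JDK's `javax.security.auth.kerberos.KeyTab` API; the class name, helper, keytab path and `HTTP/es.example.com@EXAMPLE.COM` principal are hypothetical placeholders:

```java
import java.io.File;
import javax.security.auth.kerberos.KerberosKey;
import javax.security.auth.kerberos.KerberosPrincipal;
import javax.security.auth.kerberos.KeyTab;

public class KeytabCheck {

    // Hypothetical helper: prints the key version number (kvno) and encryption
    // type of every key stored for the principal, and returns how many there are.
    // Zero keys means the keytab is missing, unreadable, or lacks the principal.
    static int listKeys(String keytabPath, String principalName) {
        KeyTab keyTab = KeyTab.getInstance(new File(keytabPath));
        if (!keyTab.exists()) {
            System.out.println("keytab not found: " + keytabPath);
            return 0;
        }
        // Use a fully qualified name (with @REALM) so no default realm lookup is needed.
        KerberosPrincipal principal = new KerberosPrincipal(principalName);
        KerberosKey[] keys = keyTab.getKeys(principal);
        for (KerberosKey key : keys) {
            System.out.println(principalName + " kvno=" + key.getVersionNumber()
                    + " etype=" + key.getKeyType());
        }
        return keys.length;
    }

    public static void main(String[] args) {
        // Hypothetical defaults; pass the real keytab path and service principal as arguments.
        String keytab = args.length > 0 ? args[0] : "/path/to/es.keytab";
        String principal = args.length > 1 ? args[1] : "HTTP/es.example.com@EXAMPLE.COM";
        int count = listKeys(keytab, principal);
        System.out.println(count + " key(s) found");
    }
}
```

If no keys are listed, or the printed kvno is older than the one the KDC currently uses for that principal, regenerating the keytab is usually the next step to try.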

+
--

To prevent replay attacks, Kerberos V5 sets maximum tolerance for computer clock
Member

sets a maximum

Contributor Author

Done, Thank you.

--

To prevent replay attacks, Kerberos V5 sets maximum tolerance for computer clock
synchronization and it is usually set to 5 minutes. Check whether the time on
Member

it is typically 5 minutes

Contributor Author

Done, Thank you.
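To make the clock tolerance concrete: a minimal Java sketch comparing two host clocks against a 5-minute limit. It assumes the typical default of 300 seconds and treats a difference equal to the limit as still acceptable; the real tolerance is whatever the deployment configures (commonly the `clockskew` setting in `krb5.conf`), and the class and helper names are hypothetical:

```java
import java.time.Duration;
import java.time.Instant;

public class ClockSkewCheck {

    // Assumed tolerance: the typical Kerberos default of 5 minutes (300 seconds).
    static final long MAX_SKEW_SECONDS = Duration.ofMinutes(5).getSeconds();

    // Hypothetical helper: true when the two clocks differ by no more than the tolerance,
    // regardless of which host is ahead.
    static boolean withinSkew(Instant local, Instant remote) {
        long diff = Math.abs(Duration.between(local, remote).getSeconds());
        return diff <= MAX_SKEW_SECONDS;
    }

    public static void main(String[] args) {
        Instant esNode = Instant.parse("2018-08-13T08:00:00Z");
        System.out.println(withinSkew(esNode, esNode.plusSeconds(299)));  // true
        System.out.println(withinSkew(esNode, esNode.minusSeconds(360))); // false: 6 minutes apart
    }
}
```

In practice the fix is to keep every participant (KDC, {es} nodes, clients) synchronized via NTP rather than to raise the tolerance.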


For detailed information, see {ref}/security-settings.html#ref-kerberos-settings[Kerberos realm settings].

To enable Kerberos logging on JVM, add following JVM system properties:
Member

I think this needs some context, like when and why you would do this. How does it differ from the login module debug log?

Contributor Author

I added some context about what this additional logging provides, namely GSS context negotiation logs and the Kerberos message exchange. Thank you.
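For an {es} node, these JVM system properties would normally be passed as `-Dsun.security.krb5.debug=true` and `-Dsun.security.spnego.debug=true` JVM options (for example in `jvm.options`) rather than set in code. The minimal Java sketch below sets the same properties programmatically, as a standalone Java client might; it assumes an Oracle/OpenJDK runtime, where these JDK-internal property names apply, and the class and method names are hypothetical:

```java
public class KerberosDebug {

    // Turns on the JDK's own Kerberos and SPNEGO debug output (written to stdout).
    // The JDK typically reads these flags once, when its Kerberos/GSS classes are
    // first loaded, so they must be set before any Kerberos login or negotiation.
    static void enableKerberosDebug() {
        System.setProperty("sun.security.krb5.debug", "true");
        System.setProperty("sun.security.spnego.debug", "true");
    }

    public static void main(String[] args) {
        enableKerberosDebug();
        System.out.println("sun.security.krb5.debug=" + System.getProperty("sun.security.krb5.debug"));
        System.out.println("sun.security.spnego.debug=" + System.getProperty("sun.security.spnego.debug"));
    }
}
```

Setting them as JVM options at startup avoids the ordering problem entirely, which is why that route is the usual choice for a server.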


Kerberos depends on proper hostname resolution, so please check your DNS infrastructure.
Incorrect DNS setup, DNS SRV records or configuration for KDC servers in `krb5.conf`
+can cause problems with hostname resolution.
Contributor

Is this + intended?

Contributor Author

Oops, removed it. Thanks.
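A quick way to check hostname resolution from the {es} node or the client is to compare the forward and reverse lookups. A minimal Java sketch, assuming the host name to check is the one used in the service principal (`HTTP/<host>@REALM`) and that a mismatch between it and the reverse-resolved name is worth investigating; the class and helper names are hypothetical:

```java
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.UnknownHostException;

public class HostnameCheck {

    // Hypothetical helper: forward-resolves a host name, then reverse-resolves the
    // address. getCanonicalHostName() is best effort and returns the textual IP
    // address when the reverse lookup fails.
    static String canonicalName(String host) {
        try {
            InetAddress address = InetAddress.getByName(host);
            return address.getCanonicalHostName();
        } catch (UnknownHostException e) {
            throw new UncheckedIOException("cannot resolve " + host, e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical default; pass the host name used in the HTTP/<host>@REALM principal.
        String host = args.length > 0 ? args[0] : "localhost";
        String canonical = canonicalName(host);
        System.out.println(host + " -> " + canonical);
        if (!canonical.equalsIgnoreCase(host)) {
            System.out.println("warning: canonical name differs from the given host name");
        }
    }
}
```

An unresolvable name, or a reverse lookup that returns only an IP address or a different host, points at the DNS setup rather than at {es}.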


As Kerberos logs are often cryptic in nature and many things can go wrong
as it depends on external services like DNS, time synchronization. You might
have to enable additional debug logs to root cause the issue.
Contributor

I don't think "root cause" is a verb 😈

Maybe "... debug logs to determine the root cause of the issue."

Contributor Author

Thank you, Tim. Addressed this.

Contributor

@tvernum tvernum left a comment

👍

Member

@jaymode jaymode left a comment

I left some feedback. After those are addressed, LGTM

*Symptoms:*

* User authentication fails due to either GSS negotiation failure
or a service login failure ( either on the server or in the {es} http client).
Member

nit remove space between ( and e

Contributor Author

Thank you, addressed this.


* User authentication fails due to either GSS negotiation failure
or a service login failure ( either on the server or in the {es} http client).
Some of the common exceptions are listed below with some tips to resolve them.
Member

s/some tips to resolve/tips to help resolve

Contributor Author

Changed, thank you.

--

As Kerberos logs are often cryptic in nature and many things can go wrong
as it depends on external services like DNS, time synchronization. You might
Member

s/DNS, time synchronization/DNS and NTP.

Contributor Author

Modified this, Thank you.

as it depends on external services like DNS, time synchronization. You might
have to enable additional debug logs to determine the root cause of the issue.

{es} uses JAAS (Java Authentication and Authorization Service) Kerberos login
Member

s/uses JAAS/uses a JAAS

Contributor Author

Modified this, Thank you
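To make the JAAS part concrete: a minimal Java sketch of an in-memory JAAS configuration for the JDK's `com.sun.security.auth.module.Krb5LoginModule` on the acceptor (service) side, with the module's `debug` option switched on. This is an illustration only, not how {es} builds its configuration internally; the class name, keytab path and principal are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;
import javax.security.auth.login.AppConfigurationEntry;
import javax.security.auth.login.AppConfigurationEntry.LoginModuleControlFlag;
import javax.security.auth.login.Configuration;

public class Krb5JaasConfig extends Configuration {

    private final String keytabPath;
    private final String principal;

    Krb5JaasConfig(String keytabPath, String principal) {
        this.keytabPath = keytabPath;
        this.principal = principal;
    }

    // Returns the same single Krb5LoginModule entry for any application name.
    @Override
    public AppConfigurationEntry[] getAppConfigurationEntry(String name) {
        Map<String, String> options = new HashMap<>();
        options.put("useKeyTab", "true");      // read keys from the keytab
        options.put("keyTab", keytabPath);
        options.put("principal", principal);
        options.put("storeKey", "true");       // keep the keys so the service can accept tickets
        options.put("doNotPrompt", "true");    // never fall back to prompting for a password
        options.put("isInitiator", "false");   // acceptor (service) side only
        options.put("debug", "true");          // login module debug output
        return new AppConfigurationEntry[] {
            new AppConfigurationEntry("com.sun.security.auth.module.Krb5LoginModule",
                    LoginModuleControlFlag.REQUIRED, options)
        };
    }

    public static void main(String[] args) {
        // Hypothetical keytab path and service principal.
        Configuration config = new Krb5JaasConfig("/path/to/es.keytab", "HTTP/es.example.com@EXAMPLE.COM");
        for (AppConfigurationEntry entry : config.getAppConfigurationEntry("es-kerberos")) {
            System.out.println(entry.getLoginModuleName() + " " + entry.getOptions());
        }
    }
}
```

Passing such a configuration to a `javax.security.auth.login.LoginContext` and calling `login()` against a real KDC would print the login module's debug trace, which shows which principal and keytab were used. This is separate from the lower-level Kerberos/SPNEGO protocol output enabled through JVM system properties.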

For detailed information, see {ref}/security-settings.html#ref-kerberos-settings[Kerberos realm settings].

Sometimes you may need to go deeper to understand the problem during SPNEGO
gss context negotiation or look at the Kerberos message exchange. To enable
Member

s/gss/GSS

Contributor Author

Modified this, Thank you

@bizybot bizybot merged commit 38886e8 into elastic:master Aug 21, 2018
bizybot added a commit that referenced this pull request Aug 21, 2018
bizybot added a commit that referenced this pull request Aug 21, 2018
@bizybot bizybot deleted the troubleshooting-kerberos branch August 21, 2018 06:35
@lcawl lcawl added the >docs General docs changes label Aug 21, 2018
@jimczi jimczi added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019

Labels

>docs General docs changes :Security/Authentication Logging in, Usernames/passwords, Realms (Native/LDAP/AD/SAML/PKI/etc) v6.4.0 v6.5.0 v7.0.0-beta1

6 participants