Conversation

@weibxiao
Contributor

@weibxiao weibxiao commented Jul 30, 2025

webrev.zip
NPE thrown from SASL GSSAPI impl when TLS is used with QOP auth-int against Active Directory.

When the exception is triggered, the LDAP Connection runs its clean-up: the output stream is flushed and closed, which disposes the GSS context while GssKrb5Client is still wrapping a message and trying to send the abandon request to the server at https://github.com/openjdk/jdk/blob/master/src/jdk.security.jgss/share/classes/com/sun/security/sasl/gsskerb/GssKrb5Base.java#L140. That is why the NPE is thrown.

This change closes the socket and the output stream in LdapClient.java instead. That allows the SASL client code to send the abandon request to the server first and only then dispose the GSS context, which avoids the NPE at line 140 of GssKrb5Base.java.

No test is attached to this PR because it would need a SASL-enabled LDAP server with a security setup. The updated webrev is attached for reference.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8362268: NPE thrown from SASL GSSAPI impl when TLS is used with QOP auth-int against Active Directory (Bug - P3)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/26566/head:pull/26566
$ git checkout pull/26566

Update a local copy of the PR:
$ git checkout pull/26566
$ git pull https://git.openjdk.org/jdk.git pull/26566/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 26566

View PR using the GUI difftool:
$ git pr show -t 26566

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/26566.diff

Using Webrev

Link to Webrev Comment

weibxiao added 4 commits July 30, 2025 15:57
…ed with QOP auth-int against Active Directory
…LS is used with QOP auth-int against Active Directory"

This reverts commit ea2d289.
…ed with QOP auth-int against Active Directory
@bridgekeeper

bridgekeeper bot commented Jul 30, 2025

👋 Welcome back wxiao! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Jul 30, 2025

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

@openjdk

openjdk bot commented Jul 30, 2025

@weibxiao The following label will be automatically applied to this pull request:

  • security

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk

openjdk bot commented Aug 1, 2025

⚠️ @weibxiao This pull request contains merges that bring in commits not present in the target repository. Since this is not a "merge style" pull request, these changes will be squashed when this pull request is integrated. If this is your intention, then please ignore this message. If you want to preserve the commit structure, you must change the title of this pull request to Merge <project>:<branch> where <project> is the name of another project in the OpenJDK organization (for example Merge jdk:master).

@weibxiao weibxiao marked this pull request as ready for review August 7, 2025 15:33
@openjdk openjdk bot added the rfr Pull request is ready for review label Aug 7, 2025
@mlbridge

mlbridge bot commented Aug 7, 2025

Webrevs

@seanjmullan
Member

The bug title says "Java 11+" but the affects version field also contains 8u401 - does it also affect 8u?

In general, I would avoid putting release versions in the title of the bug as the affects version field is the right place to add that info, so please remove "on Java 11+" from the title.

@seanjmullan
Member

Since this fix is in the security-libs area, I think the component and subcomponent should be changed to security-libs/javax.security.

Also, please add a "noreg-hard" label to the bug with a comment explaining why it is too hard to write a regression test.

@weibxiao weibxiao changed the title 8362268 : NPE thrown from SASL GSSAPI impl on Java 11+ when TLS is used with QOP auth-int against Active Directory 8362268 : NPE thrown from SASL GSSAPI impl when TLS is used with QOP auth-int against Active Directory Aug 20, 2025
@weibxiao
Contributor Author

weibxiao commented Aug 20, 2025

The original report on the OpenJDK mailing list said the problem could not be reproduced on JDK 8, but the defect does exist in JDK 8 as well.

The failure only happens when Sasl.QOP is set to auth-int or auth-conf. Reproducing it requires an LDAP server configured for SASL authentication, which is not available to the OpenJDK community.
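
For readers trying to picture the setup, the conditions described above correspond roughly to a JNDI environment like the one below. This is only a sketch: the class name `SaslQopEnv` and the host name are placeholders, and the code merely builds the environment without contacting a server. Actually connecting (for example by passing the table to `new InitialDirContext(env)`) would additionally need Kerberos credentials and a suitably configured directory server.

```java
import java.util.Hashtable;
import javax.naming.Context;
import javax.security.sasl.Sasl;

class SaslQopEnv {
    // Builds a JNDI environment for LDAP over TLS with SASL GSSAPI and the
    // requested quality of protection. The URL is a placeholder.
    static Hashtable<String, Object> build(String url, String qop) {
        Hashtable<String, Object> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, url);
        env.put(Context.SECURITY_AUTHENTICATION, "GSSAPI");
        // Sasl.QOP is the property name "javax.security.sasl.qop".
        env.put(Sasl.QOP, qop);
        return env;
    }

    public static void main(String[] args) {
        Hashtable<String, Object> env = build("ldaps://ad.example.com", "auth-int");
        System.out.println(env.get(Sasl.QOP));
    }
}
```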

@michael-o

I am the reporter of this bug on the mailing list. @seanjmullan, yes, it is also present in JDK 8. I confirmed this myself on HP-UX, where the JDK is provided by HPE. They have either cherry-picked the faulty commit or used an already updated tree.

@weibxiao I do not fully understand this fix. It does not really fix the issue, does it? It converts one NPE into another. From my PoV the regression should be reverted and another, better fix should be employed.

As @wangweij writes here, let LDAP complete the abandonRequest() and then free resources.

@wangweij
Contributor

My "here" was

Not an LDAP expert, but I see that abandonRequest() still wants to write into outStream. If the SASL/GSS context is already disposed by now what stream should this be? Should it be reverted back to the raw stream?

However, I'm not sure if this is correct. This means the security guaranteed by the SASL layer is lost, and I also don't know if the peer can parse it correctly.

@michael-o What have JDK 8 and ldapsearch done? Did they send error messages in the clear?

@weibxiao
Contributor Author

I cannot revert the previous change. It closes unused sockets in the JVM promptly rather than leaving them for the next GC to clean up.

One simple workaround in application code for this NPE is to increase the buffer size by setting javax.security.sasl.maxbuffer in the context, which overrides the hard-coded buffer size in AbstractSaslImpl.java.

This NPE is more or less a timing issue. The context was cleared before the output stream was flushed: the flush started earlier but completed later.

@michael-o

My "here" was

Not an LDAP expert, but I see that abandonRequest() still wants to write into outStream. If the SASL/GSS context is already disposed by now what stream should this be? Should it be reverted back to the raw stream?

However, I'm not sure if this is correct. This means the security guaranteed by the SASL layer is lost, and I also don't know if the peer can parse it correctly.

@michael-o What have JDK 8 and ldapsearch done? Did they send error messages in the clear?

I will get back to you tomorrow or Monday as soon as I have access to the environment.

@michael-o

I cannot revert the previous change. It closes unused sockets in the JVM promptly rather than leaving them for the next GC to clean up.

I see.

One simple workaround in application code for this NPE is to increase the buffer size by setting javax.security.sasl.maxbuffer in the context, which overrides the hard-coded buffer size in AbstractSaslImpl.java.

No, that will not solve the problem here at all. Active Directory does not support auth-int/-conf on a TLS wrapped connection. It sends a message to notify the client. The client (Java JNDI) does not know this and reads the first bytes assuming the SASL buffer size, but it is a non-wrapped message. I can provide pcaps, if you like.

This NPE is more or less a timing issue. The context was cleared before the output stream was flushed: the flush started earlier but completed later.

Yes, but how will this change solve the problem? Do you intend to add another PR to address it?

@michael-o

My "here" was

Not an LDAP expert, but I see that abandonRequest() still wants to write into outStream. If the SASL/GSS context is already disposed by now what stream should this be? Should it be reverted back to the raw stream?

However, I'm not sure if this is correct. This means the security guaranteed by the SASL layer is lost, and I also don't know if the peer can parse it correctly.

@michael-o What have JDK 8 and ldapsearch done? Did they send error messages in the clear?

JDK 8 is here:
https://mail.openjdk.org/pipermail/security-dev/2025-April/045574.html

Create a TLS connection to Active Directory, bind with SASL GSSAPI and
require SASL integrity. Now, this is thrown by Java 8:

Exception in thread "main" javax.naming.NamingException: LDAP
connection has been closed; remaining name '' at
com.sun.jndi.ldap.LdapRequest.getReplyBer(LdapRequest.java:133) at
com.sun.jndi.ldap.Connection.readReply(Connection.java:469) at
com.sun.jndi.ldap.LdapClient.getSearchReply(LdapClient.java:638) at
com.sun.jndi.ldap.LdapClient.search(LdapClient.java:561) at
com.sun.jndi.ldap.LdapCtx.doSearch(LdapCtx.java:2014) at
com.sun.jndi.ldap.LdapCtx.searchAux(LdapCtx.java:1873) at
com.sun.jndi.ldap.LdapCtx.c_search(LdapCtx.java:1798) at
com.sun.jndi.toolkit.ctx.ComponentDirContext.p_search(ComponentDirContext.java:392)
at
com.sun.jndi.toolkit.ctx.PartialCompositeDirContext.search(PartialCompositeDirContext.java:358)
at
com.sun.jndi.toolkit.ctx.PartialCompositeDirContext.search(PartialCompositeDirContext.java:341)
at
javax.naming.directory.InitialDirContext.search(InitialDirContext.java:267)
at DirTest.main(DirTest.java:24)

ldapsearch:
This is how ldapsearch(1) handles it:

$ ldapsearch -O minssf=1 -d 10 -H ldaps://ad001.siemens.net -s base -Y GSSAPI namingContexts
...
sb_sasl_generic_pkt_length: received illegal packet length of 813957120 bytes
sasl_generic_read: want=16, got=16
0000: 00 7e 02 01 00 78 84 00 00 00 5d 0a 01 02 04 00 .~...x....].....
sb_sasl_cyrus_decode: failed to decode packet: generic failure
sb_sasl_generic_read: failed to decode packet
ldap_read: want=8 error=Input/output error
ber_get_next failed, errno=5.

numResponses: 0

ldap_result: Can't contact LDAP server (-1)
tls_write: want=31, written=31
0000: 15 03 03 00 1a df 9c b5 96 48 55 9d 1e 65 dc eb .........HU..e..
0010: a1 ca 00 a5 96 10 be 5c 23 32 b9 90 68 c4 04 .......#2..h..

libldap properly signals the invalid buffer size and shows the
connection closure.

I think I have described the issue quite well in https://mail.openjdk.org/pipermail/security-dev/2025-April/045574.html.

pcap and keylog file are available for inspection. The opposite side does not send an abandonRequest.
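
The "illegal packet length of 813957120 bytes" in the ldapsearch trace is consistent with a client reading the first four bytes of an unwrapped, BER-encoded LDAP message as a SASL length prefix. The sketch below works through that arithmetic. The four input bytes (`30 84 00 00`) are an inference from the number in the trace, not something read from the pcap, and this `networkByteOrderToInt` is an illustrative stand-in rather than the JDK method of the same name.

```java
class SaslLengthCheck {
    // Interprets `count` bytes starting at `start` as a big-endian integer.
    static int networkByteOrderToInt(byte[] buf, int start, int count) {
        int answer = 0;
        for (int i = 0; i < count; i++) {
            answer = (answer << 8) | (buf[start + i] & 0xff);
        }
        return answer;
    }

    public static void main(String[] args) {
        // Assumed start of the plain LDAP reply: SEQUENCE tag 0x30 followed
        // by the beginning of a long-form BER length (0x84 0x00 0x00 ...).
        byte[] lenBuf = {0x30, (byte) 0x84, 0x00, 0x00};
        int len = networkByteOrderToInt(lenBuf, 0, 4);
        System.out.println(len);          // prints 813957120
        System.out.println(len > 65536);  // prints true
    }
}
```

Under that assumption, the value far exceeds the 65536-byte default buffer mentioned elsewhere in this thread, which matches the client treating the reply as an invalid SASL packet.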

@wangweij
Contributor

So, it seems we should NOT revert to the raw stream. We can either return earlier in abandonRequest() before the write call or the write should fail (which the current PR does). Of course, an exception with clear information is always better.

@michael-o

So, it seems we should NOT revert to the raw stream. We can either return earlier in abandonRequest() before the write call or the write should fail (which the current PR does). Of course, an exception with clear information is always better.

I agree that if the opposite side did close the connection without properly advertising it and we try to send a request and it fails, it should be clearly signalled to the user. CommunicationException or similar which is already used in the code base.
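
As a rough illustration of what "clearly signalled" could look like, the sketch below converts a failed write into a `javax.naming.CommunicationException` that keeps the original `IOException` as its root cause. The class `ClearFailure`, the method `writeRequest` and the message text are hypothetical and are not taken from the JDK sources or from this PR.

```java
import java.io.IOException;
import java.io.OutputStream;
import javax.naming.CommunicationException;

class ClearFailure {
    // Writes a request and turns a low-level I/O failure into a
    // CommunicationException with a descriptive message.
    static void writeRequest(OutputStream out, byte[] request)
            throws CommunicationException {
        try {
            out.write(request);
            out.flush();
        } catch (IOException e) {
            CommunicationException ce = new CommunicationException(
                    "LDAP connection is closed; request could not be sent");
            ce.setRootCause(e);
            throw ce;
        }
    }

    public static void main(String[] args) {
        // A stream that always fails, standing in for a closed connection.
        OutputStream closed = new OutputStream() {
            @Override
            public void write(int b) throws IOException {
                throw new IOException("stream closed");
            }
        };
        try {
            writeRequest(closed, new byte[] {1, 2, 3});
        } catch (CommunicationException e) {
            System.out.println(e.getMessage());
        }
    }
}
```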

@michael-o

Yes. I traced the call. GssKrb5Base::dispose was called after LdapClient::close was called and the context was closed.

BTW, I will be out of office for two weeks starting next Monday. I will check the review comments after I come back.

Were you able to check it meanwhile?

@weibxiao
Contributor Author

I am not quite sure what you would like me to check. I confirmed the context was disposed. With my test code, I got the same error as you did. There is certainly no more NPE.

@michael-o

I am not quite sure what you would like me to check. I confirmed the context was disposed. With my test code, I got the same error as you did. There is certainly no more NPE.

Good. I am fine with the PR.

@seanjmullan
Member

/label add core-libs

@openjdk

openjdk bot commented Sep 30, 2025

@seanjmullan
The core-libs label was successfully added.

@seanjmullan
Member

Since the fix is in the java.naming code, someone from the Core Libraries group should also review this PR.

} finally {

flushAndCloseOutputStream();
// 8313657 socket is not closed until GC is run
Member

Would it be worth keeping a comment here about why sock is not closed, or at least mentioning 8362268?

}
}

private static class CustomSocket extends Socket {
Member

minor: could you please clean up unused imports after this change?


private static class CustomSocket extends Socket {
private int closeMethodCalled = 0;
private LdapOutputStream output = new LdapOutputStream();
Member

I believe these are local objects which are no longer used after removal of this. I'd personally remove the private static class LdapInputStream extends InputStream { and private static class LdapOutputStream extends OutputStream { further down the file

Comment on lines 484 to 497
private void closeOpenedResource() {
try {
if (conn != null) {
if (conn.outStream != null) {
conn.outStream.close();
}

if (conn.sock != null && !conn.sock.isClosed()) {
conn.sock.close();
}
}
} catch (IOException ioEx) {
//ignore the error;
}
Member

@dfuch dfuch Oct 13, 2025

[OK - I missed that the cleanup method had been modified to no longer close the socket]

But another issue is that this method attempts to modify the state of the connection without holding the connection lock. This is not good.

Member

One possibility could be to move this code to the connection so that it can participate in the locking.

However - I'm concerned that this proposed fix will reintroduce https://bugs.openjdk.org/browse/JDK-8313657

Member

@dfuch dfuch left a comment

The proposed solution needs more explanation and must integrate properly with the connection lock.

}

// flush and close output stream
private void flushAndCloseOutputStream() {
Member

If this method no longer closes the output stream it should no longer be called flushAndCloseOutputStream()

@weibxiao
Contributor Author

Updated the code to address the review comments.
Some additional analysis follows.

In the JDK, at the line

int len = networkByteOrderToInt(lenBuf, 0, 4);

the buffer length was larger than the default value of 65536, which triggered the exception. The exception was caught at line 451 of Connection.java, and Connection::cleanup was called. Inside it, the closeOpenedSocket method closes the output stream. With SASL, this output stream is a SaslOutputStream, so SaslOutputStream::close is used to close it. Inside that method, GssKrb5Client::dispose is called; its implementation is in the base class, GssKrb5Base::dispose, which disposes the GSS context and sets it to null. Meanwhile, the SASL output stream was still in the middle of handling the buffer via GssKrb5Base::wrap. Since the GSS context had already been set to null, this caused the NPE at the line

byte[] answer = secCtx.wrap(outgoing, start, len, msgProp);

The JNDI connection is created inside the constructor of LdapClient.java. When the Connection object is created, the input and output streams are created as well; with SASL these are SaslInputStream and SaslOutputStream. Since both resources are created inside LdapClient when the Connection object is created, it seems OK to close them in LdapClient.java. Regarding locking the connection: the change is called from LdapClient::close, which uses a ReentrantLock to control access. I hope to get more input to refine the code here.
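
The failure sequence described above (dispose first, wrap afterwards) can be modelled in a few lines. The sketch below is a simplified stand-in, not the JDK code: `FakeSaslClient`, `SecContext` and `wrapAfterDisposeThrowsNpe` are hypothetical names, and the two calls are made sequentially on one thread to keep the outcome deterministic.

```java
class DisposeRaceSketch {
    // Minimal stand-in for a security context; only wrap() matters here.
    interface SecContext {
        byte[] wrap(byte[] data, int start, int len);
    }

    static class FakeSaslClient {
        private SecContext secCtx =
                (d, s, l) -> java.util.Arrays.copyOfRange(d, s, s + l);

        byte[] wrap(byte[] outgoing, int start, int len) {
            // Throws NullPointerException if dispose() has already run.
            return secCtx.wrap(outgoing, start, len);
        }

        void dispose() {
            secCtx = null;
        }
    }

    // Returns true if wrapping after dispose fails with an NPE.
    static boolean wrapAfterDisposeThrowsNpe() {
        FakeSaslClient client = new FakeSaslClient();
        client.dispose();
        try {
            client.wrap(new byte[] {1, 2, 3}, 0, 3);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(wrapAfterDisposeThrowsNpe());
    }
}
```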

Member

@dfuch dfuch left a comment

I am not quite sure I understand how this fix works. If all tests are passing then it may be OK. I hope it isn't going to re-introduce a resource leak though. Synchronization/locking must be fixed however, and I have suggested some changes below that will ensure it integrates correctly with the locking strategy in the Connection class.

It would be good to get @AlekseiEfimov review.

Comment on lines 463 to 464
conn.cleanup(reqCtls, false);
closeOpenedResource();
Member

Please replace these two lines with:

conn.cleanupAndClose(reqCtls);

Then add a method in Connection:

    void cleanupAndClose(Control[] reqCtls) {
        lock.lock();
        try {

            cleanup(reqCtls, false);

            // 8313657 socket is not closed until GC is run
            // it caused the bug 8362268, hence moved here

            if (outStream != null) {
                outStream.close();
            }

            if (!sock.isClosed()) {
                sock.close();
            }
        } catch (IOException ignored) {
            // we're closing, ignore IO.
        } finally {
            lock.unlock();
        }
    }

Comment on lines 511 to 512
conn.cleanup(null, false);
closeOpenedResource();
Member

same here. replace these two lines with:

conn.cleanupAndClose(null);

}
} catch (IOException ioEx) {
//ignore the error;
}
Member

Please revert this change. Use conn.cleanupAndClose(...) instead.

@weibxiao
Contributor Author

Updated the code accordingly.
