Skip to content

Improve logging message for improper connection type #32688

@jpcarey

Description

@jpcarey

Describe the feature: Improve logging message for improper connection type.

Example 1: TLS enabled node connecting to non-TLS enabled node

If transport.ssl is not enabled, and another node tries to join with TLS enabled - you end up with messages like the one below spamming the log every second or so. It would be helpful to parse some of the tcp header (ex. (16,3,3,0), 0x16 is the first byte of a TLS Client Hello), and provide a concise message.

[2018-08-07T13:11:38,889][WARN ][o.e.x.s.t.n.SecurityNetty4ServerTransport] [4J0Fy6n] exception caught on transport layer [NettyTcpChannel{localAddress=/0:0:0:0:0:0:0:1:9301, remoteAddress=/0:0:0:0:0:0:0:1:56406}], closing connection
io.netty.handler.codec.DecoderException: java.io.StreamCorruptedException: invalid internal transport message format, got (16,3,3,0)
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:459) ~[netty-codec-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:392) ~[netty-codec-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:359) ~[netty-codec-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:342) ~[netty-codec-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:224) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.handler.logging.LoggingHandler.channelInactive(LoggingHandler.java:167) [netty-handler-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:224) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1354) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:917) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:822) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163) [netty-common-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:403) [netty-common-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:463) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) [netty-common-4.1.16.Final.jar:4.1.16.Final]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_172]
Caused by: java.io.StreamCorruptedException: invalid internal transport message format, got (16,3,1,1)
	at org.elasticsearch.transport.TcpTransport.validateMessageHeader(TcpTransport.java:1283) ~[elasticsearch-6.2.4.jar:6.2.4]
	at org.elasticsearch.transport.netty4.Netty4SizeHeaderFrameDecoder.decode(Netty4SizeHeaderFrameDecoder.java:36) ~[?:?]
	at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:489) ~[?:?]
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:428) ~[?:?]
	... 20 more

It could be condensed into something like this:

[2018-08-07T13:11:38,889][WARN ][o.e.x.s.t.n.SecurityNetty4ServerTransport] [4J0Fy6n] exception caught on transport layer [NettyTcpChannel{localAddress=/0:0:0:0:0:0:0:1:9301, remoteAddress=/0:0:0:0:0:0:0:1:56406}], closing connection: SSL/TLS request received but SSL/TLS is not enabled on this node

Example 2: SSL/TLS request to wrong port

This a curl request using https to the wrong port (9300 instead of 9200). This failed because the TLS connection attempt did not provide client certs (null cert chain). In this scenario, there is not much to improve - beyond providing a more concise message about what happened, rather than a full stack trace with a vague javax.net.ssl.SSLHandshakeException: null cert chain.

[2018-08-07T13:02:34,247][WARN ][o.e.x.s.t.n.SecurityNetty4ServerTransport] [node01] exception caught on transport layer [NettyTcpChannel{localAddress=0.0.0.0/0.0.0.0:9300, remoteAddress=/0:0:0:0:0:0:0:1:56158}], closing connection
io.netty.handler.codec.DecoderException: javax.net.ssl.SSLHandshakeException: null cert chain
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:459) ~[netty-codec-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:265) ~[netty-codec-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1359) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:935) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:134) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:545) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:499) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) [netty-common-4.1.16.Final.jar:4.1.16.Final]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_172]
Caused by: javax.net.ssl.SSLHandshakeException: null cert chain
	at sun.security.ssl.Handshaker.checkThrown(Handshaker.java:1529) ~[?:?]
	at sun.security.ssl.SSLEngineImpl.checkTaskThrown(SSLEngineImpl.java:535) ~[?:?]
	at sun.security.ssl.SSLEngineImpl.readNetRecord(SSLEngineImpl.java:813) ~[?:?]
	at sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:781) ~[?:?]
	at javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:624) ~[?:1.8.0_172]
	at io.netty.handler.ssl.SslHandler$SslEngineType$3.unwrap(SslHandler.java:281) ~[?:?]
	at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1215) ~[?:?]
	at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1127) ~[?:?]
	at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1162) ~[?:?]
	at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:489) ~[?:?]
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:428) ~[?:?]
	... 15 more
Caused by: javax.net.ssl.SSLHandshakeException: null cert chain
	at sun.security.ssl.Alerts.getSSLException(Alerts.java:192) ~[?:?]
	at sun.security.ssl.SSLEngineImpl.fatal(SSLEngineImpl.java:1666) ~[?:?]
	at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:330) ~[?:?]
	at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:318) ~[?:?]
	at sun.security.ssl.ServerHandshaker.clientCertificate(ServerHandshaker.java:1935) ~[?:?]
	at sun.security.ssl.ServerHandshaker.processMessage(ServerHandshaker.java:237) ~[?:?]
	at sun.security.ssl.Handshaker.processLoop(Handshaker.java:1052) ~[?:?]
	at sun.security.ssl.Handshaker$1.run(Handshaker.java:992) ~[?:?]
	at sun.security.ssl.Handshaker$1.run(Handshaker.java:989) ~[?:?]
	at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_172]
	at sun.security.ssl.Handshaker$DelegatedTask.run(Handshaker.java:1467) ~[?:?]
	at io.netty.handler.ssl.SslHandler.runDelegatedTasks(SslHandler.java:1364) ~[?:?]
	at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1272) ~[?:?]
	at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1127) ~[?:?]
	at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1162) ~[?:?]
	at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:489) ~[?:?]
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:428) ~[?:?]
	... 15 more

Example 3: Non SSL/TLS request sent to TLS encrypted endpoint

The message is clear, but should be a single line log message - not a full stack trace.

[2018-08-07T13:07:53,952][WARN ][o.e.x.s.t.n.SecurityNetty4ServerTransport] [node01] exception caught on transport layer [NettyTcpChannel{localAddress=0.0.0.0/0.0.0.0:9300, remoteAddress=/127.0.0.1:56274}], closing connection
io.netty.handler.codec.DecoderException: io.netty.handler.ssl.NotSslRecordException: not an SSL/TLS record: 455300000027000000000000000508004d3603000016696e7465726e616c3a7463702f68616e647368616b6500
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:459) ~[netty-codec-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:265) ~[netty-codec-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1359) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:935) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:134) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:545) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:499) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) [netty-common-4.1.16.Final.jar:4.1.16.Final]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_172]
Caused by: io.netty.handler.ssl.NotSslRecordException: not an SSL/TLS record: 455300000027000000000000000508004d3603000016696e7465726e616c3a7463702f68616e647368616b6500
	at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1106) ~[?:?]
	at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1162) ~[?:?]
	at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:489) ~[?:?]
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:428) ~[?:?]
	... 15 more

There are likely other scenarios that could be improved, I just wanted to open this for discussion. These large stack traces make it rather hard to troubleshoot when tailing the log file looks like something out of the Matrix movie. If you were to grep for only lines starting with a square bracket ([2018-08-07T13:32:27,481] the square bracket in front of the date indicating the start of a new log message), you would only end up with the knowledge of closing connection.

Metadata

Metadata

Assignees

No one assigned

    Labels

    :Distributed Coordination/NetworkHttp and internode communication implementations>tech debtSupportabilityImprove our (devs, SREs, support eng, users) ability to troubleshoot/self-service product better.Team:Distributed (Obsolete)Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions