NPE when upgrading from 5.2 to 5.6 with index.routing.allocation.exclude.tag: null #28213

@dadoonet

Elasticsearch version (bin/elasticsearch --version): 5.6.5

Description of the problem including expected versus actual behavior:

From this discussion: https://discuss.elastic.co/t/cant-upgrade-elasticsearch-5-2-1-to-5-6-5/115403

When a user has set an unusual value on an index, such as "index.routing.allocation.exclude.tag": null (this was done in 5.2), restarting the cluster causes the master node to report an NPE:

[2018-01-14T17:36:24,392][WARN ][o.e.d.z.ZenDiscovery     ] [xg-ops-elk-javaes-mgt-2] failed to validate incoming join request from node [{xg-ops-elk-javaes-mgt-3}{SQaSuQ1aS-izcNs4P9yItQ}{_Wc_mrnfS5Ghb3RjuJLiJg}{10.0.23.55}{10.0.23.55:9300}]
org.elasticsearch.transport.RemoteTransportException: [xg-ops-elk-javaes-mgt-3][10.0.23.55:9300][internal:discovery/zen/join/validate]
Caused by: java.lang.NullPointerException
	at org.elasticsearch.cluster.node.DiscoveryNodeFilters.buildFromKeyValue(DiscoveryNodeFilters.java:73) ~[elasticsearch-5.2.1.jar:5.2.1]
	at org.elasticsearch.cluster.metadata.IndexMetaData$Builder.build(IndexMetaData.java:1044) ~[elasticsearch-5.2.1.jar:5.2.1]
	at org.elasticsearch.cluster.metadata.IndexMetaData.readFrom(IndexMetaData.java:724) ~[elasticsearch-5.2.1.jar:5.2.1]
	at org.elasticsearch.cluster.metadata.MetaData.readFrom(MetaData.java:676) ~[elasticsearch-5.2.1.jar:5.2.1]
	at org.elasticsearch.cluster.ClusterState.readFrom(ClusterState.java:659) ~[elasticsearch-5.2.1.jar:5.2.1]
	at org.elasticsearch.discovery.zen.MembershipAction$ValidateJoinRequest.readFrom(MembershipAction.java:171) ~[elasticsearch-5.2.1.jar:5.2.1]
	at org.elasticsearch.transport.TcpTransport.handleRequest(TcpTransport.java:1510) ~[elasticsearch-5.2.1.jar:5.2.1]
	at org.elasticsearch.transport.TcpTransport.messageReceived(TcpTransport.java:1396) ~[elasticsearch-5.2.1.jar:5.2.1]
	at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:74) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) ~[?:?]
	at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:310) ~[?:?]
	at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:297) ~[?:?]
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:413) ~[?:?]
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:265) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) ~[?:?]
	at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) ~[?:?]
	at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:241) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) ~[?:?]
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1334) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) ~[?:?]
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:926) ~[?:?]
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:134) ~[?:?]
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:644) ~[?:?]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:544) ~[?:?]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:498) ~[?:?]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:458) ~[?:?]
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) ~[?:?]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_73]

Since this can happen, I think we should handle null exclude or include values more safely.

The code which is failing: https://github.com/elastic/elasticsearch/blob/5.6/core/src/main/java/org/elasticsearch/cluster/node/DiscoveryNodeFilters.java#L69-L81

String[] values = Strings.tokenizeToStringArray(entry.getValue(), ",");
if (values.length > 0) {
  // ...

values is null in that case, so the values.length check throws the NPE.
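
A null check before reading the array length would avoid the exception. The following is a minimal sketch of that idea, not the actual Elasticsearch code: NullSafeFilterSketch, tokenizeOrNull and buildFilters are hypothetical names, and tokenizeOrNull is a simplified stand-in that only mimics the behavior described above (a null input yields a null array).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical stand-in for the filter-building logic; not the real DiscoveryNodeFilters.
final class NullSafeFilterSketch {

    // Simplified tokenizer (assumption): returns null for a null input,
    // otherwise splits on the delimiter, trims tokens and drops empty ones.
    static String[] tokenizeOrNull(String value, String delimiter) {
        if (value == null) {
            return null;
        }
        List<String> tokens = new ArrayList<>();
        for (String token : value.split(Pattern.quote(delimiter))) {
            String trimmed = token.trim();
            if (!trimmed.isEmpty()) {
                tokens.add(trimmed);
            }
        }
        return tokens.toArray(new String[0]);
    }

    // Builds attribute -> values filters, skipping settings whose value is null or empty.
    static Map<String, String[]> buildFilters(Map<String, String> settings) {
        Map<String, String[]> filters = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : settings.entrySet()) {
            String[] values = tokenizeOrNull(entry.getValue(), ",");
            // The null guard is the point of the sketch: without it, values.length throws.
            if (values != null && values.length > 0) {
                filters.put(entry.getKey(), values);
            }
        }
        return filters;
    }
}
```

With this guard, a setting whose value is null is simply ignored instead of failing the whole cluster state read. Whether a null should be ignored or rejected when the setting is first applied is a separate design question.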
