Description
This issue [1][2] only impacts 6.x clusters that mix nodes on versions before 6.6.0 with nodes on 6.6.0 or later, and only while a document is updated using a partial update. It only happens when a newer node (6.6.0+) holds the primary shard and tries to replicate to an older node (pre-6.6.0).
The error stems from this check. This issue could also impact explicit update requests that use sequence numbers and primary terms in a mixed cluster with older nodes; however, this check exists precisely to prevent users from doing just that.
I believe that partial document updates get caught up in this logic because they are implemented as index operations with merges: the original document has a sequence number and primary term, and those get included as part of the index request. When that index request has its primary on a new (6.6.0+) node and it tries to replicate to an old node, this issue occurs.
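To make the suspected mechanism concrete, here is a minimal toy model in Python. It is not the Elasticsearch code: `IndexRequestModel`, `partial_update`, `write_to`, and the sentinel values are hypothetical names chosen for illustration, and the version cutoff of 6.6.0 is taken from the description above. It only sketches how a merged partial update could carry the original document's sequence number and primary term into a request that then fails to serialize for an older node.

```python
from dataclasses import dataclass

# Sentinels for "no compare-and-write condition set" (assumed values for this model).
UNASSIGNED_SEQ_NO = -2
UNASSIGNED_PRIMARY_TERM = 0


def parse_version(version):
    """Turn a string such as '6.5.4' into a comparable tuple (6, 5, 4)."""
    return tuple(int(part) for part in version.split("."))


@dataclass
class IndexRequestModel:
    """Hypothetical stand-in for an index request sent to a replica."""
    doc: dict
    if_seq_no: int = UNASSIGNED_SEQ_NO
    if_primary_term: int = UNASSIGNED_PRIMARY_TERM


def partial_update(existing_doc, seq_no, primary_term, partial):
    """Model a partial update as merge + index, keeping the source doc's seq_no/term."""
    merged = {**existing_doc, **partial}
    return IndexRequestModel(doc=merged, if_seq_no=seq_no, if_primary_term=primary_term)


def write_to(request, stream_version):
    """Model serialization of the request for a node running stream_version."""
    if parse_version(stream_version) >= (6, 6, 0):
        return {
            "doc": request.doc,
            "if_seq_no": request.if_seq_no,
            "if_primary_term": request.if_primary_term,
        }
    # Older wire format has no room for the condition, so a set condition is an error.
    if (request.if_seq_no != UNASSIGNED_SEQ_NO
            or request.if_primary_term != UNASSIGNED_PRIMARY_TERM):
        raise RuntimeError(
            "sequence number based compare and write is not supported until all "
            "nodes are on version 7.0 or higher. Stream version [%s]" % stream_version
        )
    return {"doc": request.doc}
```

In this model, `write_to(partial_update({"a": 1}, 5, 1, {"b": 2}), "6.5.4")` raises, while the same request written for `"6.7.1"` succeeds, and a plain index request with no condition set serializes for either version.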
This was discovered while troubleshooting a BWC test failure for watcher that does partial document updates while performing a rolling restart. I was able to reproduce this behavior outside of Watcher and will include repro steps in a comment.
I am not terribly familiar with the specifics of replication, but it seems odd that ES allows replication from a new node to an old node in a mixed cluster. I was only able to reproduce this by allowing the shard to become the primary via replica promotion. For example, using shard allocation filtering and the introduction of replicas in a mixed cluster, I was not able to create a scenario where the primary was "new" and the replica was "old" (it correctly refused to assign the replica to an old node when the new node was primary). Only when I allowed promotion from replica to primary was I able to reproduce a scenario where "new" tried to replicate to "old" in a mixed cluster.
I wonder if the real issue here is the potential for a "new" node to be promoted to primary in a mixed cluster, where it will then attempt to replicate to an "old" node?
[1] Note - I did not explicitly set sequence numbers or primary terms here, only a partial document update request
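The risky state described above (a "new" primary with an "old" replica) can be sketched as a small check. This is an illustration only: `shards_at_risk` is a hypothetical helper, the input dictionaries are an assumed shape (field names loosely modeled on shard listing columns), and the 6.6.0 cutoff comes from the description in this issue.

```python
def parse_version(version):
    """Turn a string such as '6.7.1' into a comparable tuple (6, 7, 1)."""
    return tuple(int(part) for part in version.split("."))


def shards_at_risk(shards, node_versions, cutoff="6.6.0"):
    """Return (index, shard) pairs whose primary is on a node at or after the
    cutoff while at least one replica is on a node before the cutoff.

    shards:        list of dicts with keys 'index', 'shard', 'prirep' ('p' or 'r'), 'node'
    node_versions: dict mapping node name to version string
    """
    cutoff_v = parse_version(cutoff)

    # Group shard copies by (index, shard number).
    by_shard = {}
    for copy in shards:
        by_shard.setdefault((copy["index"], copy["shard"]), []).append(copy)

    risky = []
    for key, copies in sorted(by_shard.items()):
        primaries = [c for c in copies if c["prirep"] == "p"]
        replicas = [c for c in copies if c["prirep"] == "r"]
        if not primaries:
            continue  # no active primary, nothing to replicate from
        primary_is_new = parse_version(node_versions[primaries[0]["node"]]) >= cutoff_v
        has_old_replica = any(
            parse_version(node_versions[r["node"]]) < cutoff_v for r in replicas
        )
        if primary_is_new and has_old_replica:
            risky.append(key)
    return risky
```

For example, with the primary of `foo` shard 0 on a 6.7.1 node and its replica on a 6.5.4 node (the versions that appear in the logs below), the helper reports that shard as at risk; with the roles reversed it reports nothing.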
"failures" : [
  {
    "_index" : "foo",
    "_shard" : 0,
    "_node" : "OSGt32FRRlm7KARfCER45w",
    "reason" : {
      "type" : "illegal_state_exception",
      "reason" : "sequence number based compare and write is not supported until all nodes are on version 7.0 or higher. Stream version [6.5.4]"
    },
    "status" : "INTERNAL_SERVER_ERROR",
    "primary" : false
  }
]
[2]
java.lang.IllegalStateException: sequence number based compare and write is not supported until all nodes are on version 7.0 or higher. Stream version [6.5.4]
at org.elasticsearch.action.index.IndexRequest.writeTo(IndexRequest.java:660) ~[elasticsearch-6.7.1.jar:6.7.1]
at org.elasticsearch.action.DocWriteRequest.writeDocumentRequest(DocWriteRequest.java:244) ~[elasticsearch-6.7.1.jar:6.7.1]
at org.elasticsearch.action.bulk.BulkItemRequest.writeTo(BulkItemRequest.java:110) ~[elasticsearch-6.7.1.jar:6.7.1]
at org.elasticsearch.action.bulk.BulkShardRequest.writeTo(BulkShardRequest.java:73) ~[elasticsearch-6.7.1.jar:6.7.1]
at org.elasticsearch.action.support.replication.TransportReplicationAction$ConcreteShardRequest.writeTo(TransportReplicationAction.java:1276) ~[elasticsearch-6.7.1.jar:6.7.1]
at org.elasticsearch.action.support.replication.TransportReplicationAction$ConcreteReplicaRequest.writeTo(TransportReplicationAction.java:1332)
:6.7.1]
at org.elasticsearch.transport.OutboundMessage.writeMessage(OutboundMessage.java:70) ~[elasticsearch-6.7.1.jar:6.7.1]
at org.elasticsearch.transport.OutboundMessage.serialize(OutboundMessage.java:53) ~[elasticsearch-6.7.1.jar:6.7.1]
at org.elasticsearch.transport.OutboundHandler$MessageSerializer.get(OutboundHandler.java:107) ~[elasticsearch-6.7.1.jar:6.7.1]
at org.elasticsearch.transport.OutboundHandler$MessageSerializer.get(OutboundHandler.java:93) ~[elasticsearch-6.7.1.jar:6.7.1]
at org.elasticsearch.transport.OutboundHandler$SendContext.get(OutboundHandler.java:140) [elasticsearch-6.7.1.jar:6.7.1]
at org.elasticsearch.transport.OutboundHandler.internalSendMessage(OutboundHandler.java:78) [elasticsearch-6.7.1.jar:6.7.1]
at org.elasticsearch.transport.OutboundHandler.sendMessage(OutboundHandler.java:70) [elasticsearch-6.7.1.jar:6.7.1]
at org.elasticsearch.transport.TcpTransport.sendRequestToChannel(TcpTransport.java:690) [elasticsearch-6.7.1.jar:6.7.1]
at org.elasticsearch.transport.TcpTransport.sendRequestToChannel(TcpTransport.java:679) [elasticsearch-6.7.1.jar:6.7.1]
at org.elasticsearch.transport.TcpTransport.access$300(TcpTransport.java:100) [elasticsearch-6.7.1.jar:6.7.1]
at org.elasticsearch.transport.TcpTransport$NodeChannels.sendRequest(TcpTransport.java:272) [elasticsearch-6.7.1.jar:6.7.1]
at org.elasticsearch.transport.TransportService.sendRequestInternal(TransportService.java:626) [elasticsearch-6.7.1.jar:6.7.1]
at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$1.sendRequest(SecurityServerTransportInterceptor.java:136) [x-pack-security-6.7.1.jar:6.7.1]
at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:541) [elasticsearch-6.7.1.jar:6.7.1]
at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:529) [elasticsearch-6.7.1.jar:6.7.1]
at org.elasticsearch.action.support.replication.TransportReplicationAction.sendReplicaRequest(TransportReplicationAction.java:1213) [elasticsearch-6.7.1.jar:6.7.1]
at org.elasticsearch.action.support.replication.TransportReplicationAction$ReplicasProxy.performOn(TransportReplicationAction.java:1175) [elasticsearch-6.7.1.jar:6.7.1]
at org.elasticsearch.action.support.replication.ReplicationOperation.performOnReplica(ReplicationOperation.java:166) [elasticsearch-6.7.1.jar:6.7.1]
at org.elasticsearch.action.support.replication.ReplicationOperation.performOnReplicas(ReplicationOperation.java:153) [elasticsearch-6.7.1.jar:6.7.1]
at org.elasticsearch.action.support.replication.ReplicationOperation.execute(ReplicationOperation.java:127) [elasticsearch-6.7.1.jar:6.7.1]
at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.runWithPrimaryShardReference(TransportReplicationAction.java:424) [
.7.1]
at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.lambda$doRun$0(TransportReplicationAction.java:370) [
.7.1]
at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:61) [elasticsearch-6.7.1.jar:6.7.1]
at org.elasticsearch.index.shard.IndexShardOperationPermits.acquire(IndexShardOperationPermits.java:273) [elasticsearch-6.7.1.jar:6.7.1]
at org.elasticsearch.index.shard.IndexShardOperationPermits.acquire(IndexShardOperationPermits.java:240) [elasticsearch-6.7.1.jar:6.7.1]
at org.elasticsearch.index.shard.IndexShard.acquirePrimaryOperationPermit(IndexShard.java:2561) [elasticsearch-6.7.1.jar:6.7.1]
at org.elasticsearch.action.support.replication.TransportReplicationAction.acquirePrimaryOperationPermit(TransportReplicationAction.java:987) [elasticsearch-6.7.1.jar:6.7.1]
at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.doRun(TransportReplicationAction.java:369) [elasticsearch-6.7.1.jar:6.7.1]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.7.1.jar:6.7.1]
at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryOperationTransportHandler.messageReceived(TransportReplicationAction.java:324) [
.7.1]
at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryOperationTransportHandler.messageReceived(TransportReplicationAction.java:311) [
.7.1]
at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler$1.doRun(SecurityServerTransportInterceptor.java:250) [
:6.7.1]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.7.1.jar:6.7.1]
at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler.messageReceived(SecurityServerTransportInterceptor.java:308) [
:6.7.1]
at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66) [elasticsearch-6.7.1.jar:6.7.1]
at org.elasticsearch.transport.TransportService$7.doRun(TransportService.java:686) [elasticsearch-6.7.1.jar:6.7.1]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:751) [elasticsearch-6.7.1.jar:6.7.1]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.7.1.jar:6.7.1]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
at java.lang.Thread.run(Thread.java:835) [?:?]
[2019-05-26T17:12:16,214][WARN ][o.e.a.b.TransportShardBulkAction] [node_one] [[foo][0]] failed to perform indices:data/write/bulk[s] on replica [foo][0], node[
[R], s[STARTED], a[id=Z9njA7ZATaea1pxYEUl8BA]
org.elasticsearch.transport.SendRequestTransportException: [node_three][192.168.0.3:9302][indices:data/write/bulk[s][r]]
at org.elasticsearch.transport.TransportService.sendRequestInternal(TransportService.java:638) ~[elasticsearch-6.7.1.jar:6.7.1]
at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$1.sendRequest(SecurityServerTransportInterceptor.java:136) ~[?:?]
at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:541) ~[elasticsearch-6.7.1.jar:6.7.1]
at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:529) ~[elasticsearch-6.7.1.jar:6.7.1]
at org.elasticsearch.action.support.replication.TransportReplicationAction.sendReplicaRequest(TransportReplicationAction.java:1213) ~[elasticsearch-6.7.1.jar:6.7.1]
at org.elasticsearch.action.support.replication.TransportReplicationAction$ReplicasProxy.performOn(TransportReplicationAction.java:1175) ~[elasticsearch-6.7.1.jar:6.7.1]
at org.elasticsearch.action.support.replication.ReplicationOperation.performOnReplica(ReplicationOperation.java:166) ~[elasticsearch-6.7.1.jar:6.7.1]
at org.elasticsearch.action.support.replication.ReplicationOperation.performOnReplicas(ReplicationOperation.java:153) ~[elasticsearch-6.7.1.jar:6.7.1]
at org.elasticsearch.action.support.replication.ReplicationOperation.execute(ReplicationOperation.java:127) ~[elasticsearch-6.7.1.jar:6.7.1]
at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.runWithPrimaryShardReference(TransportReplicationAction.java:424)
:6.7.1]
at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.lambda$doRun$0(TransportReplicationAction.java:370)
:6.7.1]
at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:61) ~[elasticsearch-6.7.1.jar:6.7.1]
at org.elasticsearch.index.shard.IndexShardOperationPermits.acquire(IndexShardOperationPermits.java:273) ~[elasticsearch-6.7.1.jar:6.7.1]
at org.elasticsearch.index.shard.IndexShardOperationPermits.acquire(IndexShardOperationPermits.java:240) ~[elasticsearch-6.7.1.jar:6.7.1]
at org.elasticsearch.index.shard.IndexShard.acquirePrimaryOperationPermit(IndexShard.java:2561) ~[elasticsearch-6.7.1.jar:6.7.1]
at org.elasticsearch.action.support.replication.TransportReplicationAction.acquirePrimaryOperationPermit(TransportReplicationAction.java:987) ~[elasticsearch-6.7.1.jar:6.7.1]
at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.doRun(TransportReplicationAction.java:369) ~[elasticsearch-6.7.1.jar:6.7.1]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-6.7.1.jar:6.7.1]
at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryOperationTransportHandler.messageReceived(TransportReplicationAction.java:324)
:6.7.1]
at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryOperationTransportHandler.messageReceived(TransportReplicationAction.java:311)
:6.7.1]
at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler$1.doRun(SecurityServerTransportInterceptor.java:250) ~[?:?]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-6.7.1.jar:6.7.1]
at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler.messageReceived(SecurityServerTransportInterceptor.java:308)
at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66) ~[elasticsearch-6.7.1.jar:6.7.1]
at org.elasticsearch.transport.TransportService$7.doRun(TransportService.java:686) ~[elasticsearch-6.7.1.jar:6.7.1]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:751) [elasticsearch-6.7.1.jar:6.7.1]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.7.1.jar:6.7.1]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
at java.lang.Thread.run(Thread.java:835) [?:?]
Caused by: java.lang.IllegalStateException: sequence number based compare and write is not supported until all nodes are on version 7.0 or higher. Stream version [6.5.4]
at org.elasticsearch.action.index.IndexRequest.writeTo(IndexRequest.java:660) ~[elasticsearch-6.7.1.jar:6.7.1]
at org.elasticsearch.action.DocWriteRequest.writeDocumentRequest(DocWriteRequest.java:244) ~[elasticsearch-6.7.1.jar:6.7.1]
at org.elasticsearch.action.bulk.BulkItemRequest.writeTo(BulkItemRequest.java:110) ~[elasticsearch-6.7.1.jar:6.7.1]
at org.elasticsearch.action.bulk.BulkShardRequest.writeTo(BulkShardRequest.java:73) ~[elasticsearch-6.7.1.jar:6.7.1]
at org.elasticsearch.action.support.replication.TransportReplicationAction$ConcreteShardRequest.writeTo(TransportReplicationAction.java:1276) ~[elasticsearch-6.7.1.jar:6.7.1]
at org.elasticsearch.action.support.replication.TransportReplicationAction$ConcreteReplicaRequest.writeTo(TransportReplicationAction.java:1332)
:6.7.1]
at org.elasticsearch.transport.OutboundMessage.writeMessage(OutboundMessage.java:70) ~[elasticsearch-6.7.1.jar:6.7.1]
at org.elasticsearch.transport.OutboundMessage.serialize(OutboundMessage.java:53) ~[elasticsearch-6.7.1.jar:6.7.1]
at org.elasticsearch.transport.OutboundHandler$MessageSerializer.get(OutboundHandler.java:107) ~[elasticsearch-6.7.1.jar:6.7.1]
at org.elasticsearch.transport.OutboundHandler$MessageSerializer.get(OutboundHandler.java:93) ~[elasticsearch-6.7.1.jar:6.7.1]
at org.elasticsearch.transport.OutboundHandler$SendContext.get(OutboundHandler.java:140) ~[elasticsearch-6.7.1.jar:6.7.1]
at org.elasticsearch.transport.OutboundHandler.internalSendMessage(OutboundHandler.java:78) ~[elasticsearch-6.7.1.jar:6.7.1]
at org.elasticsearch.transport.OutboundHandler.sendMessage(OutboundHandler.java:70) ~[elasticsearch-6.7.1.jar:6.7.1]
at org.elasticsearch.transport.TcpTransport.sendRequestToChannel(TcpTransport.java:690) ~[elasticsearch-6.7.1.jar:6.7.1]
at org.elasticsearch.transport.TcpTransport.sendRequestToChannel(TcpTransport.java:679) ~[elasticsearch-6.7.1.jar:6.7.1]
at org.elasticsearch.transport.TcpTransport.access$300(TcpTransport.java:100) ~[elasticsearch-6.7.1.jar:6.7.1]
at org.elasticsearch.transport.TcpTransport$NodeChannels.sendRequest(TcpTransport.java:272) ~[elasticsearch-6.7.1.jar:6.7.1]
at org.elasticsearch.transport.TransportService.sendRequestInternal(TransportService.java:626) ~[elasticsearch-6.7.1.jar:6.7.1]
... 29 more