Description
I have a local test setup with 3 master nodes and 3 replicas and have configured the ConnectionMultiplexer as follows:
ConnectionMultiplexer.Connect("127.0.0.1:7000,127.0.0.1:7001,127.0.0.1:7002,127.0.0.1:7100,127.0.0.1:7101,127.0.0.1:7102")
Nodes 7000-7002 are initially the masters and nodes 7100-7102 are the replicas.
If I inspect the slot allocations for this configuration using ClusterConfiguration, I get:
Slot range from 0 - 5460 available via 127.0.0.1:7000
Slot range from 5461 - 10922 available via 127.0.0.1:7001
Slot range from 10923 - 16383 available via 127.0.0.1:7002
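For reference, here is a minimal sketch of how the slot allocations above can be inspected. The variable names are illustrative, and `IsReplica` assumes a recent StackExchange.Redis (it was `IsSlave` in older versions such as 2.0.601):

```csharp
using System;
using System.Linq;
using StackExchange.Redis;

class SlotInspector
{
    static void Main()
    {
        var muxer = ConnectionMultiplexer.Connect(
            "127.0.0.1:7000,127.0.0.1:7001,127.0.0.1:7002," +
            "127.0.0.1:7100,127.0.0.1:7101,127.0.0.1:7102");

        // Any connected node can report the cluster topology.
        var server = muxer.GetServer(muxer.GetEndPoints().First());
        ClusterConfiguration clusterConfig = server.ClusterNodes();

        // Print the slot ranges served by each master node.
        foreach (var node in clusterConfig.Nodes.Where(n => !n.IsReplica))
        {
            foreach (var range in node.Slots)
            {
                Console.WriteLine(
                    $"Slot range from {range.From} - {range.To} available via {node.EndPoint}");
            }
        }
    }
}
```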
In this configuration StringGet works fine and routes each key to the node that owns its slot.
If I then kill one of the master nodes (say 7000) and wait for the replica to become the new master (say 7102 takes over from 7000) then inspecting the ClusterConfiguration gives:
Slot range from 0 - 5460 available via 127.0.0.1:7102
Slot range from 5461 - 10922 available via 127.0.0.1:7001
Slot range from 10923 - 16383 available via 127.0.0.1:7002
And running "CLUSTER INFO" from the redis-cli reports that the cluster state is "OK".
However, StringGet throws an error:
StackExchange.Redis.RedisConnectionException: No connection is available to service this operation: GET e0f6a26a5dc14b72ab8190ae7330a8e6; UnableToConnect on 127.0.0.1:7000/Interactive, Initializing/NotStarted, last: NONE, origin: BeginConnectAsync, outstanding: 0, last-read: 0s ago, last-write: 0s ago, keep-alive: 60s, state: Connecting, mgr: 10 of 10 available, last-heartbeat: never, global: 0s ago, v: 2.0.601.3402; IOCP: (Busy=0,Free=1000,Min=12,Max=1000), WORKER: (Busy=0,Free=2047,Min=12,Max=2047), Local-CPU: n/a ---> StackExchange.Redis.RedisConnectionException: UnableToConnect on 127.0.0.1:7000/Interactive, Initializing/NotStarted, last: NONE, origin: BeginConnectAsync, outstanding: 0, last-read: 0s ago, last-write: 0s ago, keep-alive: 60s, state: Connecting, mgr: 10 of 10 available, last-heartbeat: never, global: 0s ago, v: 2.0.601.3402
It looks like it is trying to read the value from the old master (7000) even though the replica (7102) has taken over responsibility for those slots.
Is this a bug? Have I made some error in my configuration? Is there a workaround?
Note also that restarting the failed node stops the error from happening, even though the replica remains the master for those slots.
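As a possible mitigation (an assumption on my part, not a confirmed fix), forcing the multiplexer to re-discover the cluster topology after the failover might help, since the error suggests its view of the slot map is stale. `ConnectionMultiplexer.Configure` re-runs the configuration handshake:

```csharp
using System;
using System.IO;
using StackExchange.Redis;

class TopologyRefresh
{
    static void Main()
    {
        // Connect to the surviving nodes; endpoints match the test setup above.
        var muxer = ConnectionMultiplexer.Connect(
            "127.0.0.1:7001,127.0.0.1:7002,127.0.0.1:7102");

        // Ask the multiplexer to re-read the cluster configuration,
        // logging which nodes it contacted while doing so.
        using var log = new StringWriter();
        bool reconfigured = muxer.Configure(log);
        Console.WriteLine($"Reconfigured: {reconfigured}");
        Console.WriteLine(log.ToString());
    }
}
```

Whether this actually clears the stale 7000 endpoint on 2.0.601 is something I have not verified.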