
StringGet fails with "No connection available" when a master node fails over #1237

@LeeSanderson

Description

I have a local test setup with 3 master nodes and 3 replicas and have configured the ConnectionMultiplexer as follows:

ConnectionMultiplexer.Connect("127.0.0.1:7000,127.0.0.1:7001,127.0.0.1:7002,127.0.0.1:7100,127.0.0.1:7101,127.0.0.1:7102")

Nodes 7000-7002 are initially the master nodes and nodes 7100-7102 are the replicas.
If I inspect the slot allocations for this configuration using the ClusterConfiguration, I get:

Slot range from 0 - 5460 available via 127.0.0.1:7000
Slot range from 5461 - 10922 available via 127.0.0.1:7001
Slot range from 10923 - 16383 available via 127.0.0.1:7002
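For reference, the dump above can be produced with something like the following (a minimal sketch: it assumes the same endpoint list passed to Connect, and it simply treats any node that owns at least one slot range as a master):

using System;
using System.Linq;
using StackExchange.Redis;

class SlotDump
{
    static void Main()
    {
        var muxer = ConnectionMultiplexer.Connect(
            "127.0.0.1:7000,127.0.0.1:7001,127.0.0.1:7002,127.0.0.1:7100,127.0.0.1:7101,127.0.0.1:7102");

        // Ask any reachable node for its view of the cluster topology.
        var server = muxer.GetServer(muxer.GetEndPoints().First());
        var clusterConfig = server.ClusterNodes();

        // Nodes that own at least one slot range are the masters.
        foreach (var node in clusterConfig.Nodes.Where(n => n.Slots.Count > 0))
        {
            foreach (var range in node.Slots)
            {
                Console.WriteLine($"Slot range from {range.From} - {range.To} available via {node.EndPoint}");
            }
        }
    }
}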

In this configuration StringGet works fine and routes each request to the node that owns the key's slot. For example (a minimal sketch using the same key that appears in the error below; muxer is the multiplexer created above):
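
var db = muxer.GetDatabase();
db.StringSet("e0f6a26a5dc14b72ab8190ae7330a8e6", "some value");
// The multiplexer hashes the key to a slot and sends the GET to the master that owns it.
var value = db.StringGet("e0f6a26a5dc14b72ab8190ae7330a8e6");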

If I then kill one of the master nodes (say 7000) and wait for its replica to be promoted to the new master (say 7102 takes over from 7000), then inspecting the ClusterConfiguration gives:

Slot range from 0 - 5460 available via 127.0.0.1:7102
Slot range from 5461 - 10922 available via 127.0.0.1:7001
Slot range from 10923 - 16383 available via 127.0.0.1:7002

And running "CLUSTER INFO" from the redis-cli reports that the cluster state is "OK".

However, StringGet throws an error:

StackExchange.Redis.RedisConnectionException: No connection is available to service this operation: GET e0f6a26a5dc14b72ab8190ae7330a8e6; UnableToConnect on 127.0.0.1:7000/Interactive, Initializing/NotStarted, last: NONE, origin: BeginConnectAsync, outstanding: 0, last-read: 0s ago, last-write: 0s ago, keep-alive: 60s, state: Connecting, mgr: 10 of 10 available, last-heartbeat: never, global: 0s ago, v: 2.0.601.3402; IOCP: (Busy=0,Free=1000,Min=12,Max=1000), WORKER: (Busy=0,Free=2047,Min=12,Max=2047), Local-CPU: n/a ---> StackExchange.Redis.RedisConnectionException: UnableToConnect on 127.0.0.1:7000/Interactive, Initializing/NotStarted, last: NONE, origin: BeginConnectAsync, outstanding: 0, last-read: 0s ago, last-write: 0s ago, keep-alive: 60s, state: Connecting, mgr: 10 of 10 available, last-heartbeat: never, global: 0s ago, v: 2.0.601.3402

It looks like it is trying to read the value from the old master (7000) even though the replica (7102) has taken over responsibility for those slots.

Is this a bug? Have I made some error in my configuration? Is there a workaround?

Note also that restarting the failed node stops the error from happening - even though the replica (7102) remains the master for those slots.
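
A possible stop-gap might be to catch the connection exception, force the multiplexer to re-run its configuration, and retry - though I have not confirmed this actually addresses the underlying routing issue. A sketch (RetryGet is just an illustrative helper; it assumes using StackExchange.Redis and that Configure() is enough to pick up the new slot owner):

static RedisValue RetryGet(ConnectionMultiplexer muxer, string key)
{
    var db = muxer.GetDatabase();
    try
    {
        return db.StringGet(key);
    }
    catch (RedisConnectionException)
    {
        // Force the multiplexer to re-read the cluster topology, then retry once.
        muxer.Configure();
        return db.StringGet(key);
    }
}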
