Skip to content

Operations Never Being Completed or Timed Out #1242

@DeRauk

Description

@DeRauk

We upgraded from version 1.2.6 to version 2.0.601 a couple weeks ago and noticed weird behavior popping up on a handful of servers (not all). On those servers there were 75+ operations that had been waiting for a response for hours. We have the sync and async timeout settings set to 500 ms so those operations should have timed out.

We took a memory dump to do further analysis and found multiple StackExchange.Redis.Message+CommandKeyMessage objects with these properties:

Type VT Attr Value Name
System.Int32 1 instance 0 Db
System.Int32 1 instance 88 command
....Redis.IResultBox 0 instance 0000020828fbba78 resultBox
...s.ResultProcessor 0 instance 000001fda86515b8 resultProcessor
...g.ProfiledCommand 0 instance 0000000000000000 performance
System.DateTime 1 instance 09/17/2019 10:50:12 PM createdDateTime
System.Int64 1 instance 63630327330 createdTimestamp
System.Int32 1 instance 1216 Flags k__BackingField
System.Int32 1 instance 2 Status k__BackingField
...hysicalConnection 0 instance 0000000000000000 _enqueuedTo
System.Int64 1 instance -1 _queuedStampReceived
System.Int64 1 instance -1 _queuedStampSent
System.Int32 1 instance 27211609 _writeTickCount
...ge.Redis.RedisKey 1 instance 0000020828fbba60 Key

A couple things stood out from that:

  • The createdDateTime is set to 09/17/2019 10:50:12 PM . This memory dump was taken on 09/18/2019 at 09:20:27 pm. So the operation was created almost 24 hours earlier and hadn't completed yet.
  • _queuedStampReceived and _queuedStampSent are set to -1. From looking at the code this can only happen if the PhysicalConnection is null when the message is enqueued. In this case the message would have also been added to the PhysicalBridge _backlog Queue.
  • The Flags field contains 64 (NoRedirectFlag), 128 (InternalCallFlag), and 1024 (NeedsAsyncTimeoutCheckFlag).

Here's a dump of a PhysicalBridgeObject in a weird state too:

Type VT Attr Value Name
System.String 0 instance 00000203e8666ff0 Name
System.Int64[] 0 instance 00000203e8666f28 profileLog
...kExchange.Redis]] 0 instance {_size:82; ...} _backlog
System.Int32 1 instance 0 activeWriters
System.Int32 1 instance 0 beating
System.Int32 1 instance 1317 failConnectCount
System.Boolean 1 instance 0 isDisposed
System.Int64 1 instance 0 nonPreferredEndpointCount
System.Int64 1 instance 867900 operationCount
System.Int64 1 instance 1318 socketCount
...hysicalConnection 0 instance 00000200ebbb4ef0 physical
System.Int64 1 instance 867900 profileLastLog
System.Int32 1 instance 105466 profileLogIndex
System.Boolean 1 instance 0 reportNextFailure
System.Boolean 1 instance 1 reconfigureNextFailure
System.Int32 1 instance 1 state
System.WeakReference 0 instance 00000203e86670c0 _weakRefThis
System.Int32 1 instance 500 TimeoutMilliseconds
System.Exception 0 instance RedisConnectionException LastException k__BackingField
System.Int32 1 instance 1 ConnectionType k__BackingField
...ectionMultiplexer 0 instance 0000020868f19640 Multiplexer k__BackingField
...is.ServerEndPoint 0 instance 00000203e8666d20 ServerEndPoint k__BackingField
System.Int32 1 instance 37 LastCommand k__BackingField
...kExchange.Redis]] 0 instance 0000000000000000 _subscriptionBackgroundQueue
System.Int32 1 instance 108183265 connectStartTicks
System.Int64 1 instance 0 connectTimeoutRetryCount
...reading.MutexSlim 0 instance 00000203e8667048 _singleWriterMutex
...nge.Redis.Message 0 instance 00000207a8e0d0f0 _activeMesssage
System.Byte 1 instance 0 _backlogStatus
...nge.Redis.Message 0 static 0000020168628ef0 ReusableAskingCommand
...dedChannelOptions 0 static 0000020168628f58 s_subscriptionQueueOptions
...bject, mscorlib]] 0 static 0000020168628f70 s_ProcessBacklog

Interesting pieces from this:

  • The PhysicalBridge is in a ConnectedEstablishing state.
  • The backlog has 82 items in it. I randomly looked at 10 of them and they were all from 09/17, so they'd been in the queue for almost 24 hours.
  • The last exception was a RedisConnectionException with message: "SocketFailure on {ip:port}/Interactive, Flushing/Faulted, last: GET, origin: ReadFromPipe, outstanding: 534, last-read: 19s ago, last-write: 0s ago, unanswered-write: 18s ago, keep-alive: 60s, state: ConnectedEstablished, mgr: 9 of 10 available, in: 0, last-heartbeat: 0s ago, last-mbeat: 0s ago, global: 0s ago, v: 2.0.601.3402
    "

Has anyone else seen something like this?

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions