-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Closed
Labels
Description
We upgraded from version 1.2.6 to version 2.0.601 a couple weeks ago and noticed weird behavior popping up on a handful of servers (not all). On those servers there were 75+ operations that had been waiting for a response for hours. We have the sync and async timeout settings set to 500 ms so those operations should have timed out.
We took a memory dump to do further analysis and found multiple StackExchange.Redis.Message+CommandKeyMessage objects with these properties:
| Type | VT | Attr | Value | Name |
|---|---|---|---|---|
| System.Int32 | 1 | instance | 0 | Db |
| System.Int32 | 1 | instance | 88 | command |
| ....Redis.IResultBox | 0 | instance | 0000020828fbba78 | resultBox |
| ...s.ResultProcessor | 0 | instance | 000001fda86515b8 | resultProcessor |
| ...g.ProfiledCommand | 0 | instance | 0000000000000000 | performance |
| System.DateTime | 1 | instance | 09/17/2019 10:50:12 PM | createdDateTime |
| System.Int64 | 1 | instance | 63630327330 | createdTimestamp |
| System.Int32 | 1 | instance | 1216 | Flags k__BackingField |
| System.Int32 | 1 | instance | 2 | Status k__BackingField |
| ...hysicalConnection | 0 | instance | 0000000000000000 | _enqueuedTo |
| System.Int64 | 1 | instance | -1 | _queuedStampReceived |
| System.Int64 | 1 | instance | -1 | _queuedStampSent |
| System.Int32 | 1 | instance | 27211609 | _writeTickCount |
| ...ge.Redis.RedisKey | 1 | instance | 0000020828fbba60 | Key |
A couple things stood out from that:
- The createdDateTime is set to 09/17/2019 10:50:12 PM . This memory dump was taken on 09/18/2019 at 09:20:27 pm. So the operation was created almost 24 hours earlier and hadn't completed yet.
- _queuedStampReceived and _queuedStampSent are set to -1. From looking at the code this can only happen if the PhysicalConnection is null when the message is enqueued. In this case the message would have also been added to the PhysicalBridge _backlog Queue.
- The Flags field contains 64 (NoRedirectFlag), 128 (InternalCallFlag), and 1024 (NeedsAsyncTimeoutCheckFlag).
Here's a dump of a PhysicalBridgeObject in a weird state too:
| Type | VT | Attr | Value | Name |
|---|---|---|---|---|
| System.String | 0 | instance | 00000203e8666ff0 | Name |
| System.Int64[] | 0 | instance | 00000203e8666f28 | profileLog |
| ...kExchange.Redis]] | 0 | instance | {_size:82; ...} | _backlog |
| System.Int32 | 1 | instance | 0 | activeWriters |
| System.Int32 | 1 | instance | 0 | beating |
| System.Int32 | 1 | instance | 1317 | failConnectCount |
| System.Boolean | 1 | instance | 0 | isDisposed |
| System.Int64 | 1 | instance | 0 | nonPreferredEndpointCount |
| System.Int64 | 1 | instance | 867900 | operationCount |
| System.Int64 | 1 | instance | 1318 | socketCount |
| ...hysicalConnection | 0 | instance | 00000200ebbb4ef0 | physical |
| System.Int64 | 1 | instance | 867900 | profileLastLog |
| System.Int32 | 1 | instance | 105466 | profileLogIndex |
| System.Boolean | 1 | instance | 0 | reportNextFailure |
| System.Boolean | 1 | instance | 1 | reconfigureNextFailure |
| System.Int32 | 1 | instance | 1 | state |
| System.WeakReference | 0 | instance | 00000203e86670c0 | _weakRefThis |
| System.Int32 | 1 | instance | 500 | TimeoutMilliseconds |
| System.Exception | 0 | instance | RedisConnectionException | LastException k__BackingField |
| System.Int32 | 1 | instance | 1 | ConnectionType k__BackingField |
| ...ectionMultiplexer | 0 | instance | 0000020868f19640 | Multiplexer k__BackingField |
| ...is.ServerEndPoint | 0 | instance | 00000203e8666d20 | ServerEndPoint k__BackingField |
| System.Int32 | 1 | instance | 37 | LastCommand k__BackingField |
| ...kExchange.Redis]] | 0 | instance | 0000000000000000 | _subscriptionBackgroundQueue |
| System.Int32 | 1 | instance | 108183265 | connectStartTicks |
| System.Int64 | 1 | instance | 0 | connectTimeoutRetryCount |
| ...reading.MutexSlim | 0 | instance | 00000203e8667048 | _singleWriterMutex |
| ...nge.Redis.Message | 0 | instance | 00000207a8e0d0f0 | _activeMesssage |
| System.Byte | 1 | instance | 0 | _backlogStatus |
| ...nge.Redis.Message | 0 | static | 0000020168628ef0 | ReusableAskingCommand |
| ...dedChannelOptions | 0 | static | 0000020168628f58 | s_subscriptionQueueOptions |
| ...bject, mscorlib]] | 0 | static | 0000020168628f70 | s_ProcessBacklog |
Interesting pieces from this:
- The PhysicalBridge is in a ConnectedEstablishing state.
- The backlog has 82 items in it. I randomly looked at 10 of them and they were all from 09/17, so they'd been in the queue for almost 24 hours.
- The last exception was a RedisConnectionException with message: "SocketFailure on {ip:port}/Interactive, Flushing/Faulted, last: GET, origin: ReadFromPipe, outstanding: 534, last-read: 19s ago, last-write: 0s ago, unanswered-write: 18s ago, keep-alive: 60s, state: ConnectedEstablished, mgr: 9 of 10 available, in: 0, last-heartbeat: 0s ago, last-mbeat: 0s ago, global: 0s ago, v: 2.0.601.3402
"
Has anyone else seen something like this?