StreamingUpdateProcessor reconnect seems to hang when hitting stream_read_timeout #209

@pballandras

Is this a support request?
🙅

Describe the bug
During the outage of May 18th 2023, we saw a number of calls simply hanging. The code using the SDK would stop and nothing seemed to be going on. I dug into the SDK code briefly, but couldn't work out what caused our calls to hang. Our code calls client.variation(...) (where client is an LDClient), and that works well outside of outages. Of course, stuff goes wrong during outages, but I'm not sure whether it's this call that hangs.

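If you look at the logs below (they are complete, nothing was cut), the stream read timeout is logged exactly 5 minutes after the StreamingUpdateProcessor initialized. In the code, the stream read timeout is set to 5 minutes as well. After the "Will reconnect" line there are no further stream messages, only a diagnostic event warning about ten minutes later.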

So, to describe the bug: there might be an infinite loop somewhere in the connection (or reconnection) logic for stream.launchdarkly.com.

Edit: Here's where I was in the code:

  • ldclient/impl/datasource/streaming.py at line 94
  • ldclient/impl/events/event_processor.py at line 473
  • ldclient/impl/util.py at line 114

To reproduce
Ok so guys

  1. create an outage at LD :trollface:

No, but for real: since the circumstances are kind of extreme, I'm not sure how to reproduce it. My guess is that making the heartbeat time out on the StreamingUpdateProcessor would reproduce this, but then again, I can't be sure... Sorry 🙏
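
One untested idea for forcing a read timeout locally is a server that accepts connections and then never sends a byte. The sketch below uses only the standard library; `start_stalling_server` and `read_with_timeout` are hypothetical helper names, and the raw-socket client only stands in for the SDK's stream connection. Whether pointing the SDK at such a server triggers the same hang is an assumption, not something verified here.

```python
import socket
import threading


def start_stalling_server(host="127.0.0.1"):
    """Listen on an ephemeral port, accept connections, and never reply."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind((host, 0))  # port 0 lets the OS pick a free port
    listener.listen()
    held = []  # keep accepted sockets open so the peer sees silence, not EOF

    def accept_loop():
        while True:
            try:
                conn, _ = listener.accept()
            except OSError:
                return  # listener was closed
            held.append(conn)

    threading.Thread(target=accept_loop, daemon=True).start()
    return listener, listener.getsockname()[1]


def read_with_timeout(port, timeout_s, host="127.0.0.1"):
    """Connect, send a request, and report whether the read timed out."""
    with socket.create_connection((host, port), timeout=timeout_s) as client:
        client.sendall(b"GET /all HTTP/1.1\r\nHost: localhost\r\n\r\n")
        try:
            client.recv(1024)
        except socket.timeout:
            return "read timed out"
    return "got data or EOF"
```

Calling `read_with_timeout(port, 0.2)` against the port returned by `start_stalling_server()` should hit the timeout branch, which is the same kind of silence a stalled stream would produce.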

Expected behavior
No hanging. Throwing an error would be absolutely fine in that case.
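
As an illustration of "throw instead of hang" on the caller's side, here is a minimal sketch that runs any callable in a daemon thread and raises if it does not return in time. `call_with_deadline` and `CallTimedOut` are hypothetical names, not part of the SDK, and this is a workaround around a call, not a fix for whatever blocks inside it.

```python
import threading


class CallTimedOut(Exception):
    """Raised when the wrapped call does not return before the deadline."""


def call_with_deadline(fn, timeout_s, *args, **kwargs):
    """Run fn(*args, **kwargs) in a daemon thread; raise if it takes too long."""
    result = {}

    def runner():
        try:
            result["value"] = fn(*args, **kwargs)
        except BaseException as exc:  # pass the failure back to the caller
            result["error"] = exc

    # A daemon thread does not keep the interpreter alive if fn never returns.
    worker = threading.Thread(target=runner, daemon=True)
    worker.start()
    worker.join(timeout_s)

    if worker.is_alive():
        raise CallTimedOut(f"call did not finish within {timeout_s}s")
    if "error" in result:
        raise result["error"]
    return result["value"]
```

Usage would look like `call_with_deadline(client.variation, 2.0, ...)` with the usual arguments. Note that the stuck thread keeps running in the background; the wrapper only lets the caller move on.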

Logs

2023-05-18 11:35:56 INFO     Starting event processor
2023-05-18 11:35:56 INFO     Starting StreamingUpdateProcessor connecting to uri: https://stream.launchdarkly.com/all
2023-05-18 11:35:56 INFO     Waiting up to 5 seconds for LaunchDarkly client to initialize...
2023-05-18 11:35:56 INFO     StreamingUpdateProcessor initialized ok.
2023-05-18 11:35:56 INFO     Started LaunchDarkly Client: OK
[... time goes fast ...]
2023-05-18 11:40:56 WARNING  Unexpected error on stream connection: HTTPSConnectionPool(host='stream.launchdarkly.com', port=443): Read timed out., will retry
2023-05-18 11:40:56 INFO     Will reconnect after delay of 0.704595s
2023-05-18 11:51:11 WARNING  Error posting diagnostic event (will retry): HTTPSConnectionPool(host='events.launchdarkly.com', port=443): Read timed out. (read timeout=15)
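
The gaps between the timestamps above can be checked with a few lines. The timestamps are copied from the log; the `gap` helper and the variable names are only for illustration.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"


def gap(earlier, later):
    """Return the time elapsed between two log timestamps."""
    return datetime.strptime(later, FMT) - datetime.strptime(earlier, FMT)


initialized = "2023-05-18 11:35:56"         # StreamingUpdateProcessor initialized ok.
read_timeout = "2023-05-18 11:40:56"        # Read timed out., will retry
diagnostic_warning = "2023-05-18 11:51:11"  # Error posting diagnostic event

print(gap(initialized, read_timeout))         # 0:05:00
print(gap(read_timeout, diagnostic_warning))  # 0:10:15
```

The first gap is exactly the 5 minute read timeout; the second shows about ten minutes with no stream log output after the reconnect was announced.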

SDK version
6.13.3

I thought we were on a more recent version. Looking at the changelog, I didn't find any explicit fix for this, but I may be wrong.

Language version, developer tools
Python 3.9

OS/platform
Ubuntu 18.04

Additional context
If I omitted anything, don't hesitate to ask. I've been on the other side of support cases, so I know users tend to forget the details.

Cheers!
