Conversation

@Honza-Kubat

No description provided.

jpignata and others added 30 commits October 4, 2012 16:46
If there is a timeout set, check that an existing clientId's timestamp is
newer than double that timeout. If no timeout is set, just check for the
clientId's existence.

This mitigates a race condition that is triggered when a connect and/or
subscribe occurs for a clientId that is in the midst of being reaped by
the garbage collector. In that case, orphaned message queues and channel
subscriptions could remain after the clientId was reaped.
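
For illustration, a minimal sketch of the kind of existence check described above, assuming clients live in a `/clients` sorted set scored by their last-ping timestamp; the method name and option access are illustrative, not the exact code in this PR:

```javascript
clientExists: function(clientId, callback, context) {
  var self = this;
  this._redis.zscore(this._ns + '/clients', clientId, function(error, score) {
    if (self._options.timeout) {
      // With a timeout configured, the client must have pinged within twice that window.
      var cutoff = new Date().getTime() / 1000 - self._options.timeout * 2;
      callback.call(context, score !== null && parseFloat(score) > cutoff);
    } else {
      // Without a timeout, mere presence in the sorted set is enough.
      callback.call(context, score !== null);
    }
  });
}
```
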
Our Redis backend is becoming CPU bound so we need to shard the keys
Faye uses over multiple Redis systems to reduce the amount of work
each system needs to do.

This change introduces a lightweight wrapper that abstracts a
distributed set of Redis servers. The Engine now accepts an array of
Redis URL strings in the `servers` option key. Note that since pub/sub
isn't shardable, the first server provided will be used as the bus.

The only significant change made to the existing engine is the call
to multi. The signature for multi has been changed to accept a key
so that it can return a specific connection. This implies that multi
can only operate on one key per call and is leveraged for atomicity.
That said, nothing enforces this: multi returns a connection on which
anything can be called. Utilitor emptor.
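
A rough sketch of what such a wrapper could look like, assuming a simple deterministic key hash; `Sharded`, `connectionFor`, and `hashKey` are illustrative names, not the actual implementation:

```javascript
var redis = require('redis'),
    url   = require('url');

// Simple deterministic hash so every process maps a key to the same shard.
function hashKey(key) {
  var h = 0;
  for (var i = 0; i < key.length; i++) h = (h * 31 + key.charCodeAt(i)) >>> 0;
  return h;
}

function Sharded(serverUrls) {
  this._connections = serverUrls.map(function(serverUrl) {
    var parsed = url.parse(serverUrl);
    return redis.createClient(parsed.port, parsed.hostname);
  });
  // Pub/sub isn't shardable, so the first server doubles as the message bus.
  this._bus = this._connections[0];
}

// Every keyed operation goes through here so a given key always lands on one shard.
Sharded.prototype.connectionFor = function(key) {
  return this._connections[hashKey(key) % this._connections.length];
};

// multi(key) returns a MULTI on the shard that owns `key`; callers should only
// touch that one key inside it, though nothing enforces this.
Sharded.prototype.multi = function(key) {
  return this.connectionFor(key).multi();
};
```
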
SUNION isn't shardable because the sets are likely distributed amongst
multiple Redis servers. Instead, we'll have to do more work to collect
the clientIds and ensure we deliver to each one only once.
To reduce Redis chatter, we don't support wildcard channel delivery.
This is not something we use in our Bayeux implementation as these
subscriptions are not permitted.
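
A sketch of per-channel delivery without SUNION, assuming each channel's subscribers live in a `/channels<name>` set on its own shard and reusing the illustrative `connectionFor` from the sketch above; `_deliver` is a placeholder for the engine's actual message delivery:

```javascript
publish: function(message, channels) {
  var self = this, seen = {};
  channels.forEach(function(channel) {
    var key = self._ns + '/channels' + channel;
    self._shards.connectionFor(key).smembers(key, function(error, clientIds) {
      if (error) return;
      clientIds.forEach(function(clientId) {
        if (seen[clientId]) return;        // each client gets the message at most once
        seen[clientId] = true;
        self._deliver(clientId, message);  // illustrative delivery helper
      });
    });
  });
}
```
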
We only need the connection items while connecting, so instead of
keeping them around, we'll just set an ivar for the original
URLs and use them as our index values.
It's more generic, and it'll make more sense if we set it for other
processes (e.g. the dedicated GC process). Also, some docs and a
version bump.
If the set of clients is really large, the ZRANGEBYSCORE Redis command
can overwhelm the process by taking too much time and too much
memory. Setting the `gc_limit` option will now set the LIMIT parameter
on that command, allowing the GC cycle to finish within a reasonable
amount of time. Also adds some logging.
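
Roughly, the option could be applied like this; a sketch only, with `this._options.gc_limit`, the key names, and the `destroyClient` usage assumed from the descriptions in this PR:

```javascript
var self   = this,
    cutoff = new Date().getTime() / 1000 - 2 * this._options.timeout,
    key    = this._ns + '/clients';

var onExpired = function(error, clientIds) {
  if (error) return;
  clientIds.forEach(function(clientId) { self.destroyClient(clientId) });
};

if (this._options.gc_limit) {
  // Cap the result set so a huge /clients sorted set can't stall the GC cycle.
  this._redis.zrangebyscore(key, 0, cutoff, 'LIMIT', 0, this._options.gc_limit, onExpired);
} else {
  this._redis.zrangebyscore(key, 0, cutoff, onExpired);
}
```
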
Publishing was stepping on GC's toes by calling `destroyClient` whenever
it detected a client ID was past its prime. I *think* there may have
been some contention between the two (publisher and gc), so this moves
the cleanup responsibility wholly to the latter.

Well, not completely. The `disconnect` function inside of Faye itself
calls `destroyClient` as well, but let's leave that be.
A lot of changes here. Primarily, the destroyClient function now
tolerates Redis failures all the way downstream, and if any error is
detected, the destroy process stops. Instead of trying to recover, we
rely on the hope that GC will pick up a client and re-run the calls in
the future, since the removal from the /clients sorted set is the final
Redis call.

The GC lock has also been removed. After grabbing a list of expired
client IDs, we iterate over them with destroyClient, which returns
immediately. Since we're simply launching destroyClient into flight, and
don't care if it succeeds or not, we allow GC to re-run in the future
without any restrictions.

Because destroyClient doesn't block, we have an additional, expected
test failure: "Redis engine destroyClient when the client has
subscriptions stops the client receiving messages:". It fails because
it's written to expect the destroy to occur immediately. If the test is
rewritten with the expectation in a callback, it succeeds.
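
As a sketch of the fail-fast chain described here, assuming the usual `/clients/<id>/channels`, `/channels<name>`, and `/clients/<id>/messages` keys and a single Redis connection for brevity:

```javascript
destroyClient: function(clientId, callback, context) {
  var self = this;
  this._redis.smembers(this._ns + '/clients/' + clientId + '/channels', function(error, channels) {
    if (error) return;                       // bail out; a later GC pass will retry
    var multi = self._redis.multi();
    channels.forEach(function(channel) {
      multi.srem(self._ns + '/channels' + channel, clientId);
      multi.srem(self._ns + '/clients/' + clientId + '/channels', channel);
    });
    multi.del(self._ns + '/clients/' + clientId + '/messages');
    multi.exec(function(error) {
      if (error) return;                     // again, leave the client for a future GC run
      // Removing the client from the sorted set is deliberately the last call,
      // so any earlier failure leaves it visible to the next GC cycle.
      self._redis.zrem(self._ns + '/clients', clientId, function() {
        if (callback) callback.call(context);
      });
    });
  });
}
```
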
Previously, it would set the initial score to zero and then allow
ping() to update the score to the current time in the callback. This
seemed like it would allow a window of opportunity for GC to sneak in
and kill the newly created client, however, so now the score is simply
set in the initial call.
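
A minimal sketch of that change, with `generateId` standing in for whatever ID generator the engine actually uses:

```javascript
createClient: function(callback, context) {
  var self     = this,
      clientId = generateId(),                 // illustrative ID helper
      now      = new Date().getTime() / 1000;

  // Score the client with the current time immediately, instead of adding it at
  // zero and waiting for ping() to bump it, so GC never sees a zero-scored client.
  this._redis.zadd(this._ns + '/clients', now, clientId, function(error, added) {
    if (added === 0) return self.createClient(callback, context);  // ID collision: retry
    callback.call(context, clientId);
  });
}
```
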
Rewrite GC to handle Redis failures
Just fails rather than trying again.
This is just burning a Redis call that we don't need to make.
Let's be optimistic about cleaning up the channels and messages for a
client (the latter expires anyway).
It turns out that having publishers GC is pretty important. That
mechanism was largely responsible for keeping memory use down on the
backend Redis servers, and turning it off proved a catastrophe.
Since we want to move to a serial GC, the callback is now always
invoked, but with a "false" argument if the GC didn't complete for
whatever reason.
This replaces the batch interval-based GC with a serial process. Now,
clients are processed one-at-a-time, and we depend heavily on the
callbacks (and errbacks) to re-invoke the GC loop. An experiment,
really.
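
Sketched out, the serial loop might look like this, assuming `destroyClient` reports completion via the success flag mentioned two commits up and an illustrative `_gcInterval` pause between empty passes:

```javascript
gc: function() {
  var self   = this,
      cutoff = new Date().getTime() / 1000 - 2 * this._options.timeout;

  // Fetch a single expired client; the next iteration is only scheduled from
  // the callbacks, so clients are processed strictly one at a time.
  this._redis.zrangebyscore(this._ns + '/clients', 0, cutoff, 'LIMIT', 0, 1,
    function(error, clientIds) {
      if (error || clientIds.length === 0)
        return setTimeout(function() { self.gc() }, self._gcInterval);

      self.destroyClient(clientIds[0], function(succeeded) {
        // The callback always fires; `succeeded` is false when the destroy did
        // not complete, but either way we move on to the next client.
        self.gc();
      });
    });
}
```
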
Since we're hooking into the `destroyClient` function, we should get
stats from any source that destroys clients, including the GC process,
publishers, and front-ends.