-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Fixing a bug where READONLY is run before AUTH. #1367
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…hen reconnecting to Clustered Slaves.
|
This is a great find, thank you. I think this has been a thorn for ages - and seeing this coming from someone with the "First-time contributor" flag is: just amazing. (edit: I just spotted the "microsoft" tag in your profile, so: fair enough, but: I'm still delighted to get a great fix like this!) Just for the paperwork, can I get a confirmation that you're able and willing (and thus doing so) to contribute this code to the project under the project's license? (we need to get a CLA bot!) |
|
btw; once I've validated this, I want to get this into the 2.1 drop that has been sat at preview for a few weeks, and get it shipped |
|
Licensing: Yes, I am able and willing (and thus doing so) to contribute this code to the project under the project's license. Getting it into the 2.1 release sounds good but some people on my team have concerns since it seems like the latest version has some connection issues (probably unrelated to this) and customers might hit these issues by upgrading. |
|
Much appreciated. I hear you on the upgrade issue; we can try and continue work on the stability issue in the background, but; not trivial, and ultimately we don't have dedicated time to investigate this. I'm 75% sure we should just prioritize swapping out the entire network layer in favor of RESPite; the actual RESP part is feature-complete (at least for RESP2) - we mainly just need the "pool" pieces. This would solve a lot of problems in one go. But that's also a lot of moving parts, so: probably a v3. Sigh; managing libraries is tricky! |
|
@mgravell Is it possible to release this fix on top of 2.0.519 (maybe create a 2.0.519.1)? Our production service are impacted by this bug several times per week and we're struggle to keep up our availability SLA. |
|
There only reason for 2.1.* is a build / versioning snafu, not anything
more fundamental; so any fix will be in 2.1.*, but this 2.0.* Vs 2.1.*
doesn't represent anything of significance. Sometimes build go weird. This
fix is high on my priorities. In fact, let me deal with the home chores (in
these unusual days), and then see if I can merge it.
…On Wed, 18 Mar 2020, 06:08 Teddy Zhang, ***@***.***> wrote:
@mgravell <https://github.com/mgravell> Is it possible to release this
fix on top of 2.0.519 (maybe create a 2.0.519.1)? Our production service
are impacted by this bug several times per week and we're struggle to keep
up our availability SLA.
We can't wait for the fix for stability issue in newer build. Looks like
we don't have an ETA on that yet. Publish a build on top of 519 seems the
best option here.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#1367 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAAEHMHP43EGRKNHKJ5TEPLRIBQNTANCNFSM4K4LTSKQ>
.
|
I'm not sure if I fully understand your reply. You seems indicate that you'll apply the fix to 2.1., but that's not what we want right now. My ask is create a branch 2.0.519 from this commit: https://github.com/StackExchange/StackExchange.Redis/tree/v2.0.519 I think this is the best short term solution. With this, we can have enough time to wait for newer 2.1.* to become stable. |
|
@lostindark The connection issue was also fixed in #1374 and will be in the 2.1.x release. It's not trivial for us to go back on Git history due to height differences, but there's also no need here. Expect 2.1.x to be released tomorrow - we're running it in production on Stack Overflow right now. The |
|
@NickCraver Thanks for the update. I'll wait for the new stable release and test it when it's ready. |
|
@NickCraver I saw 2.1.0 disappeared from nuget.org. But no new version comes out yet. What is the plan? |
|
Hey @lostindark sorry we're busy with kids and family as well - bit slow on updates here. We found some race conditions that caused errors in badly handled ways (e.g. an event handler) specifically in the Sentinel code after the 2.1.0 cut. Some in master are fixes (async of sync - e.g. #1402), and I've got more bits to change in #1403. Once we hammer these tests more in CI and dogfood, we'll put another build live that's more stable for users. Now, all of that is specific to connecting with Sentinel, and only when the downstream master is down (we're hammering that case in tests, but it's not the norm). If you're not connecting to Sentinel servers: you're fine but we'll still update here. Another set of changes post- |
|
@NickCraver Thanks for the detail update! |
Fixing a bug where READONLY is run before AUTH when reconnecting to clustered Slaves and since Readonly is run as Fire and Forget we never get an error either.
This causes MOVED errors until the Multiplexer is recreated (easily seen when NoRedirect is specified) or all MOVED errors are handled internally but this means the Slaves are not at all used to distribute load which can cause connection issues with much smaller loads.