Skip to content

Add retry logic of connection lifetime to cluster client and sentinel client #833

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
May 11, 2025

Conversation

terut
Copy link
Contributor

@terut terut commented Apr 23, 2025

This is follow-up PR regarding #727 .

See #727 (comment).

@rueian
Copy link
Collaborator

rueian commented Apr 23, 2025

Wow. You are lightning fast, but we still need some test cases for this.

@terut
Copy link
Contributor Author

terut commented Apr 23, 2025

@rueian Sure, I'll add the tests. I’m not familiar to cluster and sentinel client, so I wanted to check whether my understanding is correct about what you said.

@rueian
Copy link
Collaborator

rueian commented Apr 23, 2025

Yes, I think you are on the right track.

@rueian rueian force-pushed the main branch 2 times, most recently from db71c0d to 34ba5cf Compare April 28, 2025 17:51
@terut terut force-pushed the feat/conn-lifetime-cluster-sentinel branch 2 times, most recently from e2204df to 232c933 Compare April 30, 2025 07:08
@terut terut force-pushed the feat/conn-lifetime-cluster-sentinel branch from 232c933 to 3c7632f Compare April 30, 2025 12:58
@terut terut marked this pull request as ready for review May 1, 2025 07:55
@terut terut force-pushed the feat/conn-lifetime-cluster-sentinel branch from 2238fd7 to 4f57123 Compare May 1, 2025 11:43
cluster.go Outdated
ncc := c.redirectOrNew(addr, cc, cmd.Slot(), mode)
resp = ncc.Do(ctx, cmd)
if resp.Error() == errConnExpired {
goto retry
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this go back to L526?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix 3623adc . Is it correct?

cluster.go Outdated
goto process
case RedirectAsk:
results := c.redirectOrNew(addr, cc, cmd.Slot(), mode).DoMulti(ctx, cmds.AskingCmd, cmd)
ncc := c.redirectOrNew(addr, cc, cmd.Slot(), mode)
results := c.doMulti(ctx, ncc, cmds.AskingCmd, cmd)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ASKING can't be handled this way. It must be sent in every retry.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rueian Does it mean that ASKING needs at head of cmds when retrying due to errConnExpired

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, ASKING is only effective in the same connection and applied only to the next command; therefore, we need to re-send ASKING before the target cmd if errConnExpired happens.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I will also fix askingMulti and askingMultiCache as same.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@terut terut requested a review from rueian May 2, 2025 12:32
cluster.go Outdated
if i > 0 && commands[i-1] == cmds.AskingCmd {
ml = commands[i-1:]
} else {
ml = commands[i:]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This reminds me that we need to handle the MULTI EXEC pattern correctly: If errConnExpired occurs during the MULTI EXEC, we should retry the entire MULTI EXEC block. This applies to the single client as well, which we overlooked in the previous PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rueian Would you check this pull-request #837 ? This is draft so I will fix other methods and add tests after checking the implement.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the draft looks good to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rueian Please review #837 .

Copy link
Contributor Author

@terut terut May 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed response of Multi command of test data.

I'm writing tests for askingMulti() but I'm not sure why multi has the following commands when calling that method. Any idea?

// Call for unit testing
multi := []Completed{
	client.B().Get().Key("1{t}").Build(),
	client.B().Multi().Build(),
	client.B().Incr().Key("2{t}").Build(),
	client.B().Incr().Key("3{t}").Build(),
	client.B().Exec().Build(),
	client.B().Get().Key("4{t}").Build(),
}
resps := client.DoMulti(context.Background(), multi...)

// multi in askingMulti(). I don't know why there is no [GET 2{t}] [Get 3{t}].
[ASKING]
[GET 1{t}]
[ASKING]
[MULTI]
[EXEC]
[ASKING]
[GET 4{t}]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix 6362824

cluster.go Outdated
var ml []Completed
recover:
ml = ml[:0]
for i, resp := range resps.s {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, we can advance the i by 6 in every iteration and only check the Error of the ExecCmd.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix 6362824

@terut terut requested a review from rueian May 10, 2025 07:29
@rueian rueian merged commit 0dc2b40 into redis:main May 11, 2025
34 checks passed
@rueian
Copy link
Collaborator

rueian commented May 11, 2025

Thanks @terut!

@terut terut deleted the feat/conn-lifetime-cluster-sentinel branch May 12, 2025 03:00
@terut
Copy link
Contributor Author

terut commented May 12, 2025

@rueian Thank you for great help for couple of months 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants