Adding Test for getShardedRules call #4449
bb76c5a to 10ffb4b
I guess
b80ca34 to a90e35c
The next change will be something like this: alanprot@f23f10f. This is the issue I wanted to fix.
No - CHANGELOG is for a Cortex admin to see what affects them in the new version.
bboreham
left a comment
Broadly good; I made some comments.
The test is now really long, so if it can be broken down to make it easier to follow that would be good.
pkg/ruler/ruler.go
    if clientPool == nil {
        clientPool = newRulerClientPool(cfg.ClientTLSConfig, logger, reg)
    }
why do we need this?
We don't! :D I just wanted to make sure it was not nil, so I added a default.
cortex/pkg/alertmanager/distributor.go
Lines 46 to 48 in 95a407f
    if alertmanagerClientsPool == nil {
        alertmanagerClientsPool = newAlertmanagerClientsPool(client.NewRingServiceDiscovery(alertmanagersRing), cfg, logger, reg)
    }
But I can remove it, no problem.
> The test is now really long, so if it can be broken down to make it easier to follow that would be good.
I know... but if I split it into 2 tests I will have basically the same test cases plus lots of common setup code. What do you think?
> The test is now really long, so if it can be broken down to make it easier to follow that would be good.
OK, I did it! haha
I created a brand new test case, and it does indeed seem easier to read.
You referred to NewDistributor(), but the difference is that function is exported; it makes sense that callers from another package can pass nil to get the default.
newRuler() is local to this package, and you never call it with nil, so these three lines are not required.
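The distinction drawn here can be sketched in a few lines of Go. This is only an illustration: `Pool`, `Service`, `NewService` and `newService` are hypothetical names, not the actual Cortex types. It shows a nil fallback that makes sense on an exported constructor and is unnecessary on an unexported one whose call sites all pass a value.

```go
package main

import "fmt"

// Pool is a hypothetical stand-in for a client pool.
type Pool struct{ name string }

func newDefaultPool() *Pool { return &Pool{name: "default"} }

// Service is a hypothetical component that depends on a pool.
type Service struct{ pool *Pool }

// NewService is exported: callers in other packages may pass nil to ask
// for the default pool, so the nil check is useful here.
func NewService(pool *Pool) *Service {
	if pool == nil {
		pool = newDefaultPool()
	}
	return newService(pool)
}

// newService is unexported: every call site lives in this package and
// always supplies a pool, so no nil fallback is needed.
func newService(pool *Pool) *Service {
	return &Service{pool: pool}
}

func main() {
	fmt.Println(NewService(nil).pool.name)                 // default
	fmt.Println(newService(&Pool{name: "mock"}).pool.name) // mock
}
```

Tests can still inject a mock through the unexported constructor, which is the only place the pool argument varies.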
pkg/ruler/ruler_test.go
    user1Group2 := &rulespb.RuleGroupDesc{User: user1, Namespace: "namespace", Name: "second"}
    user2Group1 := &rulespb.RuleGroupDesc{User: user2, Namespace: "namespace", Name: "first"}
    user3Group1 := &rulespb.RuleGroupDesc{User: user3, Namespace: "namespace", Name: "first"}
    user1Group1 := &rulespb.RuleGroupDesc{User: user1, Namespace: "namespace", Name: "first", Interval: 10 * time.Second}
Do we actually wait 10 seconds? Might be worth commenting why this specific value, or having it as a constant.
We don't... we just need something > 0, because if it is 0 we get a "Divided by Zero" error when loading the rules. Do you think it is worth extracting to a const anyway?
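A minimal sketch of why a zero interval can fail, and what a named constant might look like. Everything here is an assumption for illustration: `testInterval`, `evalOffset` and `safeEvalOffset` are hypothetical, and the modulo is only one plausible way an interval ends up as a divisor; the actual rule-loading code is not shown in this thread.

```go
package main

import (
	"fmt"
	"time"
)

// testInterval is a hypothetical constant. Any value > 0 works, because
// the test never waits for an evaluation to fire.
const testInterval = 10 * time.Second

// evalOffset is a hypothetical helper that spreads groups across the
// interval with a modulo. time.Duration is an integer type, so a zero
// interval panics with an integer divide by zero.
func evalOffset(hash uint64, interval time.Duration) time.Duration {
	return time.Duration(hash % uint64(interval))
}

// safeEvalOffset converts that panic into an error for demonstration.
func safeEvalOffset(hash uint64, interval time.Duration) (d time.Duration, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("invalid interval %v: %v", interval, r)
		}
	}()
	return evalOffset(hash, interval), nil
}

func main() {
	d, err := safeEvalOffset(42, testInterval)
	fmt.Println(d, err)

	_, err = safeEvalOffset(42, 0)
	fmt.Println(err)
}
```

Naming the value documents that the specific number is arbitrary, which addresses the reviewer's question without changing test behaviour.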
4242b91 to f62510b
Hi @bboreham Thanks
bboreham
left a comment
Thanks for updating. One small comment but can fix it up later if necessary.
Signed-off-by: Alan Protasio <[email protected]>
f62510b to 16e1017
That makes sense! Removed! I rebased the change and resolved the conflicts as well. Thanks a lot!
Thanks @bboreham :D
Signed-off-by: Alan Protasio <[email protected]> Signed-off-by: Alvin Lin <[email protected]>
What this PR does:
During some investigation I could see that we don't have any test coverage on the Ruler#GetRules method at all, especially the getShardedRules call.
New interfaces were created to make it possible to mock the ruler clients and the ClientPool.
The issue I was investigating is that even with shuffle sharding enabled, we call all rulers on a GetRules call. This is because we are not using the subring to get the replicationSet. See:
cortex/pkg/ruler/ruler.go
Line 744 in b4daa22
This is especially problematic because we return a 5xx if a single ruler is unhealthy (as replicationFactor=1). The tests show that even rulers in the "LEAVING" state are considered unhealthy, causing "too many unhealthy instances in the ring" errors.
I will follow up with the improvement PR as soon as this one is merged (I did not want to do the refactor and the fix in the same PR, to make it easier to read).
PS: I'm not sure if I need to add a CHANGELOG entry if the PR only includes more tests.
Which issue(s) this PR fixes:
Checklist
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]