Skip to content

Conversation

@alanprot
Copy link
Member

@alanprot alanprot commented Aug 26, 2021

What this PR does:
During some investigation I could see that we don't have any test coverage on the Ruler#GetRules method at all - specially the getShardedRules call.

New interfaces were create in order to make possible to mock the rulers Clients and ClientPool.

The issue i was investigating is that even with shuffle sharding enabled, we call all rulers on GetRulers Call. This is because we are not using the subring to get the replicationSet. See:

rulers, err := r.ring.GetReplicationSetForOperation(RingOp)

This is specially problematic as we throw 5xx if a single ruler is unhealthy (as replicationFactor=1). We can see in the tests that even rulers in "LEAVING" state are considered unhealthy causing "too many unhealthy instances in the ring" errors.

I will follow up with the improvement PR as soon this one is merged (i did not want to do refactor + fix in the same PR to make it easier to read).

PS: Im not sure if I need to add a CHANGELOG entry if the PR only include more tests.

Which issue(s) this PR fixes:

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@alanprot alanprot changed the title Improving Test Coverage for getShardedRules call Adding Test for getShardedRules call Aug 26, 2021
@jeromeinsf
Copy link
Contributor

I guess

  • Tests updated

@alanprot alanprot force-pushed the test/GetRules branch 9 times, most recently from b80ca34 to a90e35c Compare August 26, 2021 17:48
@alanprot
Copy link
Member Author

The next change will be something like this: alanprot@f23f10f

This is the issue i wanted to fix.

@bboreham
Copy link
Contributor

PS: Im not sure if I need to add a CHANGELOG entry if the PR only include more tests.

No - CHANGELOG is for a Cortex admin to see what affects them in the new version.

Copy link
Contributor

@bboreham bboreham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Broadly good; I made some comments.
The test is now really long, so if it can be broken down to make it easier to follow that would be good.

Comment on lines 253 to 255
if clientPool == nil {
clientPool = newRulerClientPool(cfg.ClientTLSConfig, logger, reg)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need this?

Copy link
Member Author

@alanprot alanprot Aug 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't! :D I just wanted to make sure that was not nil -> So added a default.

if alertmanagerClientsPool == nil {
alertmanagerClientsPool = newAlertmanagerClientsPool(client.NewRingServiceDiscovery(alertmanagersRing), cfg, logger, reg)
}

But i can remove NP.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test is now really long, so if it can be broken down to make it easier to follow that would be good.

I know... but if I split into 2 tests i will have basically the same test cases + lots of common code on setting up and everything... what do you think?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test is now really long, so if it can be broken down to make it easier to follow that would be good.

Ok.. I did it! haha
I created a brand new test case -> it indeed seems easier to read.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You referred to NewDistributor(), but the difference is that function is exported; it makes sense that callers from another package can pass nil to get the default.
newRuler() is local to this package, and you never call it with nil, so these three lines are not required.

user1Group2 := &rulespb.RuleGroupDesc{User: user1, Namespace: "namespace", Name: "second"}
user2Group1 := &rulespb.RuleGroupDesc{User: user2, Namespace: "namespace", Name: "first"}
user3Group1 := &rulespb.RuleGroupDesc{User: user3, Namespace: "namespace", Name: "first"}
user1Group1 := &rulespb.RuleGroupDesc{User: user1, Namespace: "namespace", Name: "first", Interval: 10 * time.Second}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we actually wait 10 seconds? Might be worth commenting why this specific value, or having it as a constant.

Copy link
Member Author

@alanprot alanprot Aug 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We dont... we just need something > 0 as if is 0 we will get a "Divided by Zero" error when loading the rules. Do you think it worth extract to a const anyway?

@alanprot alanprot force-pushed the test/GetRules branch 7 times, most recently from 4242b91 to f62510b Compare August 27, 2021 03:07
@alanprot
Copy link
Member Author

Hi @bboreham
I updated the PR and created a separated test... Can u take a look now?

Thanks

Copy link
Contributor

@bboreham bboreham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for updating. One small comment but can fix it up later if necessary.

Comment on lines 253 to 255
if clientPool == nil {
clientPool = newRulerClientPool(cfg.ClientTLSConfig, logger, reg)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You referred to NewDistributor(), but the difference is that function is exported; it makes sense that callers from another package can pass nil to get the default.
newRuler() is local to this package, and you never call it with nil, so these three lines are not required.

@alanprot
Copy link
Member Author

alanprot commented Sep 1, 2021

Thanks for updating. One small comment but can fix it up later if necessary.

That make sense! Removed!

I rebase the change and resolved the conflicts as well.

Thanks a lot!

@bboreham bboreham merged commit 03f926b into cortexproject:master Sep 2, 2021
@alanprot
Copy link
Member Author

alanprot commented Sep 2, 2021

Thanks @bboreham :D

alvinlin123 pushed a commit to ac1214/cortex that referenced this pull request Jan 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants