Introduced time-based concurrent compaction #2616
Closed
What this PR does:
A few days ago we introduced blocks sharding in the compactor (#2599), and it gave us significant benefits when compacting a large tenant's blocks (30M active series before replication). However, the blocks sharding approach also has a major downside: samples deduplication is significantly less effective (i.e. with RF=3 and shards=3 we measured that we deduplicate about 45% of samples, and the situation gets worse with more shards).
After some experimentation, in this PR I'm proposing a different strategy. The idea is to accept that compacting 2h blocks will take more than 2h, but we should be able to concurrently run multiple compactions for non-overlapping time ranges (when available).
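To make the idea concrete, here is a minimal Go sketch (not code from this PR) of running independent compaction jobs for non-overlapping time ranges with a bounded number of workers; the `job` type and the worker-pool helper are made up for illustration:

```go
// Minimal sketch: since the time ranges don't overlap, the compaction jobs are
// independent and can safely run in parallel, bounded by a concurrency limit.
package main

import (
	"fmt"
	"sync"
)

// job is a hypothetical compaction job covering [minTime, maxTime).
type job struct {
	minTime, maxTime int64
}

// runConcurrently executes jobs with at most `concurrency` workers.
func runConcurrently(jobs []job, concurrency int, compact func(job)) {
	jobCh := make(chan job)
	var wg sync.WaitGroup

	for i := 0; i < concurrency; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := range jobCh {
				compact(j)
			}
		}()
	}

	for _, j := range jobs {
		jobCh <- j
	}
	close(jobCh)
	wg.Wait()
}

func main() {
	// Three non-overlapping ranges (timestamps simplified for brevity).
	jobs := []job{{0, 2}, {2, 4}, {4, 6}}

	runConcurrently(jobs, 2, func(j job) {
		fmt.Printf("compacting range [%d, %d)\n", j.minTime, j.maxTime)
	})
}
```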
How it works: I've replaced the TSDB compactor planner with a custom one and made some changes in Thanos to be able to inject custom "blocks grouping" logic. Basically, the `Grouper` is responsible for creating the groups of blocks that should be compacted together, so the TSDB `Plan()` is just a pass-through because the planning already occurred in the grouping.

This is a first step. The way the grouper works makes it relatively easy to shard planned groups across multiple nodes using the same ring we use to shard by tenant. This will be done in a future PR.
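As a rough illustration of the grouper/planner split (the types and method names below are hypothetical, not the actual Thanos or TSDB interfaces): the grouper buckets blocks into non-overlapping time ranges, and the "plan" for a group is simply the group's own blocks.

```go
// Illustrative sketch only: the grouper does all the planning work, and the
// per-group planner is a pass-through.
package main

import "fmt"

// blockMeta is a simplified stand-in for a block's metadata.
type blockMeta struct {
	ID               string
	MinTime, MaxTime int64
}

// group is a set of blocks that should be compacted together.
type group struct {
	rangeStart, rangeEnd int64
	blocks               []blockMeta
}

// grouper buckets blocks into fixed-size, non-overlapping time ranges.
type grouper struct {
	rangeSize int64
}

func (g grouper) Groups(blocks []blockMeta) []group {
	byRange := map[int64][]blockMeta{}
	for _, b := range blocks {
		start := (b.MinTime / g.rangeSize) * g.rangeSize
		byRange[start] = append(byRange[start], b)
	}

	var out []group
	for start, bs := range byRange {
		out = append(out, group{rangeStart: start, rangeEnd: start + g.rangeSize, blocks: bs})
	}
	return out
}

// passthroughPlanner mimics the "Plan() is a pass-through" idea: it just
// returns the blocks of the group it was built for, without re-planning.
type passthroughPlanner struct {
	g group
}

func (p passthroughPlanner) Plan() []blockMeta {
	return p.g.blocks
}

func main() {
	blocks := []blockMeta{
		{ID: "A", MinTime: 0, MaxTime: 2},
		{ID: "B", MinTime: 1, MaxTime: 2},
		{ID: "C", MinTime: 2, MaxTime: 4},
	}

	// Group into 2-unit ranges and "plan" each group (map order is not deterministic).
	for _, grp := range (grouper{rangeSize: 2}).Groups(blocks) {
		fmt.Printf("range [%d, %d): plan = %v\n", grp.rangeStart, grp.rangeEnd, passthroughPlanner{g: grp}.Plan())
	}
}
```

Because a group already carries everything needed to compact it, distributing groups across compactor replicas later on (e.g. via the ring) becomes a matter of assigning groups, not re-planning them.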
This PR is a draft because:
- the `Grouper` refactoring should be done in Thanos first

Notes to reviewers:
- `-compactor.consistency-delay` now defaults to `0s` because it should be fine for consistent object stores. This change is not strictly related to this PR, but it's somewhat desirable to reduce the chances of having to re-compact the same range twice.
- We're already testing this change with the large customer and so far it looks to be working fine, i.e. given `-compactor.compaction-concurrency=5` we're able to max out 5 CPU cores most of the time (each compaction is single-threaded so it uses at most 1 core). A hedged example invocation is shown after this list.
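For reference, an example invocation along the lines described above; only the two flags come from the notes in this PR, while the target name and the rest of the configuration (storage, etc.) are assumptions and omitted here:

```console
# Hypothetical compactor invocation; only the two flags below are from this PR's notes.
cortex -target=compactor \
  -compactor.consistency-delay=0s \
  -compactor.compaction-concurrency=5
```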
Which issue(s) this PR fixes:
N/A
Checklist
- `CHANGELOG.md` updated - the order of entries should be `[CHANGE]`, `[FEATURE]`, `[ENHANCEMENT]`, `[BUGFIX]`