@pracucci pracucci commented May 20, 2020

What this PR does:
Few days ago we introduced blocks sharding in the compactor (#2599) and it gave us significant benefits compacting a large tenant blocks (30M active series before replication). However, the blocks sharding approach brings also a major downside: the samples deduplication is significantly less effective (ie. we mesured that with a RF=3 and shards=3 we deduplicate about 45% of samples, and the situation gets worse for more shards).

After some experimentation, in this PR I'm proposing a different strategy. The idea is to accept the fact that compacting 2h blocks will take more than 2h, but to make it possible to concurrently run multiple compactions for non-overlapping time ranges (when available).

How it works: I've replaced the TSDB compactor planner with a custom one and made some changes in Thanos to allow injecting a custom "blocks grouping" logic. Basically, the Grouper is responsible for creating the groups of blocks that should be compacted together, so the TSDB Plan() becomes just a pass-through because the planning already occurred during grouping (see the sketch below).
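
To make the split of responsibilities concrete, here's a minimal sketch. The type names (`BlockMeta`, `Group`, `Grouper`) are hypothetical and simplified, not the actual Thanos/Cortex types:

```go
// Package compactor sketches the Grouper/Planner split described above.
package compactor

// BlockMeta is a stand-in for a block's metadata (ID and time range).
type BlockMeta struct {
	ID               string
	MinTime, MaxTime int64 // milliseconds since epoch
}

// Group is a set of blocks that should be compacted together.
type Group struct {
	Blocks []BlockMeta
}

// Grouper owns the planning: it partitions all known blocks into groups
// of blocks to compact together (e.g. one group per non-overlapping
// time range).
type Grouper interface {
	Groups(blocks []BlockMeta) ([]Group, error)
}

// passthroughPlanner stands in for the custom TSDB planner: since the
// planning already happened in the Grouper, Plan() simply returns the
// group's blocks unchanged.
type passthroughPlanner struct{}

func (passthroughPlanner) Plan(metas []BlockMeta) ([]BlockMeta, error) {
	return metas, nil
}
```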

This is a first step. The way the Grouper works makes it relatively easy to shard the planned groups across multiple nodes, using the same ring we currently use to shard by tenant. This will be done in a future PR.

This PR is a draft because:

  • I need to get the Grouper refactoring merged in Thanos first

Notes to reviewers:

  • I switched the -compactor.consistency-delay default to 0s because it should be fine for consistent object stores. This change is not strictly related to this PR, but it's somewhat desirable to reduce the chances of having to re-compact the same range twice (see the example after this list).
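
For reference, a minimal sketch of the flag involved (the value shown simply mirrors the new default):

```
# New default: no consistency delay, which is safe on strongly consistent object stores.
-compactor.consistency-delay=0s
```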

We're already testing this change with a large customer and so far it appears to work fine. E.g. given -compactor.compaction-concurrency=5 we're able to max out 5 CPU cores most of the time (each compaction is single-threaded, so it uses at most 1 core):
[Screenshot: CPU utilization graph, 2020-05-20 17:42]
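
As a rough sketch of the concurrency model (reusing the hypothetical Group type from the sketch above and golang.org/x/sync/errgroup; this is not the actual Cortex implementation). The important property is that the Grouper only produces groups covering non-overlapping time ranges, so concurrent compactions never touch the same blocks:

```go
package compactor

import (
	"context"

	"golang.org/x/sync/errgroup"
)

// compactConcurrently runs the compaction of non-overlapping groups with
// bounded parallelism. compact is a stand-in for the single-threaded
// per-group compaction.
func compactConcurrently(ctx context.Context, groups []Group, concurrency int,
	compact func(context.Context, Group) error) error {

	g, ctx := errgroup.WithContext(ctx)
	g.SetLimit(concurrency) // at most `concurrency` compactions in flight

	for _, group := range groups {
		group := group // capture loop variable
		g.Go(func() error {
			// Each compaction is single-threaded, so concurrency=N
			// should saturate roughly N CPU cores.
			return compact(ctx, group)
		})
	}
	return g.Wait()
}
```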

Which issue(s) this PR fixes:
N/A

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@pracucci pracucci force-pushed the compactor-time-based-sharding branch from e143871 to 58d378b on May 22, 2020 10:44
Signed-off-by: Marco Pracucci <[email protected]>
@pracucci pracucci commented May 22, 2020

Closing while waiting for upstream changes.

@pracucci pracucci closed this May 22, 2020