Ingester flush should balance across tables and hashes #724

@bboreham

Currently, flushing is driven by a single priority queue, ordered by the start time of the first unflushed chunk in each series.

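This ordering is especially bad when we’ve scaled down write capacity on the previous table and someone sends samples from that period: those chunks have the earliest start times, so they sit at the head of the queue while the table they target has the least write capacity.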

It’s also bad if a lot of time series map to the same index hash (i.e. same metric name for the same user on the same day), because that creates a hot spot in the data store.

It would be better to have a separate queue for each table, and to send a spread of data that is known to have different hash keys. Better still, batch the writes to reduce network overhead to the back end.
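To make the proposal concrete, here is a rough sketch of per-table queues that hand out batches spread across hash keys. It is not the ingester's actual code: `chunkRef`, `flushQueues`, `Add`, and `NextBatch` are hypothetical names, the table and hash-key strings are made up, and the keys are sorted only so the sketch gives a deterministic result.

```go
package main

import (
	"fmt"
	"sort"
)

// chunkRef is a hypothetical stand-in for a chunk waiting to be flushed.
type chunkRef struct {
	Table   string // destination table
	HashKey string // index hash key, e.g. user + metric name + day
	Series  string // series identifier
}

// flushQueues keeps one queue per table and, within each table,
// one FIFO of pending chunks per hash key.
type flushQueues struct {
	tables map[string]map[string][]chunkRef
}

func newFlushQueues() *flushQueues {
	return &flushQueues{tables: map[string]map[string][]chunkRef{}}
}

// Add enqueues a chunk under its table and hash key.
func (q *flushQueues) Add(c chunkRef) {
	byHash, ok := q.tables[c.Table]
	if !ok {
		byHash = map[string][]chunkRef{}
		q.tables[c.Table] = byHash
	}
	byHash[c.HashKey] = append(byHash[c.HashKey], c)
}

// NextBatch removes and returns up to max chunks for one table. Each pass
// takes at most one chunk per hash key, so a batch is spread across as many
// different hash keys as possible before any key is repeated.
func (q *flushQueues) NextBatch(table string, max int) []chunkRef {
	byHash := q.tables[table]
	keys := make([]string, 0, len(byHash))
	for k := range byHash {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic order, for the sketch only

	var batch []chunkRef
	for len(batch) < max {
		progressed := false
		for _, k := range keys {
			if len(batch) == max {
				break
			}
			pending := byHash[k]
			if len(pending) == 0 {
				continue
			}
			batch = append(batch, pending[0])
			byHash[k] = pending[1:]
			progressed = true
		}
		if !progressed {
			break // nothing left to flush for this table
		}
	}
	return batch
}

func main() {
	q := newFlushQueues()
	q.Add(chunkRef{Table: "week1", HashKey: "u1:cpu", Series: "a"})
	q.Add(chunkRef{Table: "week1", HashKey: "u1:cpu", Series: "b"})
	q.Add(chunkRef{Table: "week1", HashKey: "u1:cpu", Series: "c"})
	q.Add(chunkRef{Table: "week1", HashKey: "u1:mem", Series: "d"})
	q.Add(chunkRef{Table: "week0", HashKey: "u1:cpu", Series: "e"})

	// The week1 batch alternates hash keys; the week0 chunk is untouched.
	for _, c := range q.NextBatch("week1", 3) {
		fmt.Println(c.Table, c.HashKey, c.Series)
	}
}
```

With queues split like this, each table could be drained at a rate matched to its own write capacity, and each returned batch could go to the back end in a single batched write. Both of those parts are left out of the sketch.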
