Phase 0 - Inception
- Obtain schemas annotated with dimensions and metrics from the Metrics team (small) @nik9000
- Prototyping Lucene Data Pull Mechanism (medium) @imotov
- Prototyping Data Pull Mechanism in Elasticsearch @imotov
Phase 1 - Mappings
- Add `time_series_dimension` mapping parameter to fields - Add `time_series_metric` parameter #76766 @csoulios
- TSDB: Add time series information to field caps #78790 @imotov
- TSDB: Automatically add timestamp mapper #79136 @weizijun
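As a sketch of what the Phase 1 mapping parameters look like in practice (the field names here are illustrative, not taken from this issue), a tsdb mapping tags dimension fields with `time_series_dimension` and metric fields with `time_series_metric`:

```json
{
  "mappings": {
    "properties": {
      "@timestamp": { "type": "date" },
      "host":       { "type": "keyword", "time_series_dimension": true },
      "cpu_usage":  { "type": "double",  "time_series_metric": "gauge" },
      "requests":   { "type": "long",    "time_series_metric": "counter" }
    }
  }
}
```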
Phase 2 - Ingest
- Dimension-based tsid generator
- Routing
- Pull index routing into strategy object #77211 @nik9000
- Route documents to the correct shards in tsdb #77731 @nik9000
- Validate tsdb's routing_path #79384 @nik9000
- More validation for routing_path #79520 @nik9000
- TSDB: Do not allow index splits for time series indices #81125 @csoulios
- Improve FilterPathBasedFilter performance #79826 (Speed up xcontent filtering) @weizijun
- Have a good hard look at the switch statement in `BulkOperation`. Maybe we can make this simpler.
- TSDB: Test `ids` query on time series index #81436 @csoulios
- Test and fix get-by-id: TSDB: Support GET and DELETE and doc versioning #82633 (see linked issue for greater description of sub-points)
- Initial implementation of `_id` for tsid (TSDB: Support GET and DELETE and doc versioning #82633)
- Generate better error messages when `_id` is automatically generated (TSDB: improve document description on error #84903, TSDB: Add dimensions and timestamp to parse errors #84962)
- Improve error messages on version conflict to include `_tsid` and `@timestamp` (TSDB: Expand _id on version conflict #84957)
- Size
  - Investigate flipping the `@timestamp` component of the `_id` from little endian to big endian. That should mean there are more common prefixes. TSDB: shrink _id inverted index #85008 cuts the size of the inverted index for `_id` by 37%. That's not a lot of the index in total, but it sure does feel good for such a small change.
- Misc
- Test TSDB's `_id` in `RecoverySourceHandlerTests.java` and `EngineTests.java` (Test time series id in RecoverySourceHandlerTests #84996, Use tsdb's id in Engine tests #85055)
- Make it possible to modify `@timestamp` or dimensions in reindex (TSDB: Initial reindex fix #86647 + Reindex support for TSDB creating indices #86704)
- Test `_id` with the security `create_doc` privilege. Can a user with `create_doc` (only) ingest new TSDB docs? Does `create_doc` prevent a user from overwriting an existing TSDB doc? (`create_doc` relies on the `OpType` of the `IndexRequest`, which is automatically set to `CREATE` for docs with auto-generated ids.) TSDB: Test `create_doc` permission #86638
- Handling Time Boundaries
- TSDB: add index timestamp range check #78291 (added `start_time` and `end_time` index settings) @weizijun
- Make time boundaries required in tsdb indices @weizijun (TSDB: Make time boundaries settings required in tsdb indices #81146)
- Replace hard check for index_mode=TIME_SERIES with bounds checking on start and end time @nik9000 TSDB: Cleaner trigger for tsdb boundary check #81263
- Tests for nanosecond-precision timestamps just beyond the limit
- Use @timestamp field to route documents to a backing index of a data stream #82079 @martijnvg
- Automated update of index time boundaries on index rollover @martijnvg
- Enforce default date formats for tsdb data streams #83517 (@martijnvg )
- Adjust get data stream api to include index_mode and per backing index the start and end time if data stream is tsdb. Include time series temporal ranges to get data stream api for data streams in time series index mode #83518
- Automatically skip shards of backing indices with time ranges (based on the `index.time_series.start_time` and `index.time_series.end_time` index settings) that don't match the `@timestamp` range in a search request. Skip backing indices with a disjoint range on @timestamp field. #85162 (@martijnvg)
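For reference, the time-boundary work above ends up as per-backing-index settings roughly like the following (the values and routing path are illustrative). Documents whose `@timestamp` falls outside the window are rejected by the range check, and search can skip shards whose window is disjoint with the query's `@timestamp` range:

```json
{
  "settings": {
    "index.mode": "time_series",
    "index.routing_path": ["host"],
    "index.time_series.start_time": "2022-01-01T00:00:00Z",
    "index.time_series.end_time": "2022-01-02T00:00:00Z"
  }
}
```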
- Other tasks
- Improve FilterPathBasedFilter performance #79826 @weizijun
- Compile a standard data set for comparative speed and space benchmarking (@nik9000) Create a track for tsdb data rally-tracks#222
- Change TSDB sort order #82238 @imotov
- Rewrite the tsdb benchmark to use time series data streams with an ILM policy, instead of indexing into a regular index. @martijnvg
- Figure out how to parse source only once for determining the right backing index and index routing. Optimize tsdb data stream timestamp parsing if ingest pipeline is used #84046 @martijnvg
- Implement migrating existing data streams to data streams with time series index mode. Support migrating regular data streams to tsdb data streams #83520 @martijnvg
- Reconsider how time series data streams are enabled in templates. @martijnvg The current `index_mode` setting isn't good enough. It requires additional config to be specified elsewhere (the `time_series_dimension` attribute in mappings and `index.routing_path` as an index setting), and it doesn't allow the data stream tsdb features (routing based on the `@timestamp` field) to be enabled without enabling the index-level tsdb features.
  - A template will create a time series data stream if the `index.mode` setting is set to `time_series`.
  - Autogenerate the `index.routing_path` index setting if it is not defined in the composable index template that creates a tsdb data stream. All mapped fields of type `keyword` with `time_series_dimension` enabled will be included in the generated `index.routing_path` index setting. Auto generate index.routing_path from mapping #86790 (@martijnvg)
- [ ] The `index.routing_path` index setting generation doesn't kick in when `index.mode` and dimension fields are defined in component templates. (@martijnvg)
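To illustrate the template flow discussed above, a composable index template that creates a tsdb data stream might look like the sketch below (pattern and field names are illustrative). With `index.mode: time_series` set, `index.routing_path` can be auto-generated from the `keyword` fields that have `time_series_dimension: true`:

```json
{
  "index_patterns": ["metrics-*"],
  "data_stream": {},
  "template": {
    "settings": {
      "index.mode": "time_series"
    },
    "mappings": {
      "properties": {
        "@timestamp": { "type": "date" },
        "host":       { "type": "keyword", "time_series_dimension": true }
      }
    }
  }
}
```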
Phase 2.1 Ingest follow ups
- [ ] Build the _id from dimension values
- [ ] Investigate moving timestamp to the front of the _id to automatically get an optimization on _id searches. Not sure if worth it - but possible. #84928 could be an alternative
- Bring back something in the spirit of the append-only optimization, but that works for tsdb. That should greatly improve write performance. Extract append-only optimization from Engine #84771 is a partial prototype
- We store the `_id` in Lucene stored fields. We could regenerate it from the `_source` or from doc values for the `@timestamp` and the `_tsid`. That'd save some bytes per document.
- Move `IndexRequest#autoGenerateId`? It's a bit spooky where it is, but I don't like it in any other place.
- Improve error messages in `_update_by_query` when modifying the dimensions or `@timestamp`
- On translog replay, on recovery, and on replicas we regenerate the `_id` and assert that it matches the `_id` from the primary. Should we? Probably. Let's make sure.
- Add tsdb benchmarks to the nightlies
- [ ] Document best practices for using dimensions-based ID generator including how to use this with component templates
Phase 3.1 QL storage API (Postponed)
- Create simple time series reader
- Create a coordinating node level reader for tsdb #79197 @nik9000
- Add support for selectors in TimeSeriesMetricsService #79691 @imotov
- [ ] Reimplement QL storage API for TSDB database (depends on completion of Phase 2 and 3.2) (Postponed)
Phase 3.2 - Search MVP
Plans for time series support in the _search API are superseded by plans for this in ES|QL.
- Distributed nested delayed execution framework
- Treating data stream/index as a dimension
- [ ] Aggregation results filtering
- [ ] Retrieve the last value for a time series metric within a parent bucket
- Time series aggregation
- Rate Function
- [ ] Add a new histogram field subtype to support Prometheus-style histograms
- [ ] TSDB indices could speed up cardinality aggregations on dimension fields #85523
- [ ] Should the _tsid agg return doc_counts by default?
- [ ] Shortcut aggs for TSDB #90423
Phase 3.3 - Rollup / Downsampling
- TSDB: Implement downsampling on time-series indices #85708 @csoulios
- Extract rollup configuration (dimensions, metrics) from index mapping
- Create rollup index (settings and mapping)
- Traverse the source index using `TimeSeriesIndexSearcher`, compute rollup docs, and add them to the rollup index
- Finalize action: publish index metadata, modify data stream, clean up temp index
- TSDB: Implement Downsampling ILM Action for time-series indices #87269 @csoulios
- Use the updated rollup config
- Revisit validations before invoking rollup process
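As a sketch of how the ILM action from #87269 gets configured (phase, age, and interval values here are illustrative), a policy phase invokes downsampling via a `downsample` action:

```json
{
  "policy": {
    "phases": {
      "warm": {
        "min_age": "1d",
        "actions": {
          "downsample": { "fixed_interval": "1h" }
        }
      }
    }
  }
}
```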
- [TSDB] Add support for downsampling `aggregate_metric_double` fields #90029 @csoulios
- Query downsampled indices, add validations for:
- Mark shard failures caused by unsupported aggregations or queries against rolled up data so Kibana can identify them #89252 @salvatore-campagna
- Intervals: `fixed_interval` vs `calendar_interval`
- `time_zone`
- `date_histogram` resolution
- Field Caps API
- [TSDB] Expose `aggregate_metric_double` fields as their own field type instead of `double` #87849 @csoulios
- [TSDB] Metric fields in the field caps API #88695 @csoulios
- Expose information about if a field belongs to only time-series indices when querying multiple indices
- Shorten the response when some indices don't map fields with the same time series parameter - right now it's a list of indices, which is nice, but Kibana only needs to know if the list is non-empty
- Misc
- [TSDB] Add Kahan support to downsampling summation #87554 @csoulios
- Implement logic for storing fields that are neither dimensions nor metrics (aka tags) #87929 @salvatore-campagna
- Make rollup task cancellable TSDB: Downsampling support cancelled #88496 @weizijun
- Support aggregate_double_metric fields in the Field API #88534 @salvatore-campagna
- Support text field labels
- Support multi valued metrics #88818 @salvatore-campagna
- Handle rollup failures
- Update tsdb rally track to add benchmarks for downsampling Include metric types in tsdb index and template mappings rally-tracks#316 @salvatore-campagna
- Downsampling performance analysis and improvement #90226 @salvatore-campagna
Phase 3.4 - TSID aggs (superseded by tsdb in ES|QL)
- [ ] ~~Update min, max, sum, avg pipeline aggs for intermediate result filtering optimization~~
- [ ] ~~Sliding window aggregation~~
- [ ] ~~A way to filter to windows within the sliding window. Like "measurements taken in the last 30 seconds of the window".~~
- [ ] ~~Open transform issue for newly added time series aggs~~
- [ ] ~~Benchmarks for the tsid agg~~
Phase 3.5 - Downsampling follow ups
- Handling histograms
- SQL support for downsampling
Phase 4.0 - Compression
- Synthetic `_source` @nik9000 Synthetic Source #86603
- Optimization of merge policies (Move backing indices of data streams to LogByteMergePolicy #87684)
- Deltas of deltas compression
- What about sequence number?
Phase 5.0 - Follow-ups and Nice-to-have-s
- Default the setting's value to all of the keyword dimensions
- Support shard splitting on time_series indices
- Make an object or interface for `_id`'s values. Right now it's a `String` that we encode with `Uid.encodeId`. That was reasonable. Maybe it still is. But it feels complex, especially for tsdb, whose `_id` is always some bytes. And encoding it also wastes a byte about 1/128 of the time. It's a common prefix byte, so this is probably not really an issue. But still. This is a big change, but it'd make ES easier to read. It probably wouldn't really improve storage though.
- Figure out how to specify tsdb settings in component templates. For example, `index.routing_path` can be specified in a composable index template if the data stream template's `index_mode` is set to `time_series`. But if this setting is specified in a component template, then it is also required to set the `index.mode` index setting. This feels backwards. @martijnvg
- In order to retrieve the routing values (defined in `index.routing_path`), the source needs to be parsed on the coordinating node. However, if an ingest pipeline is executed, the source of the document will be parsed a second time. Ideally the routing values should be extracted when ingest is performed, similar to how the `@timestamp` field is already retrieved from a document during pipeline execution.
- In order to determine the backing index a document should be routed to, its timestamp is parsed into an `Instant`. The format being used is `strict_date_optional_time_nanos||strict_date_optional_time||epoch_millis`. This allows the regular date format, the date-nanos format, and epoch millis defined as a string. We could optimize the parsing if we knew the exact format being used. For example, if the data stream had a parameter that indicates the exact date format, we could optimize parsing by using just `strict_date_optional_time_nanos`, `strict_date_optional_time`, or `epoch_millis`.
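The fallback chain described above can be written as an explicit `format` on the `@timestamp` mapping; if a data stream declared just one of these formats, the coordinating node could skip trying the others when parsing:

```json
{
  "properties": {
    "@timestamp": {
      "type": "date",
      "format": "strict_date_optional_time_nanos||strict_date_optional_time||epoch_millis"
    }
  }
}
```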