Closed
Labels
:ml/Transform, >enhancement, Team:ML
Description
Transform uses a composite aggregation to page through the source. Benchmarks have shown that these ordinary searches slow down ingest into the source: because of transform, a refresh is triggered more often than usual, so source ingest does more work than it would without transform.
By using a point in time (PIT) reader, transform would avoid causing this much churn (benchmarks have shown a potential reduction of refreshes by 50%).
Requirements
- open a pit reader at the beginning of a new checkpoint
- use pit for all searches in the checkpoint
- re-create a pit if necessary (e.g. timeout, start/stop)
- don't fail due to a broken pit
- keepAlive should be kept reasonably small
- explicitly delete the pit
  - after a checkpoint
  - on stop
- fall back to non-pit mode if a node older than 7.10 is part of the local or remote (CCS) cluster
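
The requirements above can be read as a small lifecycle. The following is a minimal Python sketch of that lifecycle, not the actual transform code (which is Java). `FakeClient`, `PitSession`, `PitMissingError`, the `"30s"` keep-alive value and every method name are hypothetical stand-ins invented for illustration; they are not Elasticsearch APIs.

```python
from dataclasses import dataclass, field
from typing import Optional


class PitMissingError(Exception):
    """Raised by the hypothetical client when a pit id is expired or unknown."""


@dataclass
class FakeClient:
    """In-memory stand-in for a search client (assumption, not a real API)."""
    min_node_version: tuple = (7, 10, 0)  # oldest node in local/remote clusters
    open_pits: set = field(default_factory=set)
    _next_id: int = 0

    def open_pit(self, index: str, keep_alive: str) -> str:
        self._next_id += 1
        pit_id = f"pit-{self._next_id}"
        self.open_pits.add(pit_id)
        return pit_id

    def close_pit(self, pit_id: str) -> None:
        self.open_pits.discard(pit_id)

    def search(self, index: str, pit_id: Optional[str] = None) -> dict:
        if pit_id is not None and pit_id not in self.open_pits:
            raise PitMissingError(pit_id)
        return {"used_pit": pit_id is not None}


class PitSession:
    """Hypothetical helper that owns one pit for the duration of a checkpoint."""
    PIT_MIN_VERSION = (7, 10, 0)

    def __init__(self, client, index: str, keep_alive: str = "30s"):
        self.client = client
        self.index = index
        self.keep_alive = keep_alive  # kept small so abandoned pits expire quickly
        self.pit_id = None
        # Non-pit mode if any node is older than the minimum version.
        self.pit_disabled = client.min_node_version < self.PIT_MIN_VERSION

    def start_checkpoint(self) -> None:
        self._open()

    def _open(self) -> None:
        if self.pit_disabled:
            return
        try:
            self.pit_id = self.client.open_pit(self.index, self.keep_alive)
        except Exception:
            self.pit_id = None  # a broken pit must not fail the transform

    def search(self) -> dict:
        if self.pit_id is None:
            self._open()  # re-create after start/stop; no-op in non-pit mode
        if self.pit_id is None:
            return self.client.search(self.index)  # ordinary search
        try:
            return self.client.search(self.index, pit_id=self.pit_id)
        except PitMissingError:
            # The pit timed out: re-create it once, else fall back.
            self.pit_id = None
            self._open()
            if self.pit_id is None:
                return self.client.search(self.index)
            return self.client.search(self.index, pit_id=self.pit_id)

    def close(self) -> None:
        """Explicit delete, called after a checkpoint and on stop."""
        pit_id, self.pit_id = self.pit_id, None
        if pit_id is not None:
            try:
                self.client.close_pit(pit_id)
            except Exception:
                pass  # best effort; the keep-alive cleans up eventually
```

In this sketch a timed-out pit is re-created transparently on the next search, and a cluster containing an older node never attempts to open one.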
Design considerations
Searches are executed by ClientTransformIndexer which inherits from TransformIndexer, adding the search capabilities given a client. This needs to be enhanced to create/destroy the pit object.
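
One way to picture this split is a base class that drives the checkpoint and a client-backed subclass that owns the pit. The sketch below is illustrative Python, not the real Java classes: only the names `TransformIndexer` and `ClientTransformIndexer` come from the description above, while the hook methods (`on_checkpoint_start`, `on_checkpoint_finish`, `do_search`), `run_checkpoint` and `RecordingClient` are assumptions made for the example.

```python
from abc import ABC, abstractmethod


class TransformIndexer(ABC):
    """Drives a checkpoint without knowing how searches are executed."""

    def run_checkpoint(self, pages: int) -> list:
        results = []
        self.on_checkpoint_start()
        try:
            for _ in range(pages):
                results.append(self.do_search())
        finally:
            # Runs even if a search raises, so the pit is always released.
            self.on_checkpoint_finish()
        return results

    def on_checkpoint_start(self) -> None:
        """Hypothetical hook; no-op by default."""

    def on_checkpoint_finish(self) -> None:
        """Hypothetical hook; no-op by default."""

    @abstractmethod
    def do_search(self) -> dict:
        ...


class RecordingClient:
    """Hypothetical client that only records the calls it receives."""

    def __init__(self):
        self.calls = []

    def open_pit(self) -> str:
        self.calls.append("open_pit")
        return "pit-1"

    def search(self, pit_id) -> dict:
        self.calls.append(f"search({pit_id})")
        return {"pit_id": pit_id}

    def close_pit(self, pit_id) -> None:
        self.calls.append(f"close_pit({pit_id})")


class ClientTransformIndexer(TransformIndexer):
    """Adds search capabilities given a client, plus pit create/destroy."""

    def __init__(self, client):
        self.client = client
        self.pit_id = None

    def on_checkpoint_start(self) -> None:
        self.pit_id = self.client.open_pit()

    def do_search(self) -> dict:
        return self.client.search(self.pit_id)

    def on_checkpoint_finish(self) -> None:
        if self.pit_id is not None:
            self.client.close_pit(self.pit_id)
            self.pit_id = None
```

Keeping the pit handling in the subclass means the base class stays free of client concerns; a stop handler could call the same finish hook.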
Future: Given checkpoints it will be possible to prune the search query/indices and e.g. avoid calling out to frozen/cold indices.