
[Transform] Use point in time search, optimize query execution #73481

@hendrikmuhs

Transform uses a composite aggregation to page through the source. Benchmarks have shown that the ordinary searches it executes slow down ingest into the source: because of transform, refreshes are triggered more often than usual, so the source ingest does more work than it would without transform.
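The following is a minimal in-memory sketch of the paging pattern. It shows why one checkpoint issues many searches. The `CompositePager` class and its methods are hypothetical, and a sorted `TreeMap` stands in for the composite aggregation's sorted buckets. Each call to `page` models one search request, and the last key of a page becomes the after key for the next one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical in-memory stand-in for composite aggregation paging; NOT the Elasticsearch API. */
class CompositePager {
    private final TreeMap<String, Long> buckets; // group key -> doc count, kept sorted by key
    int searches = 0;                            // every page costs one search request

    CompositePager(TreeMap<String, Long> buckets) {
        this.buckets = buckets;
    }

    /** Returns up to {@code size} bucket keys strictly after {@code afterKey} (null = first page). */
    List<String> page(String afterKey, int size) {
        searches++;
        NavigableMap<String, Long> tail = afterKey == null ? buckets : buckets.tailMap(afterKey, false);
        List<String> out = new ArrayList<>();
        for (String key : tail.keySet()) {
            if (out.size() == size) {
                break;
            }
            out.add(key);
        }
        return out;
    }

    /** Pages until an empty page comes back, feeding each page's last key in as the next after key. */
    List<String> pageThroughAll(int size) {
        List<String> all = new ArrayList<>();
        String afterKey = null;
        while (true) {
            List<String> page = page(afterKey, size);
            if (page.isEmpty()) {
                return all;
            }
            all.addAll(page);
            afterKey = page.get(page.size() - 1);
        }
    }
}
```

With five bucket keys and a page size of 2, `pageThroughAll(2)` issues four searches, and the last one returns an empty page. Without a pit, each of them runs against the live source indices.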

By using a point in time (pit) reader, transform will avoid causing this much churn (benchmarks have shown a potential 50% reduction in refreshes).

Requirements

  • open a pit reader at the beginning of a new checkpoint
  • use pit for all searches in the checkpoint
    • re-create a pit if necessary (e.g. timeout, start/stop)
    • don't fail due to a broken pit
    • keepAlive should be kept reasonably small
  • explicitly delete the pit
    • after a checkpoint
    • on stop
  • fall back to non-pit mode in case a node older than 7.10 is part of the local or remote (CCS) cluster
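The requirements above amount to a small lifecycle. The following is a sketch of one way to wire them together, under stated assumptions. `SearchClient`, `PitGoneException`, `PitLifecycle` and all their methods are hypothetical stand-ins, not the Elasticsearch client API or the real `ClientTransformIndexer` code. The sketch also assumes one way of not failing on a broken pit: the current page is served by an ordinary search and a fresh pit is opened lazily on the next page.

```java
import java.util.List;

/** Hypothetical: thrown when a pit id is no longer valid (keepAlive expired, node restart, ...). */
class PitGoneException extends RuntimeException {
    PitGoneException(String pitId) { super("pit gone: " + pitId); }
}

/** Hypothetical client abstraction; NOT the real Elasticsearch client API. */
interface SearchClient {
    String openPit(String index, long keepAliveMillis);
    List<String> searchPit(String pitId, int page);   // may throw PitGoneException
    List<String> searchIndex(String index, int page); // ordinary search without a pit
    void closePit(String pitId);
}

/** Sketch of the per-transform pit lifecycle from the requirements list. */
class PitLifecycle {
    private final SearchClient client;
    private final String index;
    private final long keepAliveMillis; // kept reasonably small
    final boolean pitSupported;         // false if any local or remote node is older than 7.10
    String pitId;                       // null while no pit is open

    PitLifecycle(SearchClient client, String index, long keepAliveMillis, List<String> nodeVersions) {
        this.client = client;
        this.index = index;
        this.keepAliveMillis = keepAliveMillis;
        this.pitSupported = allNodesSupportPit(nodeVersions);
    }

    /** A single node older than 7.10 switches the transform to non-pit mode. */
    static boolean allNodesSupportPit(List<String> nodeVersions) {
        for (String v : nodeVersions) {
            String[] parts = v.split("\\.");
            int major = Integer.parseInt(parts[0]);
            int minor = Integer.parseInt(parts[1]);
            if (major < 7 || (major == 7 && minor < 10)) {
                return false;
            }
        }
        return true;
    }

    /** Open a pit at the beginning of a new checkpoint. */
    void onCheckpointStart() {
        if (pitSupported) {
            pitId = client.openPit(index, keepAliveMillis);
        }
    }

    /** Every search of the checkpoint goes through here. */
    List<String> searchPage(int page) {
        if (!pitSupported) {
            return client.searchIndex(index, page); // fallback: ordinary search
        }
        if (pitId == null) {
            pitId = client.openPit(index, keepAliveMillis); // re-create, e.g. after a stop/start
        }
        try {
            return client.searchPit(pitId, page);
        } catch (PitGoneException e) {
            pitId = null; // broken pit: forget it, the next page opens a fresh one
            return client.searchIndex(index, page); // serve this page without a pit instead of failing
        }
    }

    /** Explicitly delete the pit after a checkpoint and on stop. */
    void onCheckpointEnd() { closePit(); }

    void onStop() { closePit(); }

    private void closePit() {
        if (pitId == null) {
            return;
        }
        String id = pitId;
        pitId = null;
        try {
            client.closePit(id);
        } catch (RuntimeException ignored) {
            // the pit may already have expired; nothing left to release
        }
    }
}
```

Re-opening lazily keeps the error path simple. An alternative would be to retry the failed page immediately against a freshly opened pit.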

Design considerations

Searches are executed by ClientTransformIndexer, which inherits from TransformIndexer and adds search capabilities using a client. It needs to be enhanced to create and destroy the pit.

Future: Given checkpoints, it will be possible to prune the search query/indices and, e.g., avoid calling out to frozen/cold indices.
