[Transform] improve performance by using point in time API for search #74984

hendrikmuhs · 2021-07-06T14:45:54Z

Use the point in time (pit) API for every checkpoint in a transform. Using a point in time reduces pressure on the source indices, e.g. fewer refreshes. If pit isn't supported (e.g. when searching remote clusters), the transform falls back to ordinary search requests as before.
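A minimal sketch of the fallback described above, assuming that any failure to open a pit disables it for later searches. `PitFallbackSketch`, `SearchRequest`, `buildRequest` and the `openPit` callback are hypothetical names for illustration, not the classes touched by this PR; only the `disablePit` flag name comes from the review discussion.

```java
import java.util.function.Function;

/** Hypothetical sketch; these types are stand-ins, not the transform's real API. */
class PitFallbackSketch {
    /** Stand-in for a search request. A null pitId means "ordinary search". */
    static final class SearchRequest {
        final String index;
        final String pitId;

        SearchRequest(String index, String pitId) {
            this.index = index;
            this.pitId = pitId;
        }
    }

    private boolean disablePit = false; // set once opening a pit has failed
    private String pitId;               // currently open pit, or null

    /**
     * Builds the request for the next search. openPit is a hypothetical callback
     * that returns a pit id or throws if the source does not support pit.
     */
    SearchRequest buildRequest(String index, Function<String, String> openPit) {
        if (disablePit == false && pitId == null) {
            try {
                pitId = openPit.apply(index);
            } catch (RuntimeException e) {
                // Assumption: a failure to open a pit disables it for later searches.
                disablePit = true;
            }
        }
        // With pitId == null this is an ordinary search request, as before.
        return new SearchRequest(index, pitId);
    }
}
```

Once `disablePit` is set, the callback is not invoked again, so a source that does not support pit costs one failed attempt.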

closes #73481

elasticmachine · 2021-07-06T14:45:57Z

Pinging @elastic/ml-core (Team:ML)

hendrikmuhs · 2021-07-12T07:52:49Z

retest this please

przemekwitek

LGTM

benwtrent · 2021-07-12T17:55:36Z

...sform/src/main/java/org/elasticsearch/xpack/transform/transforms/ClientTransformIndexer.java

+        if (getNextCheckpoint().getCheckpoint() != pitCheckpoint) {
+            closePointInTime();
+        }


It seems to me that we should not move this execution thread forward until the pit is closed.

It is conceivable, right (though unlikely), that this closePointInTime() is still executing while doSearch is being handled, so that we close the wrong PIT and leave one left over?

pit is "copied" (not literally, but the reference) and set to null in the sync part of closePointInTime(), see line 470++. So you are right that we might open a new pit while still closing the other, however that's allowed and I don't see a race condition that could lead to mixing up the two.

@hendrikmuhs 100%, I misread the method. Setting a local variable synchronously should avoid that problem :).
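The pattern discussed in this thread can be sketched as follows. This is an illustration under assumptions, not the code from the PR: `PitCloseSketch`, `openPit`, `currentPit` and the `asyncClose` callback are hypothetical, and the pit is reduced to a plain id string. The point is only that the reference is captured and the field cleared synchronously, so the asynchronous close can never see a pit that was opened later.

```java
import java.util.function.Consumer;

/** Hypothetical sketch of "copy the reference, null the field, then close asynchronously". */
class PitCloseSketch {
    private volatile String pit; // id of the currently open pit, or null

    void openPit(String id) {
        pit = id;
    }

    String currentPit() {
        return pit;
    }

    /** asyncClose stands in for the asynchronous close request. */
    void closePointInTime(Consumer<String> asyncClose) {
        String oldPit = pit;       // synchronous: capture the reference
        if (oldPit == null) {
            return;                // nothing to close
        }
        pit = null;                // synchronous: the field is cleared before any async work
        asyncClose.accept(oldPit); // the async part only ever sees oldPit
    }
}
```

Opening a new pit while the old close is still pending then leaves both untangled: the pending close refers to the old id and the field holds the new one.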

benwtrent · 2021-07-12T17:56:29Z

...sform/src/main/java/org/elasticsearch/xpack/transform/transforms/ClientTransformIndexer.java

+            ActionListener.wrap(response -> { logger.trace("[{}] closed pit search context [{}]", getJobId(), oldPit); }, e -> {
+                // note: closing the pit should never throw, even if the pit is invalid
+                logger.error(new ParameterizedMessage("[{}] Failed to close point in time reader", getJobId()), e);
+            })


I think the logger.trace should have a message supplier like () -> new ParameterizedMessage to prevent strings from being created when trace is disabled.

Not a huge deal as this is not a "hot path"

I don't think this applies. This is only a problem if one or more arguments need to be constructed, e.g. if getJobId() would build the id and therefore execute something. This is not the case.

The message string itself gets only constructed after the check whether trace is enabled or not.

I wish we had static code analysis for this; it's such a common problem.
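To illustrate the distinction made in this thread, here is a self-contained sketch. `TraceLogger` is a hypothetical minimal logger written for this example; it is not the Log4j API and only mimics the two call shapes being discussed. `expensiveId()` is likewise an invented stand-in for an argument that is costly to build.

```java
import java.util.function.Supplier;
import java.util.regex.Matcher;

class LazyLoggingSketch {
    /** Hypothetical minimal logger, only for illustration. */
    static final class TraceLogger {
        final boolean traceEnabled;
        int formatted = 0; // counts how many message strings were actually built

        TraceLogger(boolean traceEnabled) {
            this.traceEnabled = traceEnabled;
        }

        /** Pattern form: the message is built only after the level check. */
        void trace(String pattern, Object... args) {
            if (traceEnabled == false) {
                return;
            }
            formatted++;
            System.out.println(format(pattern, args));
        }

        /** Supplier form: the supplier is not even invoked unless trace is enabled. */
        void trace(Supplier<String> message) {
            if (traceEnabled == false) {
                return;
            }
            formatted++;
            System.out.println(message.get());
        }

        /** Replaces each "{}" placeholder with the next argument. */
        static String format(String pattern, Object... args) {
            String out = pattern;
            for (Object arg : args) {
                out = out.replaceFirst("\\{\\}", Matcher.quoteReplacement(String.valueOf(arg)));
            }
            return out;
        }
    }

    static int expensiveCalls = 0;

    /** Stand-in for an argument that is costly to compute. */
    static String expensiveId() {
        expensiveCalls++;
        return "job-" + expensiveCalls;
    }
}
```

With trace disabled, `logger.trace("[{}] closed pit", expensiveId())` still calls `expensiveId()` because Java evaluates arguments before the call, while the message string itself is never built. Wrapping the call in a supplier defers both. For cheap arguments such as an existing id, the pattern form already avoids the string construction.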

benwtrent · 2021-07-12T17:57:00Z

...sform/src/main/java/org/elasticsearch/xpack/transform/transforms/ClientTransformIndexer.java

+                pit = new PointInTimeBuilder(response.getPointInTimeId()).setKeepAlive(PIT_KEEP_ALIVE);
+                searchRequest.source().pointInTimeBuilder(pit);
+                pitCheckpoint = getNextCheckpoint().getCheckpoint();
+                logger.trace("[{}] using pit search context with id [{}]", getJobId(), pit.getEncodedId());


Similar comment: () -> new ParameterizedMessage seems better to me for trace.

benwtrent · 2021-07-12T17:58:19Z

...sform/src/main/java/org/elasticsearch/xpack/transform/transforms/ClientTransformIndexer.java

    private final AtomicBoolean oldStatsCleanedUp = new AtomicBoolean(false);

    private final AtomicReference<SeqNoPrimaryTermAndIndex> seqNoPrimaryTermAndIndex;
+    private PointInTimeBuilder pit;


Do pit and disablePit need to be volatile? They are accessed from separate threads in different execution paths.

pit is always accessed by the indexer thread; even the onStop call originates from the indexer, not from the _stop transport, if that's what you mean.

But I am unsure; the async behavior of the indexer might indeed be problematic in this case. I will check other variables, too.
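A small sketch of why the field's visibility matters when callbacks may run on different threads. `VolatileFieldSketch` and `awaitPit` are hypothetical names and the pit is reduced to an id string; the example only shows that a reader thread spinning on a `volatile` field is guaranteed to observe a write made by another thread.

```java
/** Hypothetical sketch: a field written by one thread and read by another. */
class VolatileFieldSketch {
    // volatile guarantees that a write by one thread becomes visible to later reads by others
    private volatile String pit;

    void setPit(String id) {
        pit = id;
    }

    String getPit() {
        return pit;
    }

    /** Starts a reader that spins until a pit is published, then publishes one from this thread. */
    static String awaitPit(VolatileFieldSketch state) {
        final String[] seen = new String[1];
        Thread reader = new Thread(() -> {
            while (state.getPit() == null) {
                Thread.onSpinWait(); // busy-wait hint, available since Java 9
            }
            seen[0] = state.getPit();
        });
        reader.start();
        state.setPit("pit-1");
        try {
            reader.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }
}
```

Without `volatile`, the Java memory model would allow the reader to keep seeing a stale `null`, which is the kind of problem the question above is probing for.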

hendrikmuhs · 2021-07-13T08:22:47Z

@elasticmachine update branch

hendrikmuhs · 2021-07-14T09:10:56Z

Test results

With pit, refreshes on the source index can be reduced significantly.

The chart compares a baseline (indexing without any searches or transforms), a transform without pit (<7.15) and a transform using pit (>=7.15).

Using pit significantly reduces the number of refreshes. Correspondingly, the number of merges goes down from 1440 to 560 and the time spent merging from 17.1 minutes to 9.1 minutes.

It depends: This is not a representative benchmark. The benefit of using pit depends on data, ingest rates, the transform configuration and other query executors like dashboards that use the same source index.

In summary, you might not see a resource reduction of the same order of magnitude with your data; however, pit should reduce overhead for all use cases where the source index isn't static (continuous transform).

…earch (#75333) Use point in time API for every checkpoint in transform. Using point in time reduces pressure on the source indexes, e.g. less refreshes. In case, pit isn't supported (e.g. when searching remote clusters) it falls back to ordinary search requests as before. closes #73481 backport #74984

Fix an unreleased regression introduced in #74984. If a pit search context disappeared, the listener was called twice and the transform failed.
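One common guard against this class of bug is to wrap the listener so that it can fire at most once. The sketch below illustrates that general technique only; it is not taken from #75615 and may differ from the actual fix. `NotifyOnceSketch` and its `notifyOnce` helper are hypothetical, with `java.util.function.Consumer` standing in for a listener.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

/** Hypothetical sketch: a wrapper that forwards only the first notification. */
class NotifyOnceSketch {
    static <T> Consumer<T> notifyOnce(Consumer<T> delegate) {
        AtomicBoolean called = new AtomicBoolean(false);
        return value -> {
            // compareAndSet succeeds for exactly one caller, even under concurrency
            if (called.compareAndSet(false, true)) {
                delegate.accept(value);
            }
        };
    }
}
```

A second call on the wrapped listener, for example from a retry path after the search context went missing, is then silently dropped instead of reaching the delegate twice.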

…75615) (#75619) Fix an unreleased regression introduced in #74984. If a pit search context disappeared, the listener was called twice and the transform failed.

hendrikmuhs added >enhancement v8.0.0 :ml/Transform Transform v7.15.0 labels Jul 6, 2021

elasticmachine added the Team:ML label Jul 6, 2021

hendrikmuhs force-pushed the transform-pit branch 2 times, most recently from 691872c to 4737c80 Compare July 8, 2021 18:25

Hendrik Muhs added 6 commits July 12, 2021 10:05

use pit for search

312b102

implement pit error handling

a46bc7c

log pit id

6ea5bb2

close pit on stop

2883c03

checkstyle

3295719

add tests for pit handling

d501166

hendrikmuhs force-pushed the transform-pit branch from 4737c80 to d501166 Compare July 12, 2021 08:05

przemekwitek approved these changes Jul 12, 2021

address review comments and apply spotless

439988a

benwtrent reviewed Jul 12, 2021

make members volatile

e5f2ece

Merge branch 'master' into transform-pit

bc9784f

benwtrent approved these changes Jul 13, 2021

hendrikmuhs merged commit 15a3b35 into elastic:master Jul 14, 2021

hendrikmuhs deleted the transform-pit branch July 14, 2021 10:00

hendrikmuhs mentioned this pull request Jul 14, 2021

[7.x][Transform] improve performance by using point in time API for search #75333

Merged

hendrikmuhs mentioned this pull request Jul 22, 2021

[Transform] fix listener for search context missing exception #75615

Merged

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021

hendrikmuhs mentioned this pull request Dec 2, 2021

[Transform] Transform fails with task encountered irrecoverable failure: org.elasticsearch.index.IndexNotFoundException #81252

Closed
