Skip to content

Conversation

@bpintea
Copy link
Contributor

@bpintea bpintea commented Jul 28, 2021

This corrects the accounting of answer time reported in "took" attribute
of an intermediary EQL async response. This has been reported as a
negative value, since the delta mixes wall-clock time with VM time.

(The confusion is favoured by Task#getStartTime() returning a wall-clock
time, while Task#getStartTimeNanos() VM's time.)

Closes #75825.

This corrects the accounting of answer time reported in "took" attribute
of an intermediary EQL async response. This has been reported as a
negative value, since the delta mixes wall-clock time with VM time.

(The confusion is favoured by Task#getStartTime() returning a wall-clock
time, while Task#getStartTimeNanos() VM's time.)
bpintea added 2 commits July 28, 2021 21:51
Correct the accounting as the diff between VM's time and task's creation
time.
If search task is set in under 1ms, the test will complete with an
accounting of 0ms.
@bpintea
Copy link
Contributor Author

bpintea commented Jul 29, 2021

@elasticmachine update branch

@bpintea bpintea marked this pull request as ready for review July 29, 2021 07:36
@elasticmachine elasticmachine added the Team:QL (Deprecated) Meta label for query languages team label Jul 29, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-ql (Team:QL)

Copy link
Contributor

@Luegg Luegg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm 👍

Just wondering whether we have a general preference for the JVM time? Otherwise, System.currentTimeMillis() - task.getStartTime() might also be an option?

@bpintea bpintea added the auto-backport Automatically create backport pull requests when merged label Jul 29, 2021
Switch to millis math.
@bpintea
Copy link
Contributor Author

bpintea commented Jul 29, 2021

Just wondering whether we have a general preference for the JVM time? Otherwise, System.currentTimeMillis() - task.getStartTime() might also be an option?

You're right, the original code was doing millis math in the first place. I guess nanos math makes sense to more accurately differentiate between 0 and 1ms, but don't think this is that relevant in this case. Applied your suggestion.

Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
This is fixing an annoying issue that I noticed myself as well. Thanks for addressing it.

@bpintea
Copy link
Contributor Author

bpintea commented Jul 29, 2021

@elasticmachine run elasticsearch-ci/part-2

@imotov
Copy link
Contributor

imotov commented Jul 29, 2021

Just wondering whether we have a general preference for the JVM time?

I don't think it is a general preference, but for calculating elapsed time we are trying to go with System.nanoTime() nowadays. There are a couple of reasons for that. System.nanoTime() is monotonic (which exception of some JVM bugs), while System.currentTimeMillis() is not by design. In other words, System.currentTimeMillis() - task.getStartTime() can be negative if time on the system clock is update by a user or an NTP server at just the right moment. On some system you can also expect System.nanoTime() to be more precise than System.currentTimeMillis(). I saw reports of the first having granularity of hundreds of nanoseconds, while the second only ticking every 10ms or even 50ms on some older Windows systems. One notable exception when we would want to use System.currentTimeMillis() for elapse time calculation is if we move the data between nodes. System.nanoTime() doesn't really mean anything outside of JVM where the start time was computed, so if you start an event on one node and finish on another, System.currentTimeMillis() has to be used (but then all bets regarding monotonicity are off anyway since we don't synchronize clocks).

Swtich back to converted nanos-accounting.
@bpintea
Copy link
Contributor Author

bpintea commented Aug 2, 2021

@elasticmachine update branch

@bpintea
Copy link
Contributor Author

bpintea commented Aug 2, 2021

@elasticmachine run elasticsearch-ci/docs

@bpintea
Copy link
Contributor Author

bpintea commented Aug 2, 2021

Thanks for the explanations, @imotov, reverted to your proposal.

@bpintea bpintea merged commit d299954 into elastic:master Aug 3, 2021
@bpintea bpintea deleted the fix/eql_async_took_negative branch August 3, 2021 08:35
elasticsearchmachine pushed a commit to elasticsearchmachine/elasticsearch that referenced this pull request Aug 3, 2021
)

This corrects the accounting of answer time reported in "took" attribute
of an intermediary EQL async response. This has been reported as a
negative value, since the delta mixes wall-clock time with VM time.

(The confusion is favoured by Task#getStartTime() returning a wall-clock
time, while Task#getStartTimeNanos() VM's time.)
elasticsearchmachine pushed a commit to elasticsearchmachine/elasticsearch that referenced this pull request Aug 3, 2021
)

This corrects the accounting of answer time reported in "took" attribute
of an intermediary EQL async response. This has been reported as a
negative value, since the delta mixes wall-clock time with VM time.

(The confusion is favoured by Task#getStartTime() returning a wall-clock
time, while Task#getStartTimeNanos() VM's time.)
@elasticsearchmachine
Copy link
Collaborator

💚 Backport successful

Status Branch Result
7.13
7.14

elasticsearchmachine added a commit that referenced this pull request Aug 3, 2021
…76000)

This corrects the accounting of answer time reported in "took" attribute
of an intermediary EQL async response. This has been reported as a
negative value, since the delta mixes wall-clock time with VM time.

(The confusion is favoured by Task#getStartTime() returning a wall-clock
time, while Task#getStartTimeNanos() VM's time.)

Co-authored-by: Bogdan Pintea <[email protected]>
elasticsearchmachine added a commit that referenced this pull request Aug 3, 2021
…76001)

This corrects the accounting of answer time reported in "took" attribute
of an intermediary EQL async response. This has been reported as a
negative value, since the delta mixes wall-clock time with VM time.

(The confusion is favoured by Task#getStartTime() returning a wall-clock
time, while Task#getStartTimeNanos() VM's time.)

Co-authored-by: Bogdan Pintea <[email protected]>
lockewritesdocs pushed a commit to lockewritesdocs/elasticsearch that referenced this pull request Aug 3, 2021
)

This corrects the accounting of answer time reported in "took" attribute
of an intermediary EQL async response. This has been reported as a
negative value, since the delta mixes wall-clock time with VM time.

(The confusion is favoured by Task#getStartTime() returning a wall-clock
time, while Task#getStartTimeNanos() VM's time.)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/EQL EQL querying auto-backport Automatically create backport pull requests when merged >bug Team:QL (Deprecated) Meta label for query languages team v7.13.5 v7.14.1 v8.0.0-alpha1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

EQL: intermediary async responses reported as taking negative time

7 participants