Add retry for field caps node requests #78647

dnhatn · 2021-10-04T20:33:00Z

This change targets the feature branch: group-field-caps (based on 7.x).

This adds a retry mechanism for node-based field caps requests introduced in #77047. Merging index responses on data nodes will be implemented in a follow-up.
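To make the idea of retrying on another shard copy concrete, here is a minimal sketch. It is not the code in this PR: the class `CopyRetrySketch`, the method `tryCopies`, and the `sendToNode` callback are hypothetical names, and the loop is synchronous for brevity, whereas the real transport requests are asynchronous. It only illustrates that a failed copy leads to a retry on the next node holding a copy, and that the attempted nodes can be recorded.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

final class CopyRetrySketch {

    /** Outcome for one index: the response, if any copy answered, plus the nodes tried in order. */
    static final class Outcome<R> {
        final Optional<R> response;
        final List<String> attemptedNodes;

        Outcome(Optional<R> response, List<String> attemptedNodes) {
            this.response = response;
            this.attemptedNodes = attemptedNodes;
        }
    }

    /**
     * Tries each node holding a copy, in order, until one responds.
     * sendToNode is a hypothetical stand-in for the transport call.
     */
    static <R> Outcome<R> tryCopies(List<String> nodesWithCopy, Function<String, R> sendToNode) {
        List<String> attempted = new ArrayList<>();
        for (String node : nodesWithCopy) {
            attempted.add(node);
            try {
                R response = sendToNode.apply(node);
                return new Outcome<>(Optional.ofNullable(response), attempted);
            } catch (RuntimeException e) {
                // This copy failed; retry on the next node that holds a copy.
            }
        }
        // Every copy failed, so there is nothing left to retry.
        return new Outcome<>(Optional.empty(), attempted);
    }

    public static void main(String[] args) {
        Outcome<String> outcome = tryCopies(List.of("node-a", "node-b", "node-c"), node -> {
            if (node.equals("node-a")) {
                throw new IllegalStateException("shard not available on " + node);
            }
            return "field caps from " + node;
        });
        System.out.println(outcome.response.orElse("no copy responded")
            + " after " + outcome.attemptedNodes.size() + " attempt(s)");
    }
}
```

Recording the attempted nodes also makes the number of retries observable, which a unit test could assert on.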

elasticmachine · 2021-10-05T02:28:29Z

Pinging @elastic/es-search (Team:Search)

ywelsch

Thanks Nhat. I've done just a quick pass today (didn't get further). I'm wondering if some of the retry logic around shard selection / grouping can be unit-tested (e.g. we currently test that retries ARE happening, but don't test how many etc).

server/src/internalClusterTest/java/org/elasticsearch/search/fieldcaps/FieldCapabilitiesIT.java

server/src/main/java/org/elasticsearch/action/fieldcaps/TransportFieldCapabilitiesAction.java

dnhatn · 2021-10-06T03:34:25Z

@ywelsch Thank you for your review. All good points - I am addressing them.

dnhatn · 2021-10-07T03:29:11Z

@ywelsch I think I have addressed your feedback. I will add some more unit tests to RequestDispatcher and IT tests. Would you mind taking another look?

I think we can use the existing FieldCapabilitiesRequest instead of introducing FieldCapabilitiesNodeRequest when the merging response is implemented. I will consider this in the merge response PR.

jtibshirani

Thanks for tackling this @dnhatn. I like that we have a dedicated class now to handle the request dispatching logic. The test coverage also looks great.

The one part I wasn't sure of was the synchronization strategy in RequestDispatcher. There is quite a bit of logic guarded under synchronized blocks, especially the one in execute. I wonder if it'd be better (and if it's even possible) to rely on atomic integers/ thread-safe collections for this? I haven't identified a concrete concern, just raising it to hear your thoughts.
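As a rough illustration of the alternative raised here, the sketch below collects responses with a `ConcurrentHashMap` and an `AtomicInteger` instead of `synchronized` blocks. The class `LockFreeCollectorSketch` and its methods are hypothetical and much simpler than the real `RequestDispatcher`; the assumption is that the number of expected responses is known up front and each response arrives exactly once.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;
import java.util.stream.IntStream;

final class LockFreeCollectorSketch {
    // Thread-safe map, so concurrent response handlers need no explicit lock.
    private final Map<String, String> responsesByIndex = new ConcurrentHashMap<>();
    // Counts responses still outstanding; the handler that reaches zero completes the request.
    private final AtomicInteger pending;
    private final Consumer<Map<String, String>> onComplete;

    LockFreeCollectorSketch(int expectedResponses, Consumer<Map<String, String>> onComplete) {
        this.pending = new AtomicInteger(expectedResponses);
        this.onComplete = onComplete;
    }

    /** Records one response; exactly one caller observes the counter hitting zero. */
    void onResponse(String index, String response) {
        responsesByIndex.putIfAbsent(index, response);
        if (pending.decrementAndGet() == 0) {
            onComplete.accept(responsesByIndex);
        }
    }

    Map<String, String> responses() {
        return responsesByIndex;
    }

    public static void main(String[] args) {
        AtomicInteger completions = new AtomicInteger();
        LockFreeCollectorSketch collector =
            new LockFreeCollectorSketch(100, all -> completions.incrementAndGet());
        // Deliver the responses from multiple threads via a parallel stream.
        IntStream.range(0, 100).parallel()
            .forEach(i -> collector.onResponse("index-" + i, "caps-" + i));
        System.out.println(collector.responses().size() + " responses, completed "
            + completions.get() + " time(s)");
    }
}
```

Because the map write happens before the atomic decrement, the handler that observes zero also sees every recorded response.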

.../src/main/java/org/elasticsearch/action/fieldcaps/TransportFieldCapabilitiesIndexAction.java

server/src/main/java/org/elasticsearch/action/fieldcaps/RequestDispatcher.java

server/src/main/java/org/elasticsearch/action/fieldcaps/TransportFieldCapabilitiesAction.java

server/src/test/java/org/elasticsearch/action/fieldcaps/RequestDispatcherTests.java

dnhatn · 2021-10-11T01:47:30Z

@ywelsch Please hold off on the review. I am working on merging the responses, and I will integrate it in this PR.

dnhatn · 2021-10-11T02:08:31Z

> I wonder if it'd be better (and if it's even possible) to rely on atomic integers/ thread-safe collections for this? I haven't identified a concrete concern, just raising it to hear your thoughts

@jtibshirani Yes, we can go without synchronization.

dnhatn

@ywelsch I've updated this PR. The RequestDispatcher and its tests are ready. The merging logic is still WIP, and I need to discuss it with you before completing it. Would you please review the RequestDispatcher and the approach for the result-merging logic? I will take a look at your can_match PR tomorrow. Sorry for the delay - I've been focusing on this PR. Thank you!

server/src/main/java/org/elasticsearch/action/fieldcaps/FieldCapabilities.java

server/src/main/java/org/elasticsearch/action/fieldcaps/MergeResultsMode.java

ywelsch

> I am working on merging the responses, and I will integrate it in this PR.

Let's revert that part. It has become too difficult to review this PR, and I think we will need more discussions on the merging logic. Let's not block the node-level action on this, but create a clear list of follow-ups.

.../src/internalClusterTest/java/org/elasticsearch/search/fieldcaps/CCSFieldCapabilitiesIT.java

server/src/main/java/org/elasticsearch/action/fieldcaps/RequestDispatcher.java

ywelsch · 2021-10-13T08:32:28Z

server/src/main/java/org/elasticsearch/action/fieldcaps/RequestDispatcher.java

+                // and the target node will process at most one valid copy. Otherwise, we should ask for a single copy to avoid
+                // sending multiple requests.
+                final DiscoveryNode discoNode = discoveryNodes.get(node.getKey());
+                if (discoNode.getVersion().onOrAfter(GROUP_REQUESTS_VERSION)) {


It's unfortunate that the BWC logic is spread to both here and the sendRequestToNode method. Can we avoid this?

dnhatn · Oct 13, 2021

Unfortunately, I couldn't find a clean way. Any suggestion is welcome :).

ywelsch · Oct 14, 2021

I don't have a better suggestion, unfortunately, so let's leave it as is.

jtibshirani · Oct 14, 2021

I wonder if we could just remove this optimization for simplicity? Given there is no index filter, in the happy case we will only have to consult one shard copy.

dnhatn · Oct 14, 2021

I prefer to keep this optimization to be consistent with 8.0. However, I can make this change if you and Yannick have a strong opinion on it.

jtibshirani · Oct 14, 2021

I don't feel strongly, happy to go with what you (and @ywelsch) prefer here.
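For readers following the thread, this is one reading of the version check in the quoted diff, written as a standalone sketch. Everything here is an assumption except the constant name `GROUP_REQUESTS_VERSION`, which comes from the diff: the integer version values are placeholders, shard copies are modelled as plain shard ids, and `shardsToRequest` is a hypothetical helper rather than a method in this PR.

```java
import java.util.ArrayList;
import java.util.List;

final class BwcShardSelectionSketch {
    // Placeholder value; the real constant is an Elasticsearch version, not an int.
    static final int GROUP_REQUESTS_VERSION = 2;

    /**
     * Picks which shard copies on a node to include in the request.
     * Newer nodes receive all copies in one request and, per the quoted comment,
     * process at most one valid copy. Older nodes are asked for a single copy
     * to avoid sending multiple requests.
     */
    static List<Integer> shardsToRequest(int nodeVersion, List<Integer> copiesOnNode) {
        if (copiesOnNode.isEmpty()) {
            return new ArrayList<>();
        }
        if (nodeVersion >= GROUP_REQUESTS_VERSION) {
            return new ArrayList<>(copiesOnNode);
        }
        List<Integer> single = new ArrayList<>();
        single.add(copiesOnNode.get(0));
        return single;
    }

    public static void main(String[] args) {
        List<Integer> copies = List.of(0, 1, 2);
        System.out.println("newer node: " + shardsToRequest(2, copies));
        System.out.println("older node: " + shardsToRequest(1, copies));
    }
}
```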

This reverts commit e15b23e.

dnhatn · 2021-10-13T21:07:46Z

@ywelsch @jtibshirani Thanks for the reviews. This is ready again after I removed the merging logic. Would you mind taking another look?

ywelsch

LGTM

jtibshirani

This looks good to me too, I just left some small comments.

server/src/main/java/org/elasticsearch/action/fieldcaps/RequestDispatcher.java

server/src/main/java/org/elasticsearch/action/fieldcaps/TransportFieldCapabilitiesAction.java

jtibshirani · 2021-10-14T19:57:51Z

jtibshirani

Looks good to me 🎉

dnhatn · 2021-10-14T22:41:01Z

@ywelsch @jtibshirani Thanks so much for your reviews.

This adds a retry mechanism for node level field caps requests introduced in elastic#77047.

Currently, to gather field caps, the coordinator sends a separate transport request per index. When the original request targets many indices, the overhead of all these sub-requests can add up and hurt performance. This PR switches the execution strategy to reduce the number of transport requests: it groups together the index requests that target the same node, then sends only one request to each node.

Relates #77047
Relates #78647

Co-authored-by: Julie Tibshirani <[email protected]>
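The grouping strategy described in the commit message above can be sketched as follows. This is a simplified illustration, not the PR's implementation: `NodeGroupingSketch` and `groupByNode` are hypothetical names, and the input assumes a target node has already been chosen for each index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class NodeGroupingSketch {

    /**
     * Groups indices by the node chosen to serve them, so the coordinator
     * can send one request per node instead of one request per index.
     */
    static Map<String, List<String>> groupByNode(Map<String, String> nodeByIndex) {
        Map<String, List<String>> indicesByNode = new TreeMap<>();
        for (Map.Entry<String, String> entry : nodeByIndex.entrySet()) {
            indicesByNode
                .computeIfAbsent(entry.getValue(), node -> new ArrayList<>())
                .add(entry.getKey());
        }
        return indicesByNode;
    }

    public static void main(String[] args) {
        Map<String, String> nodeByIndex = new TreeMap<>();
        nodeByIndex.put("logs-1", "node-a");
        nodeByIndex.put("logs-2", "node-a");
        nodeByIndex.put("metrics-1", "node-b");
        // Three indices collapse into two node-level requests.
        groupByNode(nodeByIndex).forEach((node, indices) ->
            System.out.println(node + " <- " + indices));
    }
}
```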

Add retry for field caps node requests

3804c00

dnhatn force-pushed the 7x-group-field-caps branch from 3b58fc7 to 3804c00 October 5, 2021 01:52

dnhatn marked this pull request as ready for review October 5, 2021 02:28

dnhatn added the :Search/Search label Oct 5, 2021

elasticmachine added the Team:Search label Oct 5, 2021

dnhatn added the >feature label and removed the Team:Search label Oct 5, 2021

dnhatn requested review from jimczi, jtibshirani and ywelsch October 5, 2021 02:28

ywelsch reviewed Oct 5, 2021

dnhatn added 2 commits October 6, 2021 22:59

combine dispatch

f6554bd

fix compile

6abcd03

dnhatn requested a review from ywelsch October 7, 2021 03:29

Merge remote-tracking branch 'elastic/group-field-caps' into 7x-group…

d473ac8

…-field-caps

jtibshirani reviewed Oct 11, 2021

Add response builder

e15b23e

dnhatn commented Oct 13, 2021

server/src/main/java/org/elasticsearch/action/fieldcaps/FieldCapabilities.java (outdated)

server/src/main/java/org/elasticsearch/action/fieldcaps/MergeResultsMode.java (outdated)

ywelsch reviewed Oct 13, 2021

dnhatn added 2 commits October 13, 2021 10:54

Revert "Add response builder"

9dbc0f1

This reverts commit e15b23e.

dispatcher only

8d0f938

dnhatn requested review from jtibshirani and ywelsch October 13, 2021 21:07

ywelsch approved these changes Oct 14, 2021

jtibshirani reviewed Oct 14, 2021

dnhatn added 4 commits October 14, 2021 16:35

NoShardAvailableActionException

cbc6c71

rename

b80ff43

remove null check

45b519f

shard action runs on the same threadpool

ac45012

jtibshirani approved these changes Oct 14, 2021

Merge remote-tracking branch 'elastic/group-field-caps' into 7x-group…

e24547f

…-field-caps

dnhatn merged commit 6f31965 into elastic:group-field-caps Oct 14, 2021

dnhatn deleted the 7x-group-field-caps branch October 14, 2021 22:41

dnhatn mentioned this pull request Oct 15, 2021

Add node-level field caps requests #79212

Merged

dnhatn added a commit to dnhatn/elasticsearch that referenced this pull request Oct 15, 2021

Add retry for node level field caps requests (elastic#78647)

557c1b0

This adds a retry mechanism for node level field caps requests introduced in elastic#77047.

dnhatn mentioned this pull request Oct 15, 2021

Add node-level field caps requests #79214

Merged
