-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Add retry for field caps node requests #78647
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
3b58fc7 to
3804c00
Compare
|
Pinging @elastic/es-search (Team:Search) |
ywelsch
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Nhat. I've done just a quick pass today (didn't get further). I'm wondering if some of the retry logic around shard selection / grouping can be unit-tested (e.g. we currently test that retries ARE happening, but don't test how many etc).
server/src/internalClusterTest/java/org/elasticsearch/search/fieldcaps/FieldCapabilitiesIT.java
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/action/fieldcaps/TransportFieldCapabilitiesAction.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/action/fieldcaps/TransportFieldCapabilitiesAction.java
Outdated
Show resolved
Hide resolved
|
@ywelsch Thank you for your review. All good points - I am addressing them. |
|
@ywelsch I think I have addressed your feedback. I will add some more unit tests to RequestDispatcher and IT tests. Would you mind taking another look? I think we can use the existing |
jtibshirani
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for tackling this @dnhatn. I like that we have a dedicated class now to handle the request dispatching logic. The test coverage also looks great.
The one part I wasn't sure of was the synchronization strategy in RequestDispatcher. There is quite a bit of logic guarded under synchronized blocks, especially the one in execute. I wonder if it'd be better (and if it's even possible) to rely on atomic integers/ thread-safe collections for this? I haven't identified a concrete concern, just raising it to hear your thoughts.
.../src/main/java/org/elasticsearch/action/fieldcaps/TransportFieldCapabilitiesIndexAction.java
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/action/fieldcaps/RequestDispatcher.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/action/fieldcaps/TransportFieldCapabilitiesAction.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/action/fieldcaps/TransportFieldCapabilitiesAction.java
Outdated
Show resolved
Hide resolved
server/src/test/java/org/elasticsearch/action/fieldcaps/RequestDispatcherTests.java
Outdated
Show resolved
Hide resolved
|
@ywelsch Please hold off on the review. I am working on merging the responses, and I will integrate it in this PR. |
@jtibshirani Yes, we can go without synchronization. |
dnhatn
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ywelsch I've updated this PR. The RequestDispatcher and its tests are ready. The merging logic is still WIP. I need to discuss it with you before completing it. Would you please review the RequestDispatcher and the approach of the merging results logic? I will take a look at your can_match PR tomorrow. Sorry for the delay - I've been focusing on this PR. Thank you!
server/src/main/java/org/elasticsearch/action/fieldcaps/FieldCapabilities.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/action/fieldcaps/MergeResultsMode.java
Outdated
Show resolved
Hide resolved
ywelsch
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am working on merging the responses, and I will integrate it in this PR.
Let's revert that part. It has become too difficult to review this PR, and I think we will need more discussions on the merging logic. Let's not block the node-level action on this, but create a clear list of follow-ups.
.../src/internalClusterTest/java/org/elasticsearch/search/fieldcaps/CCSFieldCapabilitiesIT.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/action/fieldcaps/RequestDispatcher.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/action/fieldcaps/RequestDispatcher.java
Show resolved
Hide resolved
| // and the target node will process at most one valid copy. Otherwise, we should ask for a single copy to avoid | ||
| // sending multiple requests. | ||
| final DiscoveryNode discoNode = discoveryNodes.get(node.getKey()); | ||
| if (discoNode.getVersion().onOrAfter(GROUP_REQUESTS_VERSION)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's unfortunate that the BWC logic is spread to both here and the sendRequestToNode method. Can we avoid this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, I couldn't find a clean way. Any suggestion is welcome :).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a better suggestion, unfortunately, so let' leave as is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we could just remove this optimization for simplicity? Given there is no index filter, in the happy case we will only have to consult one shard copy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer to keep this optimization to be consistent with 8.0. However, I can make this change if you and Yannick have a strong opinion on it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't feel strongly, happy to go with what you (and @ywelsch) prefer here.
This reverts commit e15b23e.
|
@ywelsch @jtibshirani Thanks for reviews. This is ready again after I removed the merging logic. Would you mind taking another look? |
ywelsch
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
jtibshirani
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me too, I just left some small comments.
server/src/main/java/org/elasticsearch/action/fieldcaps/RequestDispatcher.java
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/action/fieldcaps/RequestDispatcher.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/action/fieldcaps/RequestDispatcher.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/action/fieldcaps/RequestDispatcher.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/action/fieldcaps/TransportFieldCapabilitiesAction.java
Outdated
Show resolved
Hide resolved
| // and the target node will process at most one valid copy. Otherwise, we should ask for a single copy to avoid | ||
| // sending multiple requests. | ||
| final DiscoveryNode discoNode = discoveryNodes.get(node.getKey()); | ||
| if (discoNode.getVersion().onOrAfter(GROUP_REQUESTS_VERSION)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we could just remove this optimization for simplicity? Given there is no index filter, in the happy case we will only have to consult one shard copy.
jtibshirani
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me 🎉
|
@ywelsch @jtibshirani Thanks so much for your reviews. |
This adds a retry mechanism for node level field caps requests introduced in elastic#77047.
Currently to gather field caps, the coordinator sends a separate transport request per index. When the original request targets many indices, the overhead of all these sub-requests can add up and hurt performance. This PR switches the execution strategy to reduce the number of transport requests: it groups together the index requests that target the same node, then sends only one request to each node. Relates #77047 Relates # #78647 Co-authored-by: Julie Tibshirani <[email protected]>
Currently to gather field caps, the coordinator sends a separate transport request per index. When the original request targets many indices, the overhead of all these sub-requests can add up and hurt performance. This PR switches the execution strategy to reduce the number of transport requests: it groups together the index requests that target the same node, then sends only one request to each node. Relates #77047 Relates # #78647 Co-authored-by: Julie Tibshirani <[email protected]>
This change targets the feature branch:
group-field-caps(based on 7.x).This adds a retry mechanism for node-based field caps requests introduced in #77047. Merging index responses on data nodes will be implemented in a follow-up.