Conversation

@dnhatn
Member

@dnhatn dnhatn commented Oct 4, 2021

This change targets the feature branch: group-field-caps (based on 7.x).

This adds a retry mechanism for node-based field caps requests introduced in #77047. Merging index responses on data nodes will be implemented in a follow-up.
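
As a rough illustration of the retry idea (not the code in this PR), the sketch below assumes that a retry means trying the next node that holds a copy of the index when the previous one fails. `CopyRetrySketch`, `Result`, `executeWithRetry`, and the `nodeSucceeds` predicate are hypothetical names, and the model is synchronous for brevity, whereas a real dispatcher would be asynchronous.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Illustrative only: a simplified, synchronous model of retrying a request
// on the next node that holds a copy. All names here are hypothetical and
// are not taken from the Elasticsearch code base.
final class CopyRetrySketch {

    /** Outcome of trying the copies of one index in order. */
    static final class Result {
        final String servedBy;       // node that answered, or null if every copy failed
        final List<String> attempts; // nodes tried, in the order they were tried

        Result(String servedBy, List<String> attempts) {
            this.servedBy = servedBy;
            this.attempts = attempts;
        }
    }

    /**
     * Tries each node holding a copy until one succeeds.
     * nodeSucceeds stands in for "send the request and get a valid response".
     */
    static Result executeWithRetry(List<String> nodesWithCopy, Predicate<String> nodeSucceeds) {
        List<String> attempts = new ArrayList<>();
        for (String node : nodesWithCopy) {
            attempts.add(node);
            if (nodeSucceeds.test(node)) {
                return new Result(node, attempts);
            }
        }
        // No copy could serve the request.
        return new Result(null, attempts);
    }
}
```

Recording the attempted nodes makes it possible for a unit test to assert how many retries happened, not only that a retry happened.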

@dnhatn dnhatn force-pushed the 7x-group-field-caps branch from 3b58fc7 to 3804c00 Compare October 5, 2021 01:52
@dnhatn dnhatn marked this pull request as ready for review October 5, 2021 02:28
@dnhatn dnhatn added the :Search/Search Search-related issues that do not fall into other categories label Oct 5, 2021
@elasticmachine elasticmachine added the Team:Search Meta label for search team label Oct 5, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

@dnhatn dnhatn added >feature and removed Team:Search Meta label for search team labels Oct 5, 2021
Contributor

@ywelsch ywelsch left a comment

Thanks Nhat. I've done just a quick pass today (didn't get further). I'm wondering if some of the retry logic around shard selection / grouping can be unit-tested (e.g. we currently test that retries ARE happening, but don't test how many etc).

@dnhatn
Member Author

dnhatn commented Oct 6, 2021

@ywelsch Thank you for your review. All good points - I am addressing them.

@dnhatn
Member Author

dnhatn commented Oct 7, 2021

@ywelsch I think I have addressed your feedback. I will add some more unit tests to RequestDispatcher and IT tests. Would you mind taking another look?

I think we can use the existing FieldCapabilitiesRequest instead of introducing FieldCapabilitiesNodeRequest when the merging response is implemented. I will consider this in the merge response PR.

@dnhatn dnhatn requested a review from ywelsch October 7, 2021 03:29
Contributor

@jtibshirani jtibshirani left a comment

Thanks for tackling this @dnhatn. I like that we have a dedicated class now to handle the request dispatching logic. The test coverage also looks great.

The one part I wasn't sure of was the synchronization strategy in RequestDispatcher. There is quite a bit of logic guarded under synchronized blocks, especially the one in execute. I wonder if it'd be better (and if it's even possible) to rely on atomic integers / thread-safe collections for this? I haven't identified a concrete concern, just raising it to hear your thoughts.

@dnhatn
Member Author

dnhatn commented Oct 11, 2021

@ywelsch Please hold off on the review. I am working on merging the responses, and I will integrate it in this PR.

@dnhatn
Member Author

dnhatn commented Oct 11, 2021

> I wonder if it'd be better (and if it's even possible) to rely on atomic integers / thread-safe collections for this? I haven't identified a concrete concern, just raising it to hear your thoughts

@jtibshirani Yes, we can go without synchronization.
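
For readers unfamiliar with the alternative being discussed, here is a minimal sketch of tracking in-flight requests with an `AtomicInteger` and a concurrent set instead of `synchronized` blocks. It is not the `RequestDispatcher` from this PR; `LockFreeTrackerSketch` and its methods are hypothetical, and the sketch assumes a retry is registered before the failed request is marked finished.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative only: bookkeeping for concurrent requests without locks.
// All names are hypothetical, not taken from the Elasticsearch code base.
final class LockFreeTrackerSketch {
    private final AtomicInteger pendingRequests;
    private final Set<String> completedIndices = ConcurrentHashMap.newKeySet();
    private final Runnable onDone;

    LockFreeTrackerSketch(int initialRequests, Runnable onDone) {
        this.pendingRequests = new AtomicInteger(initialRequests);
        this.onDone = onDone;
    }

    /** Returns true only for the first caller that completes this index. */
    boolean markIndexCompleted(String index) {
        return completedIndices.add(index);
    }

    /**
     * Registers one more in-flight request. Must be called before
     * requestFinished() for the request being retried, so the counter
     * cannot reach zero while a retry is still outstanding.
     */
    void retryScheduled() {
        pendingRequests.incrementAndGet();
    }

    /** Marks one request finished; runs onDone exactly once, when none remain. */
    void requestFinished() {
        if (pendingRequests.decrementAndGet() == 0) {
            onDone.run();
        }
    }
}
```

The design relies on `decrementAndGet` being atomic: only one thread can observe the transition to zero, so the completion callback runs once without a lock.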

Member Author

@dnhatn dnhatn left a comment

@ywelsch I've updated this PR. The RequestDispatcher and its tests are ready. The merging logic is still WIP. I need to discuss it with you before completing it. Would you please review the RequestDispatcher and the approach of the merging results logic? I will take a look at your can_match PR tomorrow. Sorry for the delay - I've been focusing on this PR. Thank you!

Contributor

@ywelsch ywelsch left a comment

> I am working on merging the responses, and I will integrate it in this PR.

Let's revert that part. It has become too difficult to review this PR, and I think we will need more discussions on the merging logic. Let's not block the node-level action on this, but create a clear list of follow-ups.

// and the target node will process at most one valid copy. Otherwise, we should ask for a single copy to avoid
// sending multiple requests.
final DiscoveryNode discoNode = discoveryNodes.get(node.getKey());
if (discoNode.getVersion().onOrAfter(GROUP_REQUESTS_VERSION)) {
Contributor

It's unfortunate that the BWC logic is spread across both here and the sendRequestToNode method. Can we avoid this?

Member Author

Unfortunately, I couldn't find a clean way. Any suggestion is welcome :).

Contributor

I don't have a better suggestion, unfortunately, so let's leave it as is.

Contributor

I wonder if we could just remove this optimization for simplicity? Given there is no index filter, in the happy case we will only have to consult one shard copy.

Member Author

I prefer to keep this optimization to be consistent with 8.0. However, I can make this change if you and Yannick have a strong opinion on it.

Contributor

I don't feel strongly, happy to go with what you (and @ywelsch) prefer here.

@dnhatn
Member Author

dnhatn commented Oct 13, 2021

@ywelsch @jtibshirani Thanks for the reviews. This is ready again now that I have removed the merging logic. Would you mind taking another look?

@dnhatn dnhatn requested review from jtibshirani and ywelsch October 13, 2021 21:07
Contributor

@ywelsch ywelsch left a comment

LGTM

Contributor

@jtibshirani jtibshirani left a comment

This looks good to me too, I just left some small comments.

Contributor

@jtibshirani jtibshirani left a comment

Looks good to me 🎉

@dnhatn
Member Author

dnhatn commented Oct 14, 2021

@ywelsch @jtibshirani Thanks so much for your reviews.

@dnhatn dnhatn merged commit 6f31965 into elastic:group-field-caps Oct 14, 2021
@dnhatn dnhatn deleted the 7x-group-field-caps branch October 14, 2021 22:41
dnhatn added a commit to dnhatn/elasticsearch that referenced this pull request Oct 15, 2021
This adds a retry mechanism for node level field caps requests 
introduced in elastic#77047.
dnhatn added a commit that referenced this pull request Oct 15, 2021
Currently to gather field caps, the coordinator sends a separate transport
request per index. When the original request targets many indices, the overhead
of all these sub-requests can add up and hurt performance. This PR switches the
execution strategy to reduce the number of transport requests: it groups
together the index requests that target the same node, then sends only one
request to each node.

Relates #77047
Relates #78647


Co-authored-by: Julie Tibshirani <[email protected]>
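
The grouping strategy described in the commit message above can be sketched as follows. This is a simplified illustration rather than the actual implementation: `GroupByNodeSketch` and `groupByNode` are hypothetical names, and the sketch assumes each index has already been assigned a single target node.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: turn "one request per index" into "one request per node".
// All names are hypothetical, not taken from the Elasticsearch code base.
final class GroupByNodeSketch {

    /**
     * indexToNode maps each index name to the node chosen to serve it.
     * Returns node -> indices, so a single request can be sent to each node.
     */
    static Map<String, List<String>> groupByNode(Map<String, String> indexToNode) {
        // LinkedHashMap keeps nodes in first-seen order, which makes the result deterministic.
        Map<String, List<String>> perNode = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : indexToNode.entrySet()) {
            perNode.computeIfAbsent(entry.getValue(), node -> new ArrayList<>())
                   .add(entry.getKey());
        }
        return perNode;
    }
}
```

With three indices spread over two nodes, the coordinator in this model would send two requests instead of three; the saving grows with the number of indices per node.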
dnhatn added a commit that referenced this pull request Oct 15, 2021
Currently to gather field caps, the coordinator sends a separate transport
request per index. When the original request targets many indices, the overhead
of all these sub-requests can add up and hurt performance. This PR switches the
execution strategy to reduce the number of transport requests: it groups
together the index requests that target the same node, then sends only one
request to each node.

Relates #77047
Relates #78647

Co-authored-by: Julie Tibshirani <[email protected]>
Labels

>feature :Search/Search Search-related issues that do not fall into other categories

4 participants