Enable BulkItemResponse to parse back EsRejectedExecutionException through XContent #29254
Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually?
@PnPie In my own tests of a very rough proof-of-concept version of this same change, I found that deadlocks would occur in the normal retry path. This seems to be due to a difference in how threading is handled in the TransportClient version versus the High-Level REST client, though it's possible it was caused by the particularly aggressive bulk thread limits I used to exercise the issue. Have you found that you can continue to submit new documents to the processor even while retries are in progress? The workaround I am currently using is to maintain a separate retry queue that sends the documents back through the processor, rather than retrying from within the processor. Being able to use the standard Retry logic of the stock listener would be quite nice, but in practice I haven't had much success with it.
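The separate retry queue workaround described in that comment could look roughly like the following. This is a minimal sketch only: `RetryQueueSketch`, `onRejected`, and `drain` are hypothetical names, plain document IDs stand in for bulk items, and the `submit` callback stands in for whatever adds a document to the BulkProcessor. It illustrates the idea that the listener only enqueues failed documents, and resubmission happens later from the caller's own code rather than from inside the processor's callback.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: rejected documents are parked in a queue owned by the
// caller and later resubmitted through the normal "add to processor" path.
final class RetryQueueSketch {
    private final Deque<String> retryQueue = new ArrayDeque<>();
    private final Consumer<String> submit; // stands in for adding a doc to the processor

    RetryQueueSketch(Consumer<String> submit) {
        this.submit = submit;
    }

    // Intended to be called from the bulk listener: it only enqueues, it never resubmits.
    synchronized void onRejected(String docId) {
        retryQueue.addLast(docId);
    }

    // Intended to be called from the application's own thread. The queue is
    // copied and cleared under the lock, then resubmitted without holding it.
    int drain() {
        List<String> batch;
        synchronized (this) {
            batch = new ArrayList<>(retryQueue);
            retryQueue.clear();
        }
        batch.forEach(submit);
        return batch.size();
    }
}
```

Keeping the resubmission outside the lock and outside the listener avoids calling back into the processor from one of its own callbacks, which is the situation the comment associates with the deadlocks.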
Pinging @elastic/es-core-infra
Hi @PnPie, thanks for opening this PR. I am not convinced that making a special case for
👍 for @javanna's suggestion
javanna left a review comment:
Hi @PnPie, thanks for your PR! I think we should do this differently. I left a comment; let me know what you think and whether you are up for that.
Instead of parsing this exception back into its own type, which we don't do for many other exceptions, why don't we adapt the BulkProcessor to retry based on the status returned with the exception? I think that should work for both the transport client and the REST client. I also don't think that making the exception to retry on configurable is necessary, as we only ever use it for rejections (or we could make the status configurable instead of the class). We should check whether the root cause is an ElasticsearchException and, if so, cast it and check the returned status.
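The status-based check suggested here might look roughly like the sketch below. It does not use the real Elasticsearch classes: `SketchStatus` and `StatusException` are hypothetical stand-ins for `RestStatus` and an `ElasticsearchException` exposing `status()`, and `StatusRetryPolicy` is a hypothetical name. Following the comment, it walks to the deepest cause and retries only when that cause carries the configured status, so an exception rebuilt by the REST client would qualify as long as its status is preserved.

```java
import java.util.Objects;

// Hypothetical stand-in for RestStatus (only the values used here).
enum SketchStatus { BAD_REQUEST, TOO_MANY_REQUESTS, INTERNAL_SERVER_ERROR }

// Hypothetical stand-in for an exception that carries a status, like ElasticsearchException.
class StatusException extends RuntimeException {
    private final SketchStatus status;

    StatusException(String message, SketchStatus status) {
        super(message);
        this.status = Objects.requireNonNull(status);
    }

    SketchStatus status() {
        return status;
    }
}

final class StatusRetryPolicy {
    private final SketchStatus retryOn;

    // The status to retry on is configurable, instead of the exception class.
    StatusRetryPolicy(SketchStatus retryOn) {
        this.retryOn = retryOn;
    }

    // Deepest cause in the chain (the throwable itself if it has no cause).
    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null) {
            current = current.getCause();
        }
        return current;
    }

    // Retry only when the root cause carries the configured status; the concrete
    // exception subclass is irrelevant.
    boolean canRetry(Throwable failure) {
        Throwable root = rootCause(failure);
        return root instanceof StatusException
                && ((StatusException) root).status() == retryOn;
    }
}
```

For rejections the policy would be built with `TOO_MANY_REQUESTS`, matching the status the discussion reports for parsed rejections.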
This needs testing, for instance by porting the existing BulkProcessorRetryIT to the rest-high-level tests and adapting it, similar to what I am doing in #29263 for BulkProcessorIT.
One other thing I noticed: when canRetry returns true, we retry all failed items from that response, but we don't check the status code or the exception type of each item again. As a follow-up, we may want to fix that in createBulkRequestForRetry.
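For the follow-up about createBulkRequestForRetry, a per-item filter could be sketched as below. `ItemResult`, `RetryRequestBuilder`, and `idsToRetry` are hypothetical names and statuses are plain HTTP codes; this is not the actual BulkResponse API. Only items that failed with 429 (Too Many Requests) are selected, so other failures in the same response, such as a 400, are not resent.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a per-item bulk result; status is an HTTP-style code.
final class ItemResult {
    final String id;
    final boolean failed;
    final int status;

    ItemResult(String id, boolean failed, int status) {
        this.id = id;
        this.failed = failed;
        this.status = status;
    }
}

final class RetryRequestBuilder {
    static final int TOO_MANY_REQUESTS = 429;

    // Keep only items that failed AND failed with the retryable status, instead of
    // resending every failed item once the response as a whole was deemed retryable.
    static List<String> idsToRetry(List<ItemResult> results) {
        List<String> retry = new ArrayList<>();
        for (ItemResult r : results) {
            if (r.failed && r.status == TOO_MANY_REQUESTS) {
                retry.add(r.id);
            }
        }
        return retry;
    }
}
```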
Hello @javanna, thanks for looking at it and for the comments. I definitely agree with that. I also saw that the status was processed correctly (RestStatus.TOO_MANY_REQUESTS), as it is derived from the exception but parsed separately; I just wasn't sure whether it was preferable to change the bulk side directly or the REST high-level client side when I was working on this. I'll change it that way soon.
The branch was updated from cff66bc to 168444b, then to 15b5172, then to e70cd35.
Don't worry @PnPie, no problem.
Currently, when we convert a BulkItemResponse to XContent and then parse it back, the original exception type is not preserved; it is always turned into an ElasticsearchException. This doesn't work well for EsRejectedExecutionException, because the BulkRequestHandler's retry logic relies on EsRejectedExecutionException (it only retries on this type of exception). Once the exception has been parsed back into an ElasticsearchException, a bulk request sent through the REST high-level client cannot be retried, which leads to issues like #28885.

This change makes EsRejectedExecutionException parseable to and from XContent. I'm not really sure it is the right way to solve the problem, but I'm opening this PR to see and discuss.

Relates to #28885
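The failure mode described in the PR description can be illustrated with a simplified, hypothetical round trip. A map stands in for XContent, and `GenericEsException`, `RejectedException`, and `RoundTrip` are stand-ins for `ElasticsearchException`, `EsRejectedExecutionException`, and the REST client's parsing, not real Elasticsearch APIs. Because only the type name, reason, and status survive, a type-based (`instanceof`) retry check fails on the parsed exception, while a status-based check still succeeds, assuming the status is preserved as reported later in the discussion.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for ElasticsearchException: keeps a reason and a status code.
class GenericEsException extends RuntimeException {
    final int status;

    GenericEsException(String reason, int status) {
        super(reason);
        this.status = status;
    }
}

// Hypothetical stand-in for EsRejectedExecutionException (status 429).
class RejectedException extends GenericEsException {
    RejectedException(String reason) {
        super(reason, 429);
    }
}

final class RoundTrip {
    // "Serialize": only a type name, the reason and the status survive as plain fields.
    static Map<String, Object> toMap(GenericEsException e) {
        Map<String, Object> m = new HashMap<>();
        m.put("type", e.getClass().getSimpleName());
        m.put("reason", e.getMessage());
        m.put("status", e.status);
        return m;
    }

    // "Parse": the client rebuilds the generic type, so the concrete subclass is lost.
    static GenericEsException fromMap(Map<String, Object> m) {
        return new GenericEsException((String) m.get("reason"), (Integer) m.get("status"));
    }

    // Type-based check, analogous to retrying only on the rejection class.
    static boolean retryByType(Throwable t) {
        return t instanceof RejectedException;
    }

    // Status-based check, which still works after the round trip.
    static boolean retryByStatus(Throwable t) {
        return t instanceof GenericEsException && ((GenericEsException) t).status == 429;
    }
}
```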