Conversation

@PnPie (Contributor) commented Mar 26, 2018

Currently, when we render a BulkItemResponse to XContent and then parse it back, the original exception type is not kept; it always comes back as an ElasticsearchException.
This doesn't work well in the case of EsRejectedExecutionException, because the BulkRequestHandler's retry logic relies on that exception type (it only retries on this exception). So if the failure is parsed back as an ElasticsearchException, a bulk request sent through the REST high-level client cannot be retried, which leads to issues like #28885.

This change makes EsRejectedExecutionException parseable to and from XContent. I'm not really sure whether it is the right way to solve the problem, but I'm opening this PR to discuss it.

Relates to #28885
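
For context, a minimal sketch of the kind of type-based retry check the description refers to (an editor's illustration assuming the public BulkResponse, BulkItemResponse and ExceptionsHelper APIs, not the actual BulkRequestHandler/Retry code):

```java
import org.elasticsearch.ExceptionsHelper;
import org.elasticsearch.action.bulk.BulkItemResponse;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.common.util.concurrent.EsRejectedExecutionException;

final class TypeBasedRetryCheck {

    // Illustrative only: retry only if every failed item was caused by an
    // EsRejectedExecutionException. After an XContent round trip through the
    // REST high-level client, the unwrapped cause is a generic
    // ElasticsearchException, so this check returns false and the bulk request
    // is never retried.
    static boolean canRetry(BulkResponse bulkItemResponses) {
        for (BulkItemResponse bulkItemResponse : bulkItemResponses) {
            if (bulkItemResponse.isFailed()) {
                Throwable rootCause = ExceptionsHelper.unwrapCause(bulkItemResponse.getFailure().getCause());
                if ((rootCause instanceof EsRejectedExecutionException) == false) {
                    return false;
                }
            }
        }
        return true;
    }
}
```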

@elasticmachine (Collaborator)

Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually?

1 similar comment from @elasticmachine (Collaborator).

@jdewald commented Mar 26, 2018

@PnPie In my own tests of an early proof-of-concept version of this same change, I found that deadlocks would occur in the normal retry path, which seems to be due to a difference in how threading is handled in the TransportClient version versus the REST high-level client. It's possible this was due to the particularly aggressive bulk thread limits that I used to exercise the issue.

Have you found that you can continue to submit new documents to the processor even as retries are going through? The workaround I am currently using is to maintain a separate retry queue which sends the documents back through the processor, rather than retrying from within the processor. Being able to use the standard Retry logic of the stock listener would be quite nice, but in practice I haven't had much success.
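
A rough, hypothetical illustration of that kind of workaround (the class name and wiring are the editor's, not code from this PR or from jdewald): failed items are parked on a separate queue and re-added to the BulkProcessor from an application thread rather than from within the listener callback.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

import org.elasticsearch.action.DocWriteRequest;
import org.elasticsearch.action.bulk.BulkItemResponse;
import org.elasticsearch.action.bulk.BulkProcessor;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.rest.RestStatus;

public class RequeueingListener implements BulkProcessor.Listener {

    private final BlockingQueue<DocWriteRequest<?>> retryQueue = new LinkedBlockingQueue<>();

    @Override
    public void beforeBulk(long executionId, BulkRequest request) {
    }

    @Override
    public void afterBulk(long executionId, BulkRequest request, BulkResponse response) {
        for (BulkItemResponse item : response) {
            // Park rejected items (429) so they can be re-submitted later.
            if (item.isFailed() && item.getFailure().getStatus() == RestStatus.TOO_MANY_REQUESTS) {
                retryQueue.add(request.requests().get(item.getItemId()));
            }
        }
    }

    @Override
    public void afterBulk(long executionId, BulkRequest request, Throwable failure) {
        // The whole bulk failed before reaching the cluster; park everything.
        retryQueue.addAll(request.requests());
    }

    // Called periodically from an application thread, not from the listener
    // callback, to feed parked items back into the processor.
    public void drainTo(BulkProcessor processor) {
        DocWriteRequest<?> parked;
        while ((parked = retryQueue.poll()) != null) {
            processor.add(parked);
        }
    }
}
```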

@elasticmachine (Collaborator)

Pinging @elastic/es-core-infra

@javanna added the >bug label on Mar 27, 2018
@javanna (Member) commented Mar 27, 2018

Hi @PnPie, thanks for opening this PR. I am not convinced that making a special case for EsRejectedExecutionException is the right choice: either we are able to parse all the exceptions back into their own type, or we just don't and we document it. I would rather find a way to adapt the bulk processor to not rely only on the exception type. It could, for instance, look at the response code as well; rejections are easily detected. @tlrx what do you think?

@tlrx (Member) commented Mar 28, 2018

👍 for @javanna's suggestion

@javanna (Member) left a comment

Hi @PnPie, thanks for your PR! I think we should do this differently; I left a comment, let me know what you think and whether you are up for it.

@javanna (Member)

Instead of parsing this exception back into its own type, which we don't do for a lot of other exceptions, why don't we adapt the BulkProcessor to retry based on the status returned with the exception? That should work for both the transport client and the REST client, I think. Configuring the exception to retry on is probably not necessary, as we only ever use it for rejections (or we could make the status configurable instead of the class). We should check whether the root cause is an ElasticsearchException and, if so, cast it and check the returned status.
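
A minimal sketch of the status-based check being suggested (editor's illustration, assuming ElasticsearchException.status() and RestStatus.TOO_MANY_REQUESTS; not a final implementation):

```java
import org.elasticsearch.ElasticsearchException;
import org.elasticsearch.ExceptionsHelper;
import org.elasticsearch.action.bulk.BulkItemResponse;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.rest.RestStatus;

final class StatusBasedRetryCheck {

    // Illustrative only: retry only if every failed item reports a rejection
    // status (429). This works whether the failure travelled as an
    // EsRejectedExecutionException (transport client) or was parsed back as a
    // generic ElasticsearchException (REST client), because the status survives
    // the XContent round trip.
    static boolean canRetry(BulkResponse bulkItemResponses) {
        for (BulkItemResponse bulkItemResponse : bulkItemResponses) {
            if (bulkItemResponse.isFailed()) {
                Throwable rootCause = ExceptionsHelper.unwrapCause(bulkItemResponse.getFailure().getCause());
                if (rootCause instanceof ElasticsearchException == false
                        || ((ElasticsearchException) rootCause).status() != RestStatus.TOO_MANY_REQUESTS) {
                    return false;
                }
            }
        }
        return true;
    }
}
```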

This needs testing, for instance by porting the existing BulkProcessorRetryIT to the rest-high-level tests and adapting it, similar to what I am doing in #29263 for BulkProcessorIT.

One other thing I noticed is that when canRetry returns true, we retry all failed items from that response, but we don't check the status code or the exception type again. As a follow-up, we may want to fix that in createBulkRequestForRetry.
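
A rough sketch of what such a follow-up might look like (editor's illustration, not the existing createBulkRequestForRetry; it assumes BulkItemResponse.getItemId() maps back to the item's position in the original BulkRequest):

```java
import org.elasticsearch.action.bulk.BulkItemResponse;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.rest.RestStatus;

final class RetryRequestBuilder {

    // Illustrative only: rebuild the bulk request for the next retry attempt from
    // the failed items whose status indicates a rejection (429), instead of
    // re-adding every failed item regardless of why it failed.
    static BulkRequest createBulkRequestForRetry(BulkResponse bulkItemResponses, BulkRequest originalBulkRequest) {
        BulkRequest requestToReissue = new BulkRequest();
        for (BulkItemResponse bulkItemResponse : bulkItemResponses) {
            if (bulkItemResponse.isFailed()
                    && bulkItemResponse.getFailure().getStatus() == RestStatus.TOO_MANY_REQUESTS) {
                requestToReissue.add(originalBulkRequest.requests().get(bulkItemResponse.getItemId()));
            }
        }
        return requestToReissue;
    }
}
```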

@PnPie (Contributor, Author)

Hello @javanna, thanks for looking at it and for the comments. I definitely agree with that. I also saw that the status is handled correctly (RestStatus.TOO_MANY_REQUESTS), as it is derived from the exception but parsed separately; I was just not sure whether it was preferable to change the bulk side directly or the REST high-level side when I was doing this. I'll change it this way soon.

@PnPie force-pushed the rest_highlevel_bulk_retry branch from cff66bc to 168444b on April 1, 2018 at 21:38
@karmi (Contributor) commented Apr 1, 2018

Hi @PnPie, we have found your signature in our records, but it seems you signed with a different e-mail than the one used in your Git commit. Can you please add both of these e-mails to your GitHub profile (they can be hidden), so we can match your e-mails to your GitHub profile?

@PnPie force-pushed the rest_highlevel_bulk_retry branch from 168444b to 15b5172 on April 1, 2018 at 21:42
@PnPie closed this Apr 1, 2018
@PnPie force-pushed the rest_highlevel_bulk_retry branch from 15b5172 to e70cd35 on April 1, 2018 at 21:50
@PnPie (Contributor, Author) commented Apr 1, 2018

Hello @javanna and everyone,
I'm very sorry, I mistakenly pushed with another account, and while trying to redo it I ended up closing this PR by changing it back. So I created a new PR, #29329, for this. Sorry again for the inconvenience.

@javanna (Member) commented Apr 3, 2018

Don't worry @PnPie, no problem.
