Enable support for decompression of compressed response within RestHighLevelClient #53533
Conversation
Pulling latest changes from master
…e/add-support-for-decompression-of-compressed-response
The other possible way to implement this would be at the method level, at line number 272. That would look like the following code snippet:

```java
private ResponseOrResponseException convertResponse(InternalRequest request, Node node, HttpResponse httpResponse) throws IOException {
    RequestLogger.logResponse(logger, request.httpRequest, node.getHost(), httpResponse);
    int statusCode = httpResponse.getStatusLine().getStatusCode();
    Response response = new Response(request.httpRequest.getRequestLine(), node.getHost(), decompressResponseEntityIfRequired(httpResponse));
    if (isSuccessfulResponse(statusCode) || request.ignoreErrorCodes.contains(response.getStatusLine().getStatusCode())) {
        onResponse(node);
        if (request.warningsHandler.warningsShouldFailRequest(response.getWarnings())) {
            throw new WarningFailureException(response);
        }
        return new ResponseOrResponseException(response);
    }
    ResponseException responseException = new ResponseException(response);
    if (isRetryStatus(statusCode)) {
        // mark host dead and retry against next one
        onFailure(node);
        return new ResponseOrResponseException(responseException);
    }
    // mark host alive and don't retry, as the error should be a request problem
    onResponse(node);
    throw responseException;
}

private HttpResponse decompressResponseEntityIfRequired(HttpResponse httpResponse) throws IOException {
    for (Header header : httpResponse.getHeaders(HttpHeaders.CONTENT_ENCODING)) {
        if ("gzip".equalsIgnoreCase(header.getValue())) {
            String decompressedContent = decompressWithGzip(EntityUtils.toByteArray(httpResponse.getEntity()));
            HttpEntity httpEntity = new NStringEntity(decompressedContent, ContentType.get(httpResponse.getEntity()));
            httpResponse.setEntity(httpEntity);
            httpResponse.removeHeader(header);
            break;
        }
    }
    return httpResponse;
}

private static String decompressWithGzip(byte[] compressedBytes) throws IOException {
    try (ByteArrayInputStream byteArrayInputStream = new ByteArrayInputStream(compressedBytes);
         GZIPInputStream gzipInputStream = new GZIPInputStream(byteArrayInputStream);
         BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(gzipInputStream, StandardCharsets.ISO_8859_1))) {
        return bufferedReader.lines()
            .collect(Collectors.joining("\n"));
    }
}
```

This code will decompress responses made with the Rest High Level Client and even the Low Level Client. The only downside of this alternative (so not the one within the code changes, but the one within this comment) is that it is not backwards compatible for existing consumers who already have their own solution for decompression at the low-level client: they will no longer get a compressed response from the low-level client, as it has already been decompressed. The change within the rest-high-level client will probably be backwards compatible, as decompression was never supported, so no consumer will be affected by it. I like the change at the RestClient class, as it has less code duplication and both the low-level client and the rest-high-level client benefit from this feature in one go, but I am afraid that users of the low-level client would be affected. So it would be safer to go for the change within the RestHighLevelClient class. But I am curious what the reviewers are thinking :)
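As a standalone illustration of what `decompressWithGzip` above does, the gzip round trip can be sketched with plain `java.util.zip` (class and method names here are my own, not part of the PR). Note that reading the bytes as ISO-8859-1 text, as the snippet above does, only round-trips safely because that charset maps every byte value; working with `byte[]` directly avoids the detour through `String`:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipRoundTrip {

    // Compress raw bytes with gzip, as Elasticsearch does when http.compression is enabled.
    static byte[] gzip(byte[] plain) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gzipOut = new GZIPOutputStream(out)) {
            gzipOut.write(plain);
        }
        return out.toByteArray();
    }

    // Decompress a gzip body back to raw bytes, byte for byte.
    static byte[] gunzip(byte[] compressed) throws IOException {
        try (GZIPInputStream gzipIn = new GZIPInputStream(new ByteArrayInputStream(compressed));
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buffer = new byte[4096];
            int read;
            while ((read = gzipIn.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
            return out.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        String body = "{\"took\":3,\"hits\":{\"total\":1}}";
        byte[] compressed = gzip(body.getBytes(StandardCharsets.UTF_8));
        String restored = new String(gunzip(compressed), StandardCharsets.UTF_8);
        System.out.println(body.equals(restored)); // prints "true"
    }
}
```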
Pinging @elastic/es-core-features (:Core/Features/Java High Level REST Client)
andreidan
left a comment
Thanks for using Elasticsearch @Hakky54 and for putting together this PR. Great catch on the lack of support for decompressing the response entity.
We use the Apache HTTP client, which already has support for decompressing the HTTP entity using the GzipDecompressingEntity. I propose adding support for decompressing the entity in the parseEntity method, if the Content-Encoding header contains gzip, using the Apache GzipDecompressingEntity to wrap the HttpEntity.
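Conceptually, what wrapping the `HttpEntity` in `GzipDecompressingEntity` buys is a decode step applied only when the `Content-Encoding` header says gzip. A stdlib-only sketch of that behavior (the `maybeDecompress` helper is hypothetical, not the actual client code):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class ConditionalGunzip {

    // Hypothetical helper mirroring the effect of GzipDecompressingEntity:
    // decompress only when the server marked the body as gzip-encoded.
    static byte[] maybeDecompress(byte[] body, String contentEncoding) throws IOException {
        if (!"gzip".equalsIgnoreCase(contentEncoding)) {
            return body; // uncompressed bodies pass through untouched
        }
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(body))) {
            return in.readAllBytes();
        }
    }

    // Helper to produce a gzip body for the demo below.
    static byte[] gzip(byte[] plain) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gzipOut = new GZIPOutputStream(out)) {
            gzipOut.write(plain);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] body = "{\"acknowledged\":true}".getBytes();
        // gzip-encoded body gets decoded; plain body is returned as-is
        System.out.println(new String(maybeDecompress(gzip(body), "gzip")));
        System.out.println(new String(maybeDecompress(body, null)));
    }
}
```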
I also propose we enhance the HttpCompressionIT with an integration test for the high level rest client.
I believe the low level rest client should remain as it is for now, as that client puts the user in control with most things related to the HTTP interaction.
Thanks for working on this.
Hi @andreidan, thank you for your feedback, and I am glad that you appreciate my pull request! It looks like my initial solution was too verbose; I didn't know of the existence of the GzipDecompressingEntity. I have refactored it and pushed it here. I also enhanced the integration test within the HttpCompressionIT. Please let me know what else could be improved :)
…ompressed-response
…st of testParseEntity()
andreidan
left a comment
Thanks for iterating on this @Hakky54. This looks pretty good so far.
I've left a few (mostly minor) suggestions. Thanks for working on this!
Review comments (outdated, resolved) on:
client/rest-high-level/src/test/java/org/elasticsearch/client/RestHighLevelClientTests.java
test/framework/src/main/java/org/elasticsearch/test/rest/ESRestTestCase.java
qa/smoke-test-http/src/test/java/org/elasticsearch/http/HttpCompressionIT.java
client/rest-high-level/src/main/java/org/elasticsearch/client/RestHighLevelClient.java
@elasticmachine ok to test
Thanks @andreidan! It feels awesome to contribute back to the community and Elastic!
Heya, I had left a comment on the original issue, see #53555 (comment). We use the async http client rather than the http client, although the former depends on the latter. We are free to use classes from the blocking variant, but they are not always a good fit. The discussion in https://issues.apache.org/jira/browse/HTTPCLIENT-1822 seems to suggest that more work is required to implement automatic decompression using the async primitives that the async client offers, and highlights the steps needed, which I suppose are valid no matter if implemented within the async http client or as a user of it. I wonder if the implementation that we have introduced is equivalent, and why we have introduced it in our own client and not upstream, for all the other users of the async http client to benefit from.
@javanna thanks for bringing the Apache issue to our attention, I was not aware of it. I agree this would be nice to be handled transparently by the underlying client (and that would be the appropriate place for this), but until there's support in the underlying client we're handling the decompression when parsing the response, which keeps the networking async, so I think this remains in line with the async client guarantees. Did we miss something in terms of being aligned with the async client invariants?
Does this mean that we are going to contribute transparent content decompression back to the Apache http async client? I guess it is ok to decompress this way for now, after all. I would open the discussion on making automatic content decompression available in the low-level client: it may be possible to use the same approach given the limited set of features it exposes from the underlying async client, or we may have to first contribute a solution back to the async http client library. I don't think users expect to have to decompress stuff themselves, and backwards compatibility should not prevent us from making this improvement.
Backporting PR elastic#53533 to the fork. This can be dropped once we upgrade to 7.8.0 or later.
The way this is implemented at this stage, we can't port it back; it would have to be a new implementation altogether, as this implementation is ES HLRC specific. There isn't anyone in our team (afaik) actively pursuing this. With regards to using automatic decompression in the LLRC, I think you're correct in what the options are.
…ghLevelClient (elastic#53533) Added decompression of gzip when a gzip value is returned as a header from Elasticsearch (cherry picked from commit 4a195b5) Signed-off-by: Andrei Dan <[email protected]>
…ghLevelClient (#53533) (#54811) Added decompression of gzip when a gzip value is returned as a header from Elasticsearch (cherry picked from commit 4a195b5) Signed-off-by: Andrei Dan <[email protected]> Co-authored-by: Hakky54 <[email protected]>
Should I move the code changes to the LLRC, or wait for now until this has been discussed within your team?
@Hakky54 sorry for the late reply. We discussed this today and agreed that having this supported in the LLRC would be very useful, so if you'd like to continue working on this, that would be very much appreciated.
Hi @andreidan, that's awesome to hear! I couldn't push the changes to this pull request because I deleted the repository after the branch got merged. I applied the changes within a new feature branch and created a new pull request here: #55413. Could you have a look at it? Thanks @javanna for proposing to move this option to the LLRC!
Hi @andreidan, I see you removed label You know I would be totally fine if the methods on
For anyone else interested in getting GZIP working without updating to version 7+, you need to tweak a few places. The most important bit is the response consumer:

```java
/**
 * Basically a copy of {@link HeapBufferedAsyncResponseConsumer} with support for gzip encoding.
 *
 * Once ES provides out of the box support for compression this class should be dropped.
 */
class GzipSupportingResponseConsumer extends AbstractAsyncResponseConsumer<HttpResponse> {

    private static final int DEFAULT_BUFFER_LENGTH = 4096;
    private final int bufferLimitBytes;
    private volatile HttpResponse response;
    private volatile SimpleInputBuffer buf;

    /**
     * Creates a new instance of this consumer with the provided buffer limit.
     */
    GzipSupportingResponseConsumer(int bufferLimit) {
        if (bufferLimit <= 0) {
            throw new IllegalArgumentException("bufferLimit must be greater than 0");
        }
        this.bufferLimitBytes = bufferLimit;
    }

    /**
     * Get the limit of the buffer.
     */
    public int getBufferLimit() {
        return bufferLimitBytes;
    }

    @Override
    protected void onResponseReceived(HttpResponse response) {
        this.response = response;
    }

    @Override
    protected void onEntityEnclosed(HttpEntity entity, ContentType contentType) throws IOException {
        long len = entity.getContentLength();
        if (len > bufferLimitBytes) {
            throw new ContentTooLongException("entity content is too long [" + len +
                    "] for the configured buffer limit [" + bufferLimitBytes + "]");
        }
        if (len < 0) {
            len = DEFAULT_BUFFER_LENGTH;
        }
        this.buf = new SimpleInputBuffer((int) len, getByteBufferAllocator());
        ContentBufferEntity cbEntity = new ContentBufferEntity(entity, this.buf);
        this.response.setEntity(new GzipDecompressingEntity(cbEntity));
    }

    /**
     * Returns the instance of {@link ByteBufferAllocator} to use for content buffering.
     * Allows to plug in any {@link ByteBufferAllocator} implementation.
     */
    protected ByteBufferAllocator getByteBufferAllocator() {
        return HeapByteBufferAllocator.INSTANCE;
    }

    @Override
    protected void onContentReceived(ContentDecoder decoder, IOControl ioctrl) throws IOException {
        this.buf.consumeContent(decoder);
    }

    @Override
    protected HttpResponse buildResult(HttpContext context) {
        return response;
    }

    @Override
    protected void releaseResources() {
        response = null;
    }
}
```

Next, implement a custom consumer factory:

```java
/**
 * Consumer factory which provides {@link GzipSupportingResponseConsumer} instead of the
 * default {@link HeapBufferedResponseConsumerFactory} to enable GZIP support for ES calls.
 */
class CompressedHttpAsyncResponseConsumerFactory implements HttpAsyncResponseConsumerFactory {

    // default buffer limit is 100MB
    static final int DEFAULT_BUFFER_LIMIT = 100 * 1024 * 1024;

    @Override
    public HttpAsyncResponseConsumer<HttpResponse> createHttpAsyncResponseConsumer() {
        return new GzipSupportingResponseConsumer(DEFAULT_BUFFER_LIMIT);
    }
}
```

Next, use custom-made RequestOptions:

```java
private RequestOptions createCustomRequestOptions() {
    RequestOptions.Builder builder = RequestOptions.DEFAULT.toBuilder();
    builder.setHttpAsyncResponseConsumerFactory(new CompressedHttpAsyncResponseConsumerFactory());
    return builder.build();
}

private void whatever(SearchRequest searchRequest) {
    // make sure you create the options only once, there is no sense in recreating them on each call
    SearchResponse response = client.search(searchRequest, createCustomRequestOptions());
}
```

And now you can enable GZIP on the HTTP level:

```java
RestClientBuilder builder = RestClient.builder(httpHost);
builder.setHttpClientConfigCallback(clientBuilder -> {
    clientBuilder.addInterceptorFirst((HttpRequestInterceptor) (request, context) ->
            request.setHeader("Accept-Encoding", "gzip"));
    // Bonus time - compress all your request payloads, not only ES responses!
    clientBuilder.addInterceptorLast((HttpRequestInterceptor) (request, context) -> {
        if (request instanceof HttpEntityEnclosingRequest) {
            HttpEntityEnclosingRequest requestWithPayload = (HttpEntityEnclosingRequest) request;
            requestWithPayload.setEntity(new GzipCompressingEntity(requestWithPayload.getEntity()));
        }
    });
    return clientBuilder;
});
```
…ghLevelClient (elastic#53533) Added decompression of gzip when a gzip value is returned as a header from Elasticsearch
Awesome news, thanks @swallez! Once it's released I can really simplify my configuration code. Also, any plans to use my approach for compressing request payloads for #62044 (and backport it to 6.8), or do you need help in contributing to that one?
@rand0m86 we are targeting a maintenance release for 6.8 in the near future, likely the end of October, so the wait shouldn't be very long. About request payloads, we're considering adding something like
This pull request is to enable the RestHighLevelClient to handle an Elasticsearch response which has compressed content. It was already possible to do this with the low level client, see here AsyncElasticsearchCompressedRequest
Compression can enabled within a node configuration with the following property:
http.compression: trueCompression can be triggered by a request from a client. Therefor you also need to provide additional information within the header of the request to Elasticsearch if a client really wants to enable it. That is possible with the following RequestOptions:With these two properties Elasticsearch will return a gzip compressed response body. The RestHighLevelClient can send a request with the above request options but it couldn't handle the response yet. This feature request contains the option to handle a compressed response if it is compressed with gzip.