
Conversation

@wujinhu
Contributor

@wujinhu wujinhu commented Jul 8, 2019

Hi, ES team. We are using repository-s3 to access an S3-compatible service (Alibaba Cloud Object Storage Service). However, the service does not support chunked encoding.

[root@master ~]# curl -XPUT 'http://localhost:9200/_snapshot/backup' -H 'Content-Type: application/json' -d '{ "type": "s3", "settings": { "bucket": "hadoop-oss-test", "endpoint": "oss-cn-zhangjiakou.aliyuncs.com", "protocol": "http"} }'
{
	"error": {
		"root_cause": [{
			"type": "repository_verification_exception",
			"reason": "[backup] path  is not accessible on master node"
		}],
		"type": "repository_verification_exception",
		"reason": "[backup] path  is not accessible on master node",
		"caused_by": {
			"type": "i_o_exception",
			"reason": "Unable to upload object [tests-jI3CAPLSTeWReGDUP_Pf_w/master.dat] using a single upload",
			"caused_by": {
				"type": "amazon_s3_exception",
				"reason": "Aws MultiChunkedEncoding is not supported. (Service: Amazon S3; Status Code: 400; Error Code: NotImplemented; Request ID: 5D22E6C1F73A3FB709ECAB2F; S3 Extended Request ID: hadoop-oss-test.oss-cn-zhangjiakou.aliyuncs.com)"
			}
		}
	},
	"status": 500
}

After we add a setting to disable chunked encoding (disable_chunked_encoding), it works!

curl -XPUT 'http://localhost:9200/_snapshot/backup' -H 'Content-Type: application/json' -d '{ "type": "s3", "settings": { "bucket": "hadoop-oss-test", "endpoint": "oss-cn-zhangjiakou.aliyuncs.com", "protocol": "http", "disable_chunked_encoding": true} }'
{"acknowledged":true}

I created this PR for this change. :)

@martijnvg martijnvg added the :Distributed Coordination/Snapshot/Restore label Jul 8, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

@martijnvg
Member

@elasticmachine test this please

@martijnvg
Member

@wujinhu This looks like a great change, thanks for contributing this!

@wujinhu
Contributor Author

wujinhu commented Jul 8, 2019

Thanks @martijnvg, very glad to contribute to ES. :)

@wujinhu
Contributor Author

wujinhu commented Jul 8, 2019

Two tests failed because of a timeout.

Error Message
java.lang.Exception: Suite timeout exceeded (>= 1200000 msec).
Stacktrace
java.lang.Exception: Suite timeout exceeded (>= 1200000 msec).
	at __randomizedtesting.SeedInfo.seed([920413C7CF5E9D9C]:0)
Error Message
java.lang.Exception: Test abandoned because suite timeout was reached.
Stacktrace
java.lang.Exception: Test abandoned because suite timeout was reached.
	at __randomizedtesting.SeedInfo.seed([920413C7CF5E9D9C]:0)

They are OK if I run the REPRODUCE commands:

wujinhudeMacBook-Pro:elasticsearch wujinhu$ ./gradlew :x-pack:plugin:integTestRunner --tests "org.elasticsearch.xpack.test.rest.XPackRestIT.test {p0=ml/datafeeds_crud/Test get datafeed with expression that does not match and allow_no_datafeeds}" -Dtests.seed=920413C7CF5E9D9C -Dtests.security.manager=true -Dtests.locale=fr-BJ -Dtests.timezone=Asia/Kuala_Lumpur -Dcompiler.java=12 -Druntime.java=11 -Dtests.rest.blacklist=getting_started/10_monitor_cluster_health/*

> Configure project :plugins:repository-azure:qa:microsoft-azure-storage
Using access key in external service tests.

> Task :printGlobalBuildInfo UP-TO-DATE
=======================================
Elasticsearch Build Hamster says Hello!
  Gradle Version        : 5.5
  OS Info               : Mac OS X 10.14.3 (x86_64)
  Compiler JDK Version  : 12 (Oracle Corporation 12.0.1 [OpenJDK 64-Bit Server VM 12.0.1+12])
  Compiler java.home    : /Library/Java/JavaVirtualMachines/openjdk-12.0.1.jdk/Contents/Home
  Runtime JDK Version   : 11 (Oracle Corporation 11.0.2 [OpenJDK 64-Bit Server VM 11.0.2+9])
  Runtime java.home     : /Library/Java/JavaVirtualMachines/openjdk-11.0.2.jdk/Contents/Home
  Gradle JDK Version    : 12 (Oracle Corporation 12.0.1 [OpenJDK 64-Bit Server VM 12.0.1+12])
  Gradle java.home      : /Library/Java/JavaVirtualMachines/openjdk-12.0.1.jdk/Contents/Home
  Random Testing Seed   : 920413C7CF5E9D9C
=======================================
<=<============-> 92% EXECUTING [1m 54s]

BUILD SUCCESSFUL in 2m 36s
275 actionable tasks: 107 executed, 168 up-to-date
wujinhudeMacBook-Pro:elasticsearch wujinhu$ ./gradlew :x-pack:plugin:integTestRunner --tests "org.elasticsearch.xpack.test.rest.XPackRestIT.test {p0=ml/datafeeds_crud/Test get datafeed with expression that does not match and allow_no_datafeeds}" -Dtests.seed=920413C7CF5E9D9C -Dtests.security.manager=true -Dtests.locale=fr-BJ -Dtests.timezone=Asia/Kuala_Lumpur -Dcompiler.java=12 -Druntime.java=11 -Dtests.rest.blacklist=getting_started/10_monitor_cluster_health/*

> Configure project :plugins:repository-azure:qa:microsoft-azure-storage
Using access key in external service tests.

> Task :printGlobalBuildInfo UP-TO-DATE
=======================================
Elasticsearch Build Hamster says Hello!
  Gradle Version        : 5.5
  OS Info               : Mac OS X 10.14.3 (x86_64)
  Compiler JDK Version  : 12 (Oracle Corporation 12.0.1 [OpenJDK 64-Bit Server VM 12.0.1+12])
  Compiler java.home    : /Library/Java/JavaVirtualMachines/openjdk-12.0.1.jdk/Contents/Home
  Runtime JDK Version   : 11 (Oracle Corporation 11.0.2 [OpenJDK 64-Bit Server VM 11.0.2+9])
  Runtime java.home     : /Library/Java/JavaVirtualMachines/openjdk-11.0.2.jdk/Contents/Home
  Gradle JDK Version    : 12 (Oracle Corporation 12.0.1 [OpenJDK 64-Bit Server VM 12.0.1+12])
  Gradle java.home      : /Library/Java/JavaVirtualMachines/openjdk-12.0.1.jdk/Contents/Home
  Random Testing Seed   : 920413C7CF5E9D9C
=======================================

BUILD SUCCESSFUL in 14s
275 actionable tasks: 4 executed, 271 up-to-date

@martijnvg
Member

@wujinhu That failure looks unrelated to your change. I will trigger another build.

@martijnvg
Member

@elasticmachine run elasticsearch-ci/2

Contributor

@DaveCTurner DaveCTurner left a comment

Thanks @wujinhu for the PR. I left a few minor comments. I also think we should randomly set this setting in the integration tests, and adjust AmazonS3Fixture to handle non-chunked uploads when the setting is set.

Contributor

Slight preference for this, avoiding the boolean parameter:

Suggested change
- builder.withChunkedEncodingDisabled(true);
+ builder.disableChunkedEncoding();
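
For context, a minimal sketch of how this builder call might be applied when the repository setting is enabled. The class and flag names below are illustrative assumptions rather than the plugin's actual code; only the AmazonS3ClientBuilder calls are real AWS SDK API, and region/credentials setup is omitted.

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

class S3ClientSketch {
    // disableChunkedEncoding is an assumed flag read from the repository's client settings.
    static AmazonS3 buildClient(boolean disableChunkedEncoding) {
        final AmazonS3ClientBuilder builder = AmazonS3ClientBuilder.standard();
        if (disableChunkedEncoding) {
            // Preferred over withChunkedEncodingDisabled(true): no boolean parameter to misread.
            builder.disableChunkedEncoding();
        }
        return builder.build();
    }
}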

Contributor

Whitespace nit:

Suggested change

@wujinhu
Contributor Author

wujinhu commented Jul 8, 2019

Thanks @DaveCTurner for your suggestions; I have updated the code and documentation accordingly.
I agree that we should adjust AmazonS3Fixture to handle non-chunked uploads when the setting is set. I am working on this.

> Thanks @wujinhu for the PR. I left a few minor comments. I also think we should randomly set this setting in the integration tests, and adjust AmazonS3Fixture to handle non-chunked uploads when the setting is set.

@DaveCTurner
Contributor

Before you go too much further @wujinhu, we have some concerns about this PR. On closer inspection it seems that Alibaba Object Storage Service does support chunked uploads: it's mentioned explicitly in their docs at https://www.alibabacloud.com/blog/how-to-use-node-js-to-upload-files-to-alibaba-cloud-object-storage_594077 and https://www.alibabacloud.com/help/doc-detail/31978.htm at least. Arguably any service that doesn't support chunked uploads isn't really S3-compatible. I think you'll need to seek support from Alibaba to determine why it's not working for you.

@wujinhu
Contributor Author

wujinhu commented Jul 8, 2019

Thanks @DaveCTurner. Before I created this PR, I talked with them; this feature is not enabled in all regions, and it will take some time before they enable it in the remaining regions. So, they suggested we disable chunked encoding for now.

Let me explain this problem in detail. If we use https, the AWS SDK will use single chunked encoding, which is OK. However, if we use http, the AWS SDK will use multiple chunked encoding, which Alibaba Object Storage Service does not support yet (just as the error message said), and it will take some time before they enable it. So, we added this configuration to disable it.

@DaveCTurner
Contributor

Ok, to be clear, the absolute earliest this change could possibly be delivered is 7.4.0, and we have only just released 7.2.0 so it will be quite some time yet before this could be in a released version. Can you link to some docs showing which regions do and don't support chunked content-encoding? The docs do not necessarily need to be in English if that helps. Do you know how long it will be before all regions support it? It might be simpler to wait.

We also have a concern about whether there is a difference in memory usage of snapshots when using this setting. We suspect it isn't significant, but this will need to be tested carefully.

Apart from those two points, I think it is worth proceeding with this PR.

@wujinhu
Contributor Author

wujinhu commented Jul 9, 2019

@DaveCTurner Thanks for your support. I have talked with their support team. They plan to support multiple chunked encoding in the India/Japan/Singapore/Malaysia/Indonesia/Dubai regions this year, and in the remaining regions next year (not certain).

There is another advantage to set this setting. Just as the AWS SDK documentation says, it will have performance implications if we enable chunked encoding.

public Subclass disableChunkedEncoding()
Disables chunked encoding on clients built via the builder.

The default behavior is to enable chunked encoding automatically for PutObjectRequest and UploadPartRequest. Setting this flag will result in disabling chunked encoding for all requests.

Note: Enabling this option has performance implications since the checksum for the payload will have to be pre-calculated before sending the data. If your payload is large this will affect the overall time required to upload an object. Using this option is recommended only if your endpoint does not implement chunked uploading.

Returns:
this Builder instance that can be used for method chaining

I will proceed with this PR if you have no concerns. :)

@DaveCTurner
Contributor

> There is another advantage to set this setting.

I think you mean disadvantage. Chunked encoding allows the SDK to upload the file having read it once, but if chunked encoding is disabled then the file must be read twice, which is slower. It seems that Alibaba users like yourself have no choice, however, and will not have a choice for some time. I will suggest some wording for the docs to explain this more clearly, but please go ahead with the changes to the tests and AmazonS3Fixture we discussed above.
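
To make the "read twice" point concrete, here is a rough illustration (not the SDK's internal code) of why disabling chunked encoding costs an extra pass: the whole payload has to be read once just to compute its checksum/signature before any bytes can be uploaded, and the actual upload then reads it again.

import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;

class ChecksumSketch {
    // Illustrative only: stream the payload once and hash it; the upload itself is a second pass.
    static byte[] payloadSha256(Path payload) throws Exception {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        try (InputStream in = Files.newInputStream(payload)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
        }
        return digest.digest();
    }
}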

@wujinhu
Contributor Author

wujinhu commented Jul 11, 2019

@DaveCTurner I have spent some time running the integration tests and submitted this change. Please tell me if I misunderstood your idea. :)

Contributor

@DaveCTurner DaveCTurner left a comment

Yep, great, that's pretty much exactly what I meant. I asked for a couple of extra checks - see inline comments.

@wujinhu
Contributor Author

wujinhu commented Jul 11, 2019

@DaveCTurner Please take another look. I added the extra checks you suggested.

Contributor

@DaveCTurner DaveCTurner left a comment

Thanks, I suggested a couple of minor changes but this all looks ok. However I find that ./gradlew :plugins:repository-s3:integTest sometimes fails for me, seemingly because chunked encoding is supposed to be disabled but some requests are still coming in using chunked encoding. I'm not sure why.

Contributor

I'd prefer this to be deterministic. You can use new Random(Long.parseUnsignedLong(project.rootProject.testSeed.tokenize(':').get(0), 16)) to construct a seeded Random.
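
For reference, the same seed handling expressed in plain Java (a sketch; testSeed stands in for the value of project.rootProject.testSeed, and how that value reaches the code is an assumption):

import java.util.Random;

class SeededRandomSketch {
    // testSeed is a hex string such as "920413C7CF5E9D9C", optionally followed by ":<method seed>".
    static Random fromTestSeed(String testSeed) {
        long seed = Long.parseUnsignedLong(testSeed.split(":")[0], 16);
        return new Random(seed);
    }
}

This keeps the randomized choice (for example, whether chunked encoding is disabled in a test run) reproducible from the build's test seed.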

Contributor

Style nit:

Suggested change
- if (!disableChunkedEncoding) {
+ if (disableChunkedEncoding == false) {

Experience has shown that a vital ! is too easy to miss, so we prefer == false instead.

@wujinhu
Contributor Author

wujinhu commented Jul 12, 2019

@DaveCTurner Maybe I found the reason why the tests fail randomly. If I run the command below, it's OK.

for((i=1;i<=20;++i)); do rm -f plugins/repository-s3/build/fixtures/s3Fixture.properties && ./gradlew :plugins:repository-s3:integTest; done

It seems this issue is related to how s3Fixture.properties gets updated. I will confirm and post an update later.

@wujinhu
Contributor Author

wujinhu commented Jul 12, 2019

I found that the update of s3Fixture.properties happens after the s3Fixture task.

 // If all these variables are missing then we are testing against the internal fixture instead, which has the following
 // credentials hard-coded in.
@@ -244,6 +244,7 @@ task s3FixtureProperties {

   doLast {
     file(s3FixtureFile).text = s3FixtureOptions.collect { k, v -> "$k = $v" }.join("\n")
+    println 'doLast in task s3FixtureProperties'
   }
 }

@@ -251,6 +252,7 @@ task s3FixtureProperties {
 task s3Fixture(type: AntFixture) {
   dependsOn testClasses
   dependsOn s3FixtureProperties
+  println 'in task s3Fixture'
   inputs.file(s3FixtureFile)

Result:

wujinhudeMacBook-Pro:elasticsearch wujinhu$ ./gradlew :plugins:repository-s3:integTest
Starting a Gradle Daemon, 2 busy and 10 incompatible and 5 stopped Daemons could not be reused, use --status for details

> Configure project :plugins:repository-s3
in task s3Fixture

> Configure project :plugins:repository-azure:qa:microsoft-azure-storage
Using access key in external service tests.

> Task :printGlobalBuildInfo UP-TO-DATE
=======================================
Elasticsearch Build Hamster says Hello!
  Gradle Version        : 5.5
  OS Info               : Mac OS X 10.14.3 (x86_64)
  JDK Version           : 12 (Oracle Corporation 12.0.1 [OpenJDK 64-Bit Server VM 12.0.1+12])
  JAVA_HOME             : /Library/Java/JavaVirtualMachines/openjdk-12.0.1.jdk/Contents/Home
  Random Testing Seed   : 42E1CC61CF88F195
=======================================

> Task :plugins:repository-s3:composeUp
Building minio-fixture
f1036f6c7931cf652b3bc1a662612241_repository-s3__minio-fixture_1 is up-to-date
<=Will use localhost as host of minio-fixture
minio-fixture_1 health state reported as 'healthy' - continuing...
Probing TCP socket on localhost:32884 of service 'minio-fixture_1'
TCP socket on localhost:32884 of service 'minio-fixture_1' is ready

> Task :plugins:repository-s3:s3FixtureProperties
doLast in task s3FixtureProperties
<============-> 98% EXECUTING [1m 19s]
> :plugins:repository-s3:integTestRunner > 3 tests completed, 1 skipped
> :plugins:repository-s3:integTestRunner > Executing test org.elasticsearch...s3.RepositoryS3ClientYamlTestSuiteIT

So, if we move the update of s3Fixture.properties out of the doLast block, it will be OK.

@@ -242,9 +242,7 @@ task s3FixtureProperties {
       "s3Fixture.disableChunkedEncoding" : s3DisableChunkedEncoding
   ]

-  doLast {
-    file(s3FixtureFile).text = s3FixtureOptions.collect { k, v -> "$k = $v" }.join("\n")
-  }
+  file(s3FixtureFile).text = s3FixtureOptions.collect { k, v -> "$k = $v" }.join("\n")
 }

@DaveCTurner
Contributor

Yes, that seems to have fixed it. This LGTM, but I would like @original-brownbear to pass judgement too (particularly on the changes to build.gradle).

@DaveCTurner DaveCTurner dismissed their stale review July 12, 2019 11:30

Feedback all addressed

@original-brownbear
Contributor

@wujinhu the change is merged; can you merge master into this branch and revert your change to the Gradle outputs here?

@wujinhu
Contributor Author

wujinhu commented Jul 17, 2019

@original-brownbear OK, I'm testing now.

@wujinhu
Contributor Author

wujinhu commented Jul 18, 2019

@original-brownbear it seems s3Fixture.properties is not updated after I rebased your change.

wujinhudeMacBook-Pro:elasticsearch wujinhu$ ls -lh plugins/repository-s3/build/fixtures/ && ./gradlew :plugins:repository-s3:integTest && ls -lh plugins/repository-s3/build/fixtures/
total 8
drwxr-xr-x  4 wujinhu  staff   128B  7 18 10:18 s3Fixture
-rw-r--r--  1 wujinhu  staff   483B  7 18 10:17 s3Fixture.properties

> Configure project :plugins:repository-azure:qa:microsoft-azure-storage
Using access key in external service tests.

> Task :printGlobalBuildInfo UP-TO-DATE
=======================================
Elasticsearch Build Hamster says Hello!
  Gradle Version        : 5.5
  OS Info               : Mac OS X 10.14.3 (x86_64)
  JDK Version           : 12 (Oracle Corporation 12.0.1 [OpenJDK 64-Bit Server VM 12.0.1+12])
  JAVA_HOME             : /Library/Java/JavaVirtualMachines/openjdk-12.0.1.jdk/Contents/Home
  Random Testing Seed   : D223CAF414F10F7E
=======================================

> Task :plugins:repository-s3:composeUp
Building minio-fixture
Creating network "f1036f6c7931cf652b3bc1a662612241_repository-s3__default" with the default driver
Creating f1036f6c7931cf652b3bc1a662612241_repository-s3__minio-fixture_1 ...
Creating f1036f6c7931cf652b3bc1a662612241_repository-s3__minio-fixture_1 ... done
Will use localhost as host of minio-fixture
Waiting for minio-fixture_1 to become healthy (it's starting)
Waiting for minio-fixture_1 to become healthy (it's starting)
Waiting for minio-fixture_1 to become healthy (it's starting)
Waiting for minio-fixture_1 to become healthy (it's starting)
Waiting for minio-fixture_1 to become healthy (it's starting)
Waiting for minio-fixture_1 to become healthy (it's starting)
Waiting for minio-fixture_1 to become healthy (it's starting)
minio-fixture_1 health state reported as 'healthy' - continuing...
Probing TCP socket on localhost:32985 of service 'minio-fixture_1'
TCP socket on localhost:32985 of service 'minio-fixture_1' is ready
<=
> Task :plugins:repository-s3:composeDown
Stopping f1036f6c7931cf652b3bc1a662612241_repository-s3__minio-fixture_1 ...
Stopping f1036f6c7931cf652b3bc1a662612241_repository-s3__minio-fixture_1 ... done
Removing f1036f6c7931cf652b3bc1a662612241_repository-s3__minio-fixture_1 ...
Removing f1036f6c7931cf652b3bc1a662612241_repository-s3__minio-fixture_1 ... done
Removing network f1036f6c7931cf652b3bc1a662612241_repository-s3__default
<============-> 99% EXECUTING [1m 26s]

BUILD SUCCESSFUL in 1m 27s
157 actionable tasks: 9 executed, 148 up-to-date
total 8
drwxr-xr-x  4 wujinhu  staff   128B  7 18 10:22 s3Fixture
-rw-r--r--  1 wujinhu  staff   483B  7 18 10:17 s3Fixture.properties

@original-brownbear
Contributor

@wujinhu yes, that's right :) The file was a hack to pass the address the Minio Docker container was bound to into the JUnit tests. Now that hack and all the problems it was causing are gone, and it seems it executed successfully for you now as well?
=> Could you please merge master into this branch so we can run CI and get this merged.

Thanks again for your help on this one!

@wujinhu wujinhu force-pushed the chunked-encoding branch from 18b30bc to 7558195 on July 18, 2019 06:59
@wujinhu
Contributor Author

wujinhu commented Jul 18, 2019

@original-brownbear Sorry, I am a little confused about your meaning; do you mean I need to revert this hack or not?

  + outputs.upToDateWhen { false }

I have rebased master.

@original-brownbear
Contributor

@wujinhu yes, please revert the hack; it shouldn't be necessary any longer.

@wujinhu
Contributor Author

wujinhu commented Jul 18, 2019

@original-brownbear ok, please trigger the test, thanks.

@original-brownbear
Contributor

Jenkins test this

thanks @wujinhu !

@original-brownbear original-brownbear self-requested a review July 18, 2019 08:02
Contributor

@original-brownbear original-brownbear left a comment

LGTM thanks @wujinhu

Want to take another look as well @DaveCTurner ?

Contributor

@DaveCTurner DaveCTurner left a comment

LGTM. @wujinhu in future please don't rebase (or force-push to) PR branches, because it loses history and comments and things. git merge master is friendlier to reviewers.

@original-brownbear original-brownbear merged commit 6d70276 into elastic:master Jul 18, 2019
original-brownbear pushed a commit to original-brownbear/elasticsearch that referenced this pull request Jul 18, 2019
* Add disable_chunked_encoding setting to S3 repo plugin to support S3 implementations that don't support chunked encoding
original-brownbear added a commit that referenced this pull request Jul 18, 2019
* Add disable_chunked_encoding setting to S3 repo plugin to support S3 implementations that don't support chunked encoding
@wujinhu
Contributor Author

wujinhu commented Jul 18, 2019

@DaveCTurner @original-brownbear ok, thanks! :)
