@steveloughran
Contributor

This is the successor to #2179

  1. ABFS Store creates a single threadpool, configurable with fixed size or multiple of cores
  2. each output stream is given its own semaphored pool which limits the access that stream has to the pool
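
A minimal sketch of the two points above, using only JDK classes. The class name `BoundedStreamExecutor` and its methods are hypothetical and are not taken from the patch; the actual change may use different types. One shared pool serves every stream, and each stream wraps it with a semaphore so it can only have a fixed number of tasks queued or running at once.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;

/**
 * Hypothetical per-stream view of a shared pool: limits how many tasks
 * (queued plus running) one stream may have in the pool at any time.
 */
final class BoundedStreamExecutor {
  private final ExecutorService sharedPool;
  private final Semaphore permits;

  BoundedStreamExecutor(ExecutorService sharedPool, int maxActive) {
    this.sharedPool = sharedPool;
    this.permits = new Semaphore(maxActive, true);
  }

  /** Blocks the caller until this stream is below its limit, then submits. */
  <T> Future<T> submit(Callable<T> task) throws InterruptedException {
    permits.acquire();
    try {
      return sharedPool.submit(() -> {
        try {
          return task.call();
        } finally {
          // Release only when the task has finished, so the permit count
          // covers queued and running work alike.
          permits.release();
        }
      });
    } catch (RuntimeException e) {
      // For example a rejected submission: hand the permit back.
      permits.release();
      throw e;
    }
  }

  int availablePermits() {
    return permits.availablePermits();
  }
}
```

Each stream would create its own `BoundedStreamExecutor` over the single store-wide pool. Because `submit` blocks the writer, the limit also caps how far a stream can run ahead of its uploads.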

To actually defend against OOMs, the per-stream queue length is what needs to be managed. Looking at the patch, it still has the problem of #2179: you need one buffer per pending upload in the pools.

Ultimately the S3A connector fixed this by going to disk buffering by default. A more performant design might be to have a blocking byte buffer factory which limits the number of buffers which the streams can request, so putting an upper bound on the amount of memory which a single ABFS store instance can demand.
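
As a rough illustration of the blocking byte buffer factory idea, here is a sketch using only JDK classes. The name `BlockingByteBufferFactory`, its methods, and the sizes used are assumptions made for the example; nothing like this is in the patch. A semaphore caps the number of buffers outstanding, so callers block once the cap is reached and the worst-case heap use is `maxBuffers * bufferSize`.

```java
import java.nio.ByteBuffer;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Hypothetical factory which hands out at most maxBuffers buffers at once. */
final class BlockingByteBufferFactory {
  private final int maxBuffers;
  private final int bufferSize;
  private final Semaphore permits;
  // Returned buffers are kept for reuse rather than reallocated.
  private final Queue<ByteBuffer> free = new ConcurrentLinkedQueue<>();

  BlockingByteBufferFactory(int maxBuffers, int bufferSize) {
    this.maxBuffers = maxBuffers;
    this.bufferSize = bufferSize;
    this.permits = new Semaphore(maxBuffers, true);
  }

  /** Blocks until a buffer is available. */
  ByteBuffer acquire() throws InterruptedException {
    permits.acquire();
    return takeOrAllocate();
  }

  /** Waits up to the timeout; returns null if no buffer became available. */
  ByteBuffer tryAcquire(long timeout, TimeUnit unit) throws InterruptedException {
    return permits.tryAcquire(timeout, unit) ? takeOrAllocate() : null;
  }

  /** Must be called exactly once per acquired buffer, after the upload ends. */
  void release(ByteBuffer buffer) {
    buffer.clear();
    free.offer(buffer);
    permits.release();
  }

  /** Upper bound on the memory this factory can ever hand out. */
  long maxBytes() {
    return (long) maxBuffers * bufferSize;
  }

  private ByteBuffer takeOrAllocate() {
    ByteBuffer buffer = free.poll();
    return buffer != null ? buffer : ByteBuffer.allocate(bufferSize);
  }
}
```

One factory per store instance would be shared by all of its output streams, so the bound holds however many streams are open. The trade-off is that writers stall when the cap is hit, which is where disk buffering avoids the stall instead.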

Change-Id: I6915539cfafe7164c404dfc153653710280d9bf6
@hadoop-yetus

🎊 +1 overall

| Vote | Subsystem | Runtime | Comment |
|:----:|----------:|--------:|:--------|
| +0 🆗 | reexec | 1m 39s | Docker mode activated. |
| | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s | No case conflicting files found. |
| +1 💚 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 💚 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
| | | | _ trunk Compile Tests _ |
| +1 💚 | mvninstall | 36m 34s | trunk passed |
| +1 💚 | compile | 0m 49s | trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 |
| +1 💚 | compile | 0m 33s | trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 |
| +1 💚 | checkstyle | 0m 26s | trunk passed |
| +1 💚 | mvnsite | 0m 40s | trunk passed |
| +1 💚 | shadedclient | 19m 51s | branch has no errors when building and testing our client artifacts. |
| +1 💚 | javadoc | 0m 32s | trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 |
| +1 💚 | javadoc | 0m 27s | trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 |
| +0 🆗 | spotbugs | 1m 13s | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 💚 | findbugs | 1m 9s | trunk passed |
| | | | _ Patch Compile Tests _ |
| +1 💚 | mvninstall | 0m 36s | the patch passed |
| +1 💚 | compile | 0m 37s | the patch passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 |
| +1 💚 | javac | 0m 37s | the patch passed |
| +1 💚 | compile | 0m 31s | the patch passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 |
| +1 💚 | javac | 0m 31s | the patch passed |
| -0 ⚠️ | checkstyle | 0m 19s | hadoop-tools/hadoop-azure: The patch generated 7 new + 2 unchanged - 0 fixed = 9 total (was 2) |
| +1 💚 | mvnsite | 0m 34s | the patch passed |
| +1 💚 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 💚 | shadedclient | 18m 28s | patch has no errors when building and testing our client artifacts. |
| +1 💚 | javadoc | 0m 27s | the patch passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 |
| +1 💚 | javadoc | 0m 24s | the patch passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 |
| +1 💚 | findbugs | 1m 15s | the patch passed |
| | | | _ Other Tests _ |
| +1 💚 | unit | 1m 38s | hadoop-azure in the patch passed. |
| +1 💚 | asflicense | 0m 34s | The patch does not generate ASF License warnings. |
| | | 90m 15s | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.40 ServerAPI=1.40 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2294/1/artifact/out/Dockerfile |
| GITHUB PR | #2294 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 70cc2f756b0c 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / e5fe326 |
| Default Java | Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 |
| checkstyle | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2294/1/artifact/out/diff-checkstyle-hadoop-tools_hadoop-azure.txt |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2294/1/testReport/ |
| Max. process+thread count | 309 (vs. ulimit of 5500) |
| modules | C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2294/1/console |
| versions | git=2.17.1 maven=3.6.0 findbugs=4.0.6 |
| Powered by | Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org |

This message was automatically generated.

@steveloughran steveloughran marked this pull request as draft September 10, 2020 11:19
@steveloughran steveloughran added the fs/azure (changes related to azure; submitter must declare test endpoint) and work in progress (PRs still Work in Progress; reviews not expected but still welcome) labels Sep 10, 2020
@steveloughran
Contributor Author

Looking at this a bit more

  • it's the use of buffers which causes the OOM, not the thread pooling, so neither this nor its predecessor patch will directly fix that
  • need to support a ByteBuffer pool with max capacity and/or disk buffering

@steveloughran
Contributor Author

Closing this, but leaving it up as the PoC to say "we should have a shared thread pool for lower startup costs"; it would be a switch to buffering on disk which will be the way to guarantee an end to OOM problems.

I am happy for the S3A blocks class to be moved to hadoop-common to address this.

@steveloughran
Contributor Author

Maybe I was being pessimistic there. If the number of writes a single stream can have active is throttled, the number of open blocks a single stream can have allocated is also bounded. But: the ability to buffer on disk is the way to robustly avoid scale issues with many active threads.

@steveloughran steveloughran deleted the abfs/HADOOP-17195-threadpool branch October 15, 2021 19:40