Describe the bug
When uploading a directory that contains hundreds of thousands of files to S3 using S3TransferManager.uploadDirectory(), the upload fails with an OutOfMemoryError.
Expected Behavior
S3TransferManager should work regardless of how many files the directory tree contains.
Current Behavior
Exception in thread "Thread-11" Exception in thread "Thread-24" Exception in thread "Thread-65" Exception in thread "sdk-async-response-0-24" java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
Reproduction Steps
package org.example;

import org.apache.commons.lang3.RandomStringUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import software.amazon.awssdk.auth.credentials.ProfileCredentialsProvider;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.s3.S3AsyncClient;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.CreateBucketRequest;
import software.amazon.awssdk.services.s3.model.HeadBucketRequest;
import software.amazon.awssdk.services.s3.model.NoSuchBucketException;
import software.amazon.awssdk.transfer.s3.S3TransferManager;
import software.amazon.awssdk.transfer.s3.model.CompletedDirectoryUpload;
import software.amazon.awssdk.transfer.s3.model.UploadDirectoryRequest;

import java.io.BufferedWriter;
import java.io.IOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;

public class Main {

    private static final String LOCALSTACK_ENDPOINT = "http://localhost:4566";
    private static final Region LOCALSTACK_REGION = Region.EU_CENTRAL_1;
    private static final String BUCKET = "test-bucket";
    private static final int FILE_COUNT = 2_000_000;
    private static final Logger LOG = LoggerFactory.getLogger(Main.class);

    private static Path parentDir;
    private static S3Client s3Client;
    private static S3TransferManager s3TransferManager;

    private static void setupAws() {
        S3AsyncClient s3AsyncClient = S3AsyncClient.crtBuilder()
                .endpointOverride(URI.create(LOCALSTACK_ENDPOINT))
                .region(LOCALSTACK_REGION)
                .credentialsProvider(ProfileCredentialsProvider.create())
                .forcePathStyle(true)
                .targetThroughputInGbps(10d)
                .maxNativeMemoryLimitInBytes((long) 1024 * 1024 * 1024)
                .maxConcurrency(10)
                .build();
        s3TransferManager = S3TransferManager.builder().s3Client(s3AsyncClient).build();
        s3Client = S3Client.builder()
                .endpointOverride(URI.create(LOCALSTACK_ENDPOINT))
                .forcePathStyle(true)
                .region(LOCALSTACK_REGION)
                .credentialsProvider(ProfileCredentialsProvider.create())
                .build();
    }

    private static boolean bucketExists(String bucket) {
        HeadBucketRequest headBucketRequest = HeadBucketRequest.builder()
                .bucket(bucket)
                .build();
        try {
            s3Client.headBucket(headBucketRequest);
            return true;
        } catch (NoSuchBucketException e) {
            return false;
        }
    }

    private static void setupBucket() {
        if (!bucketExists(Main.BUCKET)) {
            s3Client.createBucket(CreateBucketRequest.builder().bucket(Main.BUCKET).build());
        }
    }

    private static void setupFiles() throws IOException {
        parentDir = Files.createTempDirectory("aws-s3");
        for (int i = 0; i < Main.FILE_COUNT; i++) {
            Path file = parentDir.resolve("random" + i + ".txt");
            // try-with-resources closes the writer even if the write fails
            try (BufferedWriter writer = Files.newBufferedWriter(file)) {
                writer.write(RandomStringUtils.random(256, true, true) + "\n");
            }
        }
    }

    private static void uploadFiles(String bucket, String prefix, Path sourceDir) {
        CompletedDirectoryUpload completedDirectoryUpload =
                s3TransferManager
                        .uploadDirectory(
                                UploadDirectoryRequest.builder()
                                        .bucket(bucket)
                                        .s3Prefix(prefix)
                                        .source(sourceDir)
                                        .build())
                        .completionFuture()
                        .join();
        if (!completedDirectoryUpload.failedTransfers().isEmpty()) {
            Throwable exception = completedDirectoryUpload.failedTransfers().get(0).exception();
            String msg = String.format("Some files failed (%dx) to upload with directory upload. One exception was: %s",
                    completedDirectoryUpload.failedTransfers().size(), exception.getMessage());
            throw new RuntimeException(msg, exception);
        }
    }

    public static void main(String[] args) throws IOException {
        setupAws();
        LOG.info("AWS initialized");
        setupBucket();
        LOG.info("Bucket ready: {}", BUCKET);
        setupFiles();
        LOG.info("Files created: {}", FILE_COUNT);
        uploadFiles(BUCKET, "", parentDir);
        LOG.info("Files uploaded: {}", FILE_COUNT);
    }
}
Possible Solution
Do not collect a List of CompletableFutures: when the directory tree contains millions of files, this produces a huge in-memory collection. In our case the tree is flat, a single directory with 2 million small XML files. The current implementation looks like this:
private void doUploadDirectory(CompletableFuture<CompletedDirectoryUpload> returnFuture,
                               UploadDirectoryRequest uploadDirectoryRequest) {
    Path directory = uploadDirectoryRequest.source();
    validateDirectory(uploadDirectoryRequest);

    Collection<FailedFileUpload> failedFileUploads = new ConcurrentLinkedQueue<>();
    List<CompletableFuture<CompletedFileUpload>> futures;

    try (Stream<Path> entries = listFiles(directory, uploadDirectoryRequest)) {
        futures = entries.map(path -> {
            CompletableFuture<CompletedFileUpload> future = uploadSingleFile(uploadDirectoryRequest,
                                                                             failedFileUploads, path);
            // Forward cancellation of the return future to all individual futures.
            CompletableFutureUtils.forwardExceptionTo(returnFuture, future);
            return future;
        }).collect(Collectors.toList());
    }

    // At this point futures would hold 2 million CompletableFutures in memory. With 2 million
    // files execution never gets here: the OOM is thrown while the stream above is collected,
    // because a list of that size exhausts the heap.
    CompletableFuture.allOf(futures.toArray(new CompletableFuture[0]))
            .whenComplete((r, t) -> returnFuture.complete(CompletedDirectoryUpload.builder()
                    .failedTransfers(failedFileUploads)
                    .build()));
}
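
A minimal sketch of the kind of change we have in mind (our suggestion, not the SDK's actual code): cap the number of in-flight uploads with a java.util.concurrent.Semaphore and track completion with a single counter, so memory stays proportional to the concurrency limit rather than the file count. validateDirectory, listFiles and uploadSingleFile are the internal helpers from the snippet above; MAX_IN_FLIGHT is an assumed constant.

private void doUploadDirectoryBounded(CompletableFuture<CompletedDirectoryUpload> returnFuture,
                                      UploadDirectoryRequest uploadDirectoryRequest) {
    Path directory = uploadDirectoryRequest.source();
    validateDirectory(uploadDirectoryRequest);

    Collection<FailedFileUpload> failedFileUploads = new ConcurrentLinkedQueue<>();
    Semaphore inFlight = new Semaphore(MAX_IN_FLIGHT); // assumed constant, e.g. 10_000
    AtomicLong remaining = new AtomicLong(1);          // one extra token held by the producer loop

    Runnable onOneDone = () -> {
        if (remaining.decrementAndGet() == 0) {
            returnFuture.complete(CompletedDirectoryUpload.builder()
                    .failedTransfers(failedFileUploads)
                    .build());
        }
    };

    try (Stream<Path> entries = listFiles(directory, uploadDirectoryRequest)) {
        entries.forEach(path -> {
            inFlight.acquireUninterruptibly(); // back-pressure: blocks once MAX_IN_FLIGHT uploads are pending
            remaining.incrementAndGet();
            // Note: the per-file forwardExceptionTo() is dropped here; registering one dependent
            // on returnFuture per file would itself grow without bound.
            uploadSingleFile(uploadDirectoryRequest, failedFileUploads, path)
                    .whenComplete((r, t) -> {
                        inFlight.release(); // free the slot so the producer can submit the next file
                        onOneDone.run();
                    });
        });
    }

    onOneDone.run(); // drop the producer token; the future completes when the last upload finishes
}

The trade-off is that the listing thread blocks on the semaphore instead of racing ahead, which is exactly the back-pressure that is missing today.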
Additional Information/Context
Strictly speaking this is not a memory leak, it is a design problem. In v1 the directory tree was collected into an in-memory ArrayList. That part was fixed in v2, which lists the directory with a stream instead. However, v2 now builds a huge in-memory list of CompletableFutures.
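
Until this is addressed, an untested caller-side workaround sketch for flat trees like ours: skip uploadDirectory() and drive the public uploadFile() API directly with bounded concurrency. The class and method names here are made up for illustration.

import software.amazon.awssdk.transfer.s3.S3TransferManager;
import software.amazon.awssdk.transfer.s3.model.UploadFileRequest;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicLong;
import java.util.stream.Stream;

final class BoundedDirectoryUpload {

    // Uploads every file in a flat directory, keeping at most maxInFlight uploads pending.
    static void upload(S3TransferManager tm, String bucket, Path dir, int maxInFlight) throws IOException {
        Semaphore slots = new Semaphore(maxInFlight);
        AtomicLong failed = new AtomicLong();
        try (Stream<Path> files = Files.list(dir)) {
            files.forEach(path -> {
                slots.acquireUninterruptibly(); // blocks the producer while maxInFlight uploads are pending
                tm.uploadFile(UploadFileRequest.builder()
                                .putObjectRequest(b -> b.bucket(bucket).key(path.getFileName().toString()))
                                .source(path)
                                .build())
                        .completionFuture()
                        .whenComplete((r, t) -> {
                            if (t != null) {
                                failed.incrementAndGet();
                            }
                            slots.release();
                        });
            });
        }
        slots.acquireUninterruptibly(maxInFlight); // drain: returns once the last upload has completed
        if (failed.get() > 0) {
            throw new RuntimeException(failed.get() + " files failed to upload");
        }
    }
}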
AWS Java SDK version used
2.25.10
JDK version used
openjdk version "17.0.10" 2024-01-16
OpenJDK Runtime Environment (build 17.0.10+7-Debian-1deb12u1)
OpenJDK 64-Bit Server VM (build 17.0.10+7-Debian-1deb12u1, mixed mode, sharing)
Operating System and version
Debian 12 (bookworm)