
@aws-sdk/lib-storage potentially does not handle backpressure leading to OOM exceptions #4257


Describe the bug

Hi! I am building a bulk upload queue microservice that uses the @aws-sdk/lib-storage library. When testing the service (uploading queued files back to back until the queue is empty), I discovered that memory usage increases over time and will eventually lead to OOM exceptions if the upload queue is large enough. I believe this has to do with the package converting the stream into buffered chunks, which either results in backpressure or in Node.js not releasing those buffers via garbage collection fast enough.

For example, here's a memory log that shows two scenarios:

  1. I added enough files to the queue that memory usage grows over time and eventually crashes my service.
  2. I added only enough files to show that memory was increasing BUT not enough to crash my service; as you can see, memory usage decreases slowly after the uploads finish and eventually returns to its initial state.

Bear with me here, as I'm taking a deep dive into memory and streams while debugging this problem... but since the memory in scenario 2 does not stay at its elevated level and decreases back to its normal(ish) state, I believe we can deduce that this is a backpressure issue: chunks are loaded into buffers faster than they can be uploaded, and therefore cannot be deallocated fast enough.

(Screenshot: memory usage log covering scenarios 1 and 2, 2022-12-05)

Honestly, I'm a bit surprised I couldn't find many issues stemming from this problem, so I feel like maybe I'm doing something wrong?

SDK version number

@aws-sdk/[email protected]

Which JavaScript Runtime is this issue in?

Node.js

Details of the browser/Node.js/ReactNative version

v17.9.1

Reproduction Steps

I simplified my microservice code so that it reads a file from S3 and then immediately uses the @aws-sdk/lib-storage Upload class to upload the file back into storage. Example code below:

//handler.ts
const { Body } = await s3API.getObject("xxx.mp3");

try {
    const { size: uploadSize } = await s3API.uploadTo(job.data.destinationAudioEnding, Body as any, "audio/mpeg");
    size = uploadSize;
    resolve(true);
} catch (error) {
    if (error instanceof Error) {
        console.log(error.message);
    }
    resolve(true);
}

Where uploadTo is a wrapper around Upload:

// s3.ts
import { PassThrough, Readable } from "stream";
import { Upload } from "@aws-sdk/lib-storage";

// ...inside my S3 wrapper class:
public async uploadTo(key: string, stream: Readable, contentType: string) {

    let size = 0;

    const upload = new Upload({
        client: this.s3Client,
        params: {
            Bucket: `${process.env.R2_BUCKET}`,
            Key: key,
            // we pipe to a passthrough to handle the case that the stream isn't initialized yet
            Body: stream.pipe(new PassThrough()),
            ContentType: contentType,
        },
    });

    upload.on("httpUploadProgress", (progress) => {
        size = progress.loaded ?? 0;
    });

    await upload.done();

    return {
        size
    }
}
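For reference, I have not tuned any of the multipart settings, so this wrapper runs with the library defaults. If I'm reading the lib-storage README correctly, Upload also accepts queueSize and partSize options that bound how many part buffers each upload holds in memory at once. A minimal sketch of setting them explicitly (the values shown are only what I believe the defaults to be, so treat them as assumptions):

// s3.ts (sketch only, not what I'm currently running)
const upload = new Upload({
    client: this.s3Client,
    params: {
        Bucket: `${process.env.R2_BUCKET}`,
        Key: key,
        Body: stream.pipe(new PassThrough()),
        ContentType: contentType,
    },
    queueSize: 4,              // parts uploaded concurrently per Upload (believed default: 4)
    partSize: 5 * 1024 * 1024, // bytes buffered per part (believed default/minimum: 5 MiB)
});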

However, I'm a bit stumped: I loaded the SDK locally, logged the output from the getChunkStream method, logged when the done method was called, and confirmed that all chunks had finished iterating before done was called. Could this mean that even though the chunks have all been uploaded, there is a delay in deallocating them from memory afterwards? Is there possibly a way to force deallocation once a chunk has been uploaded?
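As a crude follow-up check, I was considering forcing a GC pass right after done() resolves and comparing heap usage before and after, to see whether the chunk buffers are at least reclaimable at that point. A sketch (it assumes Node is started with --expose-gc, which is what makes global.gc available):

// gc-check.ts (sketch; run node with --expose-gc)
const mb = (n: number) => `${Math.round(n / 1024 / 1024)} MB`;

await upload.done();

const before = process.memoryUsage().heapUsed;
if (typeof global.gc === "function") {
    global.gc(); // only defined when --expose-gc is passed
}
const after = process.memoryUsage().heapUsed;

console.log(`heapUsed: ${mb(before)} before forced GC, ${mb(after)} after`);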

Observed Behavior

I observe memory increasing steadily over time until the uploads stop, and then the allocation slowly decreases back to its "initial" size. My understanding leads me to believe this could be a result of backpressure or of buffers not being released via GC.
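A minimal sketch of the kind of in-process sampling behind graphs like the ones above (not my exact logging code; the 5-second interval is arbitrary):

// memory-log.ts (sketch)
const mb = (n: number) => `${Math.round(n / 1024 / 1024)} MB`;

setInterval(() => {
    const { rss, heapUsed, external, arrayBuffers } = process.memoryUsage();
    // arrayBuffers is the interesting field here, since (my assumption) the part Buffers should show up in it
    console.log(`rss=${mb(rss)} heapUsed=${mb(heapUsed)} external=${mb(external)} arrayBuffers=${mb(arrayBuffers)}`);
}, 5_000);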

Expected Behavior

I expect to not receive OOM errors when bulk uploading files.

Possible Solution

No response

Additional Information/Context

Please let me know if there's any more info I can provide. This has been a tough one for me to track down!

Related issues / PRs:

#2612 <-- this PR identified improvements to the library's memory footprint but was never merged

A follow-up in the v2 repo mentions that memory is still an issue in v3 (for PutObjectCommand): aws/aws-sdk-js#3128 (comment)

This Stack Overflow answer has a comment mentioning memory issues within v3: https://stackoverflow.com/a/69988031/6480913


EDIT:

Actually, you know what: I upgraded my staging service resources from 512 MB to 2 GB of RAM, and after bulk uploading more files... it almost seems like the library needs around 500 MB of memory before it stabilizes. So while it looked like there was a memory leak in my 512 MB configuration, it was really that the library couldn't reach its stable state and crashed because there wasn't enough memory allocated for it in the first place.

(Screenshot: memory usage after upgrading to 2 GB of RAM)

However, before closing I would just like to draw more attention to #2612 to help reduce the memory footprint a bit, because 500 MB was a bit shocking to me.
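Back-of-the-envelope (using the queueSize/partSize defaults I assumed above, plus the assumption that my queue only runs one upload at a time), the part buffers alone shouldn't come anywhere near 500 MB:

// rough-estimate.ts (back-of-the-envelope only; every value here is an assumption)
const partSize = 5 * 1024 * 1024; // assumed default part size (5 MiB)
const queueSize = 4;              // assumed default concurrency per Upload
const concurrentUploads = 1;      // my queue appears to process one file at a time

const inFlightBytes = partSize * queueSize * concurrentUploads;
console.log(`~${inFlightBytes / 1024 / 1024} MiB of part buffers expected in flight`);
// => ~20 MiB, far below the ~500 MB plateau, so the rest of the footprint
//    presumably comes from GC lag, the PassThrough, or something else entirely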
