@nagraham

Which issue does this PR close?

This attempts to address apache#1406

What changes are included in this PR?

Problem

Writing large files to Cloudflare R2 via iceberg-rust fails with the following error:

S3Error { code: "InvalidPart", message: "All non-trailing parts must have the same length.", resource: "", request_id: "" }

Info

Multipart uploads to Cloudflare R2 have a strict requirement that all parts (except the final part) must have the same size (link to docs).

iceberg-rust uses OpenDAL to write to object storage. OpenDAL appears to have logic that adaptively sets chunk sizes during multipart uploads, but that doesn't work with R2. OpenDAL used to have a configuration setting to enforce consistent chunk sizes, but that setting was removed and replaced with the chunk() feature. See this OpenDAL issue for context, where the maintainer suggested setting that value in iceberg-rust.
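To make the R2 constraint concrete, here is a minimal standalone sketch (standard library only, not OpenDAL or iceberg-rust code) of splitting a buffer into fixed-size parts and checking that every non-trailing part has the same length. The function names `split_into_parts` and `non_trailing_parts_equal` are hypothetical and exist only for illustration.

```rust
/// Splits `data` into parts of exactly `part_size` bytes.
/// Only the final part may be shorter than `part_size`.
/// (Hypothetical helper for illustration; not part of any library.)
fn split_into_parts(data: &[u8], part_size: usize) -> Vec<&[u8]> {
    assert!(part_size > 0, "part_size must be non-zero");
    data.chunks(part_size).collect()
}

/// Returns true if every part except the last one has the same length,
/// which is the condition R2 reports as violated in the "InvalidPart" error.
fn non_trailing_parts_equal(parts: &[&[u8]]) -> bool {
    match parts.split_last() {
        // Compare each adjacent pair among the non-trailing parts.
        Some((_last, head)) => head.windows(2).all(|w| w[0].len() == w[1].len()),
        // An empty upload trivially satisfies the condition.
        None => true,
    }
}
```

With a fixed part size, a 10-byte buffer split into 4-byte parts yields lengths 4, 4, and 2, which satisfies the check. An adaptive scheme that produces lengths such as 3, 5, and 2 would not.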

Solution

This commit adds a generic, optional configuration property called io.write.chunk-size, which sets the chunk size on the writer. If the property is not present, writes work as they do now; otherwise, the writer applies that consistent chunk size.
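The optional-property behavior described above can be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: the constant `IO_WRITE_CHUNK_SIZE` and the function `chunk_size_from_props` are hypothetical names, and the sketch assumes the value is a byte count stored as a decimal string.

```rust
use std::collections::HashMap;
use std::num::ParseIntError;

/// Property key from the PR description (the constant name is hypothetical).
const IO_WRITE_CHUNK_SIZE: &str = "io.write.chunk-size";

/// Reads the optional chunk size from catalog properties.
/// Returns Ok(None) when the property is absent, so callers can keep
/// the existing write behavior, and Err when the value is not a valid
/// unsigned integer.
fn chunk_size_from_props(
    props: &HashMap<String, String>,
) -> Result<Option<usize>, ParseIntError> {
    props
        .get(IO_WRITE_CHUNK_SIZE)
        .map(|value| value.trim().parse::<usize>())
        // Option<Result<T, E>> -> Result<Option<T>, E>
        .transpose()
}
```

A caller would apply the chunk size to the writer only when the result is `Some(size)`, leaving the default path untouched otherwise.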

Here's an example of setting up a RestCatalog with this property to write 32 MiB chunks.

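    use std::collections::HashMap;

    // Catalog properties; the chunk size is given in bytes as a string.
    let mut props = HashMap::new();
    props.insert(
        "io.write.chunk-size".to_string(),
        (32 * 1024 * 1024).to_string(),
    );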

    let cat = RestCatalog::new(
        RestCatalogConfig::builder()
            .uri(catalog_uri)
            .warehouse(warehouse)
            .props(props)
            .build(),
    );

Are these changes tested?

  • A unit test validates that setting the io.write.chunk-size property will set the chunk size.
  • I manually tested the change by writing large files into R2 Data Catalog (which otherwise would have failed with the "InvalidPart" error).

@chenzl25 chenzl25 requested review from Li0k, chenzl25 and xxchan June 26, 2025 12:27

@chenzl25 chenzl25 left a comment

LGTM

@nagraham nagraham merged commit 81ac281 into dev_rebase_main_20250325 Jun 27, 2025
19 of 21 checks passed
@nagraham nagraham deleted the nagraham/add-chunk-size-to-writer branch June 27, 2025 15:52