Parallel Table.append #428

@bigluck

Apache Iceberg version

main (development)

Please describe the bug 🐞

While running some tests with the latest RC (v0.6.0rc5), I generated a ~6.7GB Arrow table and appended it to a new table.

In terms of performance, I got similar results (writing to S3) on these two types of EC2 machines:

  • c5ad.8xlarge (32 cores, 64 GB RAM, 10 Gbps NIC) -> wrote 1 Parquet file of 2GB in 31s
  • c5ad.16xlarge (64 cores, 128 GB RAM, 20 Gbps NIC) -> wrote 1 Parquet file of 1.6GB in 28s

Using htop, I noticed that the code was using only a single thread during the append operation, which means the write is not being parallelized.

Screenshot 2024-02-13 at 14 26 35
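
For illustration, here is a minimal sketch of what a parallelized write could look like. This is not PyIceberg's implementation. The helper names `plan_slices` and `parallel_write` and the `write_slice` callback are hypothetical, and the sketch assumes the per-slice writer is I/O-bound or releases the GIL, so threads can actually overlap.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def plan_slices(num_rows: int, n_parts: int) -> List[Tuple[int, int]]:
    """Split num_rows into at most n_parts contiguous (offset, length) ranges."""
    if num_rows < 0 or n_parts <= 0:
        raise ValueError("num_rows must be >= 0 and n_parts must be > 0")
    if num_rows == 0:
        return []
    n_parts = min(n_parts, num_rows)  # never create empty slices
    base, extra = divmod(num_rows, n_parts)
    slices = []
    offset = 0
    for i in range(n_parts):
        # The first `extra` slices take one additional row each.
        length = base + (1 if i < extra else 0)
        slices.append((offset, length))
        offset += length
    return slices


def parallel_write(
    num_rows: int,
    write_slice: Callable[[int, int, int], str],
    max_workers: int = 4,
) -> List[str]:
    """Call write_slice(index, offset, length) for each slice on a thread pool.

    Returns the writer results (e.g. file paths) in slice order.
    """
    slices = plan_slices(num_rows, max_workers)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(write_slice, i, offset, length)
            for i, (offset, length) in enumerate(slices)
        ]
        return [f.result() for f in futures]


# Hypothetical usage with PyArrow (assumed installed, not executed here):
#   import pyarrow.parquet as pq
#   def write_slice(i, offset, length):
#       path = f"part-{i}.parquet"
#       pq.write_table(table.slice(offset, length), path)
#       return path
#   paths = parallel_write(table.num_rows, write_slice, max_workers=8)
```

With this approach each worker would produce its own Parquet file, so the append would need to register several data files instead of one.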
