
EPIC: Rust Based Compaction #624

@Xuanwo


This is an EPIC issue that marks a direction worth our community's attention. We can use it to track the features we want to offer and how close we are to delivering them.


This issue concerns compaction, specifically native, Rust-based compaction.

We all know that compaction is a resource-intensive task that involves heavy computation, significant I/O, and substantial memory consumption. I believe compaction is the killer feature that iceberg-rust can provide for the whole community. I expect that iceberg-rust can implement compaction more efficiently in terms of both performance and cost.

In this EPIC, I want iceberg-rust to deliver:

Compaction API for a table.

  • It should have a simple API that is easy to use for small tables, such as table.compact().
  • It should have a well-designed planner and scheduler that functions efficiently in a distributed system, processing large tables quickly.
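To make the planner idea concrete, here is a minimal sketch of how small data files could be grouped into compaction tasks. Nothing here is an existing iceberg-rust API: `DataFile`, `CompactionTask`, and `plan_compaction` are hypothetical names, and the greedy first-fit-decreasing bin packing is just one possible strategy that I assume for illustration. A real planner would also have to respect partitions, delete files, and sort order.

```rust
/// Hypothetical description of a data file (not an iceberg-rust type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct DataFile {
    pub path: String,
    pub size_bytes: u64,
}

/// Hypothetical unit of work: files that would be rewritten together.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct CompactionTask {
    pub inputs: Vec<DataFile>,
}

impl CompactionTask {
    /// Sum of the input file sizes in this task.
    pub fn total_bytes(&self) -> u64 {
        self.inputs.iter().map(|f| f.size_bytes).sum()
    }
}

/// Groups files smaller than `target_bytes` into tasks whose combined
/// input size does not exceed `target_bytes` (first-fit decreasing).
pub fn plan_compaction(files: &[DataFile], target_bytes: u64) -> Vec<CompactionTask> {
    // Files that already reach the target size are left untouched.
    let mut small: Vec<DataFile> = files
        .iter()
        .filter(|f| f.size_bytes < target_bytes)
        .cloned()
        .collect();

    // Largest first; ties broken by path so the plan is deterministic.
    small.sort_by(|a, b| {
        b.size_bytes
            .cmp(&a.size_bytes)
            .then_with(|| a.path.cmp(&b.path))
    });

    let mut tasks: Vec<CompactionTask> = Vec::new();
    for file in small {
        let size = file.size_bytes;
        // Put the file into the first task that still has room for it.
        match tasks
            .iter_mut()
            .find(|t| t.total_bytes() + size <= target_bytes)
        {
            Some(task) => task.inputs.push(file),
            None => tasks.push(CompactionTask { inputs: vec![file] }),
        }
    }

    // A task with a single input would rewrite a file without merging anything.
    tasks.retain(|t| t.inputs.len() > 1);
    tasks
}
```

Because each task is independent of the others, a scheduler could hand tasks to separate workers in a distributed setup, while a simple `table.compact()` could run them one after another in-process.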

Bindings for Python and Java.

  • This API should be available in Python so that PyIceberg can benefit from our implementation.
  • This API should be available in Java, allowing users to enhance their Spark jobs.

Tests (E2E tests, behavior tests, fuzz tests, ...)

Compaction is more complex than just reading. Mistakes we make could break the user's entire table.

We will need various tests, including end-to-end tests, behavior tests, and fuzz tests, to ensure we have done it correctly.
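As an illustration of what a behavior or fuzz test could assert, here is a small sketch of an invariant check for a single rewrite. The types `FileStats` and `Violation` and the function `check_rewrite` are hypothetical and not part of iceberg-rust. The sketch also assumes that no delete files are applied during the rewrite, so the total record count must stay the same; with deletes, the expected count would have to be adjusted.

```rust
use std::collections::HashSet;

/// Hypothetical per-file summary used only for testing.
#[derive(Debug, Clone)]
pub struct FileStats {
    pub path: String,
    pub record_count: u64,
}

/// Ways in which a rewrite can violate the invariants checked below.
#[derive(Debug, PartialEq, Eq)]
pub enum Violation {
    RecordCountChanged { before: u64, after: u64 },
    DuplicateOutputPath(String),
}

/// Checks that a rewrite kept the total record count and produced
/// distinct output paths. Assumes no deletes are applied.
pub fn check_rewrite(inputs: &[FileStats], outputs: &[FileStats]) -> Result<(), Violation> {
    let before: u64 = inputs.iter().map(|f| f.record_count).sum();
    let after: u64 = outputs.iter().map(|f| f.record_count).sum();
    if before != after {
        return Err(Violation::RecordCountChanged { before, after });
    }

    // Two outputs with the same path would overwrite each other.
    let mut seen: HashSet<&str> = HashSet::new();
    for file in outputs {
        if !seen.insert(file.path.as_str()) {
            return Err(Violation::DuplicateOutputPath(file.path.clone()));
        }
    }
    Ok(())
}
```

A fuzz test could generate random sets of input files, run compaction, and use a check like this as its oracle. Comparing the actual rows before and after would be a stronger check than counts alone.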
