This is an EPIC issue that serves as a direction worth our community's attention. We can use this issue to track the features we want to offer and how close we are to achieving them.
The issue concerns compaction: specifically native compaction or, to be precise, Rust-based compaction.
We all know that compaction is a resource-intensive task that involves heavy computation, significant I/O, and substantial memory consumption. I believe compaction is the killer feature that iceberg-rust can provide for the whole community. I expect iceberg-rust can implement compaction more efficiently in terms of both performance and cost.
In this EPIC, I want iceberg-rust to deliver:
Compaction API for a table.
- It should have a simple API that is easy to use for small tables, such as table.compact().
- It should have a well-designed planner and scheduler that functions efficiently in a distributed system, processing large tables quickly.
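To make the planner idea concrete, here is a minimal sketch of one possible planning step: grouping small data files into rewrite tasks with a first-fit-decreasing bin-packing pass. This is only an illustration, not the iceberg-rust API. `FileEntry`, `plan_compaction`, and `target_bytes` are hypothetical names, and bin packing is just one strategy a real planner might use.

```rust
/// Hypothetical description of a data file (NOT an iceberg-rust type).
#[derive(Debug, Clone, PartialEq)]
pub struct FileEntry {
    pub path: String,
    pub size_bytes: u64,
}

/// Hypothetical planner step: pack files smaller than `target_bytes` into
/// groups whose combined size does not exceed `target_bytes`.
/// Each returned group could be handed to a worker as one rewrite task.
pub fn plan_compaction(files: &[FileEntry], target_bytes: u64) -> Vec<Vec<FileEntry>> {
    // Only files below the target size are candidates for rewriting.
    let mut small: Vec<FileEntry> = files
        .iter()
        .filter(|f| f.size_bytes < target_bytes)
        .cloned()
        .collect();

    // First-fit decreasing: place the largest candidates first.
    small.sort_by(|a, b| b.size_bytes.cmp(&a.size_bytes));

    // Each bin tracks its used bytes alongside the files assigned to it.
    let mut bins: Vec<(u64, Vec<FileEntry>)> = Vec::new();
    for file in small {
        let slot = bins
            .iter()
            .position(|(used, _)| used + file.size_bytes <= target_bytes);
        match slot {
            Some(i) => {
                bins[i].0 += file.size_bytes;
                bins[i].1.push(file);
            }
            None => bins.push((file.size_bytes, vec![file])),
        }
    }

    // Rewriting a single file on its own merges nothing, so drop such groups.
    bins.into_iter()
        .map(|(_, group)| group)
        .filter(|group| group.len() > 1)
        .collect()
}
```

In a distributed setting, a scheduler could dispatch each returned group to a separate worker, while a simple table.compact() for small tables could run the same groups sequentially in-process.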
Bindings for Python and Java.
- This API should be available in Python so that PyIceberg can benefit from our implementation.
- This API should be available in Java, allowing users to enhance their Spark jobs.
Tests (E2E tests, behavior tests, fuzz tests, ...)
Compaction is more complex than just reading. Mistakes we make could break the user's entire table.
We will need various tests, including end-to-end tests, behavior tests, and fuzz tests, to ensure we have done it correctly.