Feature Request / Improvement
Support partitioned writes
So I think we want to tackle the static overwrite first, and then we can compute the predicate for the dynamic overwrite to support that. We can come up with a separate API. I haven't really thought this through, and we can still change this.
I think the most important steps are the breakdown of the work. There is a lot involved, but luckily we already get the test suite from the full overwrite.
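As a rough illustration of the dynamic-overwrite idea mentioned above (the function name and predicate shape here are hypothetical, not the eventual PyIceberg API): collect the partition values present in the incoming data, then build a delete predicate that matches exactly those partitions.

```python
def dynamic_overwrite_predicate(rows, partition_col):
    """Build an OR-of-equalities predicate over the partition values
    present in the incoming rows (illustrative sketch only)."""
    values = sorted({row[partition_col] for row in rows})
    # Each partition value contributes one equality term; the overwrite
    # would delete every file whose partition matches any of these terms.
    return [("eq", partition_col, v) for v in values]
```

For example, incoming rows spanning two days would yield a two-term predicate, which is then used to pick the data files to replace.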
Steps I can see:
- Extend the summary generation to support partitioned writes (here in Java)
- Add support for appending files to partitioned tables.
- How are we going to fan out the writing of the data? We have an Arrow table; what is an efficient way to compute the partitions and scale out the work? For example, are we going to sort the table on the partition column and do a full pass through it? Or are we going to compute all the affected partitions, and then scale out?
- Add support for static overwrites
- Add support for dynamic overwrites
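On the fan-out question in the third step, a minimal sketch of the sort-then-single-pass strategy, over plain Python rows rather than an Arrow table (the same idea applies to sorting an Arrow table; the column name is hypothetical):

```python
from itertools import groupby
from operator import itemgetter

def fan_out_by_partition(rows, partition_col):
    """Sort on the partition column, then split the data in one pass
    into per-partition chunks that can each be written by a worker."""
    ordered = sorted(rows, key=itemgetter(partition_col))
    return {
        key: list(chunk)
        for key, chunk in groupby(ordered, key=itemgetter(partition_col))
    }
```

The alternative (compute the distinct partitions first, then filter per partition) avoids the sort but scans the data once per partition, so the trade-off depends on the number of partitions touched.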
Other things on my mind:
- In Iceberg it can be that some files are still on an older partitioning; we should make sure that we handle those correctly based on the partition spec that we provide.
- How to handle delete files; it might be that the delete files become unrelated because the affected data files are replaced. We could ignore this at first.
The good part:
- In PyIceberg we're first going to ignore the fast-appends (this is when you create a new manifest, and add it to the manifest list). Instead we'll just take the existing manifest(s) and rewrite them into a single new manifest, which makes it a bit easier to reason about the snapshot (and therefore the snapshot summaries). The reason is that this caused quite a few bugs in Java, and it can always be added at a later moment.