Simplify partition structures #763

Fokko · 2024-12-06T14:17:52Z

This PR removes `SchemalessPartitionSpec` and `UnboundPartitionSpecField`. We could also combine `BoundPartitionSpec` and `UnboundPartitionSpec` if we like, but this is already quite a big change.

From the spec:

> The field-id property was added for each partition field in v2.
> In v1, the reference implementation assigned field ids sequentially
> in each spec starting at 1,000. See Partition Evolution for more details.
>
> In v1, partition field IDs were not tracked, but were assigned sequentially
> starting at 1000 in the reference implementation. This assignment caused
> problems when reading metadata tables based on manifest files from multiple
> specs because partition fields with the same ID may contain different data types.
>
> For compatibility with old versions, the following rules are recommended for partition evolution in v1 tables:
>
> - Do not reorder partition fields
> - Do not drop partition fields; instead replace the field's transform with the void transform
> - Only add partition fields at the end of the previous partition spec

I think that, for simplicity, we should assign field IDs starting from 1000, which will greatly simplify the objects we need. For V1 tables the field ID is missing, so we can assign IDs from 1000 onwards because they are sequential; for V2 tables we deserialize the field ID from the payload. I also noticed that the reference implementation writes the `field-id` field for V1 tables as well: apache/iceberg#11708
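To make the proposal concrete, here is a minimal sketch of the assignment rule, not the code in this PR. The types `RawPartitionField` and `PartitionField` and the function `assign_field_ids` are hypothetical names invented for illustration. The sketch assumes that a missing field ID can be derived from the field's position in the spec.

```rust
/// First partition field ID used by the reference implementation for v1 specs.
const PARTITION_DATA_ID_START: i32 = 1000;

/// Hypothetical partition field right after deserialization.
/// `field_id` is `None` when the payload did not carry one (v1 metadata).
#[derive(Debug, Clone, PartialEq)]
struct RawPartitionField {
    source_id: i32,
    field_id: Option<i32>,
    name: String,
    transform: String,
}

/// Hypothetical partition field that always has a field ID.
#[derive(Debug, Clone, PartialEq)]
struct PartitionField {
    source_id: i32,
    field_id: i32,
    name: String,
    transform: String,
}

/// Keep any field ID found in the payload. Where it is missing, assign
/// 1000 + position, which matches sequential assignment starting at 1000.
/// Assumption: a spec either carries all of its field IDs or none of them.
fn assign_field_ids(raw: Vec<RawPartitionField>) -> Vec<PartitionField> {
    raw.into_iter()
        .enumerate()
        .map(|(position, field)| PartitionField {
            source_id: field.source_id,
            field_id: field
                .field_id
                .unwrap_or(PARTITION_DATA_ID_START + position as i32),
            name: field.name,
            transform: field.transform,
        })
        .collect()
}
```

With this shape, a single field type would cover both format versions, and callers would never need to supply a field ID themselves.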

Beyond that, I also believe that users shouldn't have to worry about field IDs; they should stay internal to Iceberg-Rust. For evolving the partition spec, we should have something similar to Java and PyIceberg. For V1 tables in particular, we have to take the rules above into account; otherwise there is a serious risk of data loss or of bricking a table. If we agree on this, I'm happy to implement that API.
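As a rough illustration of what such an API would need to guard against on V1 tables, the sketch below checks a new spec against the previous one. It is one possible reading of the three rules quoted from the spec, not an existing Iceberg-Rust API. `V1Field`, `V1EvolutionError`, and `check_v1_evolution` are hypothetical names, and representing transforms as plain strings is a simplification.

```rust
/// Hypothetical, simplified partition field: only what the check needs.
#[derive(Debug, Clone, PartialEq)]
struct V1Field {
    source_id: i32,
    transform: String,
}

/// Hypothetical error type describing which rule was violated.
#[derive(Debug, PartialEq)]
enum V1EvolutionError {
    /// The new spec is shorter than the old one, so a trailing field was dropped.
    FieldDropped { position: usize },
    /// The field at this position no longer matches the old spec
    /// (reordered, replaced, or shifted by a drop earlier in the list).
    FieldReorderedOrReplaced { position: usize },
}

/// Accept a new v1 spec only if every old field keeps its position and source
/// column, with its transform either unchanged or replaced by "void".
/// Fields beyond the old spec's length are treated as additions at the end.
fn check_v1_evolution(old: &[V1Field], new: &[V1Field]) -> Result<(), V1EvolutionError> {
    for (position, old_field) in old.iter().enumerate() {
        match new.get(position) {
            None => return Err(V1EvolutionError::FieldDropped { position }),
            Some(new_field) => {
                let same_source = new_field.source_id == old_field.source_id;
                let same_or_void = new_field.transform == old_field.transform
                    || new_field.transform == "void";
                if !(same_source && same_or_void) {
                    return Err(V1EvolutionError::FieldReorderedOrReplaced { position });
                }
            }
        }
    }
    Ok(())
}
```

Because the check is positional, the sequential field IDs starting at 1000 stay stable across specs as long as it passes.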


Fokko · 2024-12-10T10:52:35Z

Closed in favor of #771

Fokko force-pushed the fd-simplify-partition-structures branch 7 times, most recently from e988ec3 to e7ca59c on December 6, 2024 15:14

Fokko force-pushed the fd-simplify-partition-structures branch from e7ca59c to 3eee6e3 on December 6, 2024 15:19

Fokko closed this Dec 10, 2024

Fokko deleted the fd-simplify-partition-structures branch December 10, 2024 11:58

Fokko mentioned this pull request Dec 12, 2024

Dectect schema evolution or partition evolution for append DataFile #777

Closed

2 tasks
