Simplify partition structures #763
Closed
This PR removes `SchemalessPartitionSpec` and `UnboundPartitionSpecField`. We could also combine `BoundPartitionSpec` and `UnboundPartitionSpec` if we like, but this is already quite a big change.

From the spec:

I think for simplicity, we should assign the field IDs starting from 1000, which will greatly simplify the objects that we need. For V1 tables the `field-id` is missing, and we can just assign from 1000 onwards because the IDs are sequential; for V2 tables we deserialize the `field-id` from the payload. I also noticed that the reference implementation writes the `field-id` field for V1 tables: apache/iceberg#11708

Next to that, I also believe that users shouldn't have to worry about the field IDs and that they should be kept internal to Iceberg-Rust. For the evolution of the partition spec, we should have something similar to Java and PyIceberg. In particular for V1 tables, we have to take the rules above into account; otherwise there is a serious risk of data loss or of bricking a table. If we agree on this, I'm happy to implement that API.
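
To make the proposed ID assignment concrete, here is a minimal sketch of filling in missing partition field IDs from 1000 onwards. The `PartitionField` struct and the `assign_field_ids` function are hypothetical names used for illustration only; they are not the actual Iceberg-Rust types or the code in this PR.

```rust
/// First partition field ID, following the "start from 1000" convention above.
const PARTITION_FIELD_ID_START: i32 = 1000;

/// Hypothetical, simplified partition field (not the real Iceberg-Rust type).
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
struct PartitionField {
    source_id: i32,
    /// `None` when a V1 payload omits `field-id`.
    field_id: Option<i32>,
    name: String,
    transform: String,
}

/// Assign sequential IDs to fields that lack one.
/// Fields that already carry an ID (V2, or V1 payloads that include it) keep it.
fn assign_field_ids(fields: &mut [PartitionField]) {
    // Start at 1000, or just past the largest ID already present,
    // so newly assigned IDs never collide with existing ones.
    let mut next_id = fields
        .iter()
        .filter_map(|f| f.field_id)
        .max()
        .map_or(PARTITION_FIELD_ID_START, |max_id| {
            (max_id + 1).max(PARTITION_FIELD_ID_START)
        });

    for field in fields.iter_mut() {
        if field.field_id.is_none() {
            field.field_id = Some(next_id);
            next_id += 1;
        }
    }
}
```

With this approach, a V1 spec whose fields all lack `field-id` ends up with IDs 1000, 1001, and so on in declaration order, while a V2 spec passes through unchanged.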