-
Notifications
You must be signed in to change notification settings - Fork 392
fix: field id in name mapping should be optional #1426
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@barronw looks like theres a linter issue, could you try to run |
kevinjqliu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally LGTM, added a few comments.
I verified that this is according to the spec
https://iceberg.apache.org/spec/#column-projection
names: A required list of 0 or more names for a field.
field-id: An optional Iceberg field ID used when a field's name is present in names
fields: An optional list of field mappings for child field of structs, maps, and lists.
| "names": [] | ||
| } | ||
| """ | ||
| assert MappedField(field_id=None, names=[]) == MappedField.model_validate_json(mapped_field) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: also test omitting the field_id=None
| def field(self, field: NestedField, field_partner: Optional[MappedField], field_result: IcebergType) -> IcebergType: | ||
| if field_partner is None: | ||
| raise ValueError(f"Field missing from NameMapping: {'.'.join(self.current_path)}") | ||
| if field_partner is None or field_partner.field_id is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious about this change, why do we need to check for field_partner.field_id
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should just be a type check since NestedField expects a field ID below. The field partner is looked up by field ID before being passed into this method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see field_id is a required field in NestedField. Without field_partner.field_id is None, type checker errors
45f18aa to
afc36f8
Compare
| update.name: update.field_id for f in field_results if (update := self._updates.get(f.field_id)) | ||
| update.name: update.field_id | ||
| for f in field_results | ||
| if f.field_id is not None and (update := self._updates.get(f.field_id)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the rationale behind this change? Should we look at all the other places where .field_id is used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The update_mapping API doesn't currently support changes to mappings without a field ID since updates is typed with Dict[int, NestedField].
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, is this change also due to the type checker?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, also changing the API of updating_mapping probably requires a larger discussion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| def field(self, field: NestedField, field_partner: Optional[MappedField], field_result: IcebergType) -> IcebergType: | ||
| if field_partner is None: | ||
| raise ValueError(f"Field missing from NameMapping: {'.'.join(self.current_path)}") | ||
| if field_partner is None or field_partner.field_id is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see field_id is a required field in NestedField. Without field_partner.field_id is None, type checker errors
kevinjqliu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Id like another set of eyes to look over the changes w.r.t NameMapping
Fokko
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've left one nit, but apart from that it looks good, thanks @barronw
pyiceberg/table/name_mapping.py
Outdated
| field_id = {"field-id": self.field_id} if self.field_id is not None else {} | ||
| fields = {"fields": self.fields} if len(self.fields) > 0 else {} | ||
| return { | ||
| "field-id": self.field_id, | ||
| **field_id, | ||
| "names": self.names, | ||
| **fields, | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More a style thing, I think it is a bit awkward to merge all the dicts, how about:
serialized = {
"names": self.names
}
if self.field_id is not None:
serialized['field-id'] = self.field_id
if len(self.fields) > 0:
serialized['fields'] = fields
return serializedThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated, needed to also reorder fields in the tests.
afc36f8 to
1fd60fd
Compare
|
Thanks for fixing this @barronw and thanks for the review @kevinjqliu |
Closes #1420.