Conversation

Contributor

@barronw barronw commented Dec 12, 2024

Closes #1420.

@kevinjqliu
Contributor

@barronw looks like there's a linter issue; could you try running make lint?

Contributor

@kevinjqliu kevinjqliu left a comment

Generally LGTM, added a few comments.

I verified that this matches the spec:
https://iceberg.apache.org/spec/#column-projection

names: A required list of 0 or more names for a field.
field-id: An optional Iceberg field ID used when a field's name is present in names
fields: An optional list of field mappings for child field of structs, maps, and lists.

"names": []
}
"""
assert MappedField(field_id=None, names=[]) == MappedField.model_validate_json(mapped_field)
Contributor

nit: also test constructing MappedField with field_id omitted entirely, rather than passing field_id=None
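
A minimal sketch of the two cases the nit asks to cover, using only the standard library. MappedFieldSketch is a hypothetical stand-in and not the PyIceberg MappedField class (which this thread shows being parsed with model_validate_json); the sketch only assumes, per the spec excerpt quoted above, that names is required while field-id and fields are optional.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class MappedFieldSketch:
    """Hypothetical stand-in for a name-mapping entry (not the real MappedField)."""

    names: List[str] = field(default_factory=list)
    field_id: Optional[int] = None  # optional per the spec excerpt
    fields: List["MappedFieldSketch"] = field(default_factory=list)

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "MappedFieldSketch":
        return cls(
            names=list(data["names"]),       # required key
            field_id=data.get("field-id"),   # None when the key is absent
            fields=[cls.from_dict(child) for child in data.get("fields", [])],
        )

    @classmethod
    def from_json(cls, raw: str) -> "MappedFieldSketch":
        return cls.from_dict(json.loads(raw))


parsed = MappedFieldSketch.from_json('{"names": []}')

# Both spellings should describe the same object: field_id passed
# explicitly as None, and field_id left out of the constructor call.
assert parsed == MappedFieldSketch(field_id=None, names=[])
assert parsed == MappedFieldSketch(names=[])
```

Checking both spellings guards against a default that silently differs from an explicit None.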

def field(self, field: NestedField, field_partner: Optional[MappedField], field_result: IcebergType) -> IcebergType:
if field_partner is None:
raise ValueError(f"Field missing from NameMapping: {'.'.join(self.current_path)}")
if field_partner is None or field_partner.field_id is None:
Contributor

I'm curious about this change: why do we need to check field_partner.field_id?

Contributor Author

This should just be a type check since NestedField expects a field ID below. The field partner is looked up by field ID before being passed into this method.

Contributor

I see, field_id is a required field in NestedField. Without the field_partner.field_id is None check, the type checker reports an error.
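
To illustrate the narrowing discussed here, a small sketch under stated assumptions: Partner and require_field_id are hypothetical names, not PyIceberg APIs. Once field_id is typed Optional[int], a static checker will generally only accept it where an int is required after an explicit None check.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Partner:
    """Hypothetical mapping entry whose field ID may be absent."""

    names: List[str]
    field_id: Optional[int] = None


def require_field_id(partner: Optional[Partner], path: List[str]) -> int:
    # Checking both conditions narrows partner.field_id from Optional[int]
    # to int for the rest of the function.
    if partner is None or partner.field_id is None:
        raise ValueError(f"Field missing from NameMapping: {'.'.join(path)}")
    return partner.field_id


assert require_field_id(Partner(names=["id"], field_id=7), ["id"]) == 7
```

A single combined condition keeps the same error message for both the missing-partner and missing-ID cases.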

update.name: update.field_id for f in field_results if (update := self._updates.get(f.field_id))
update.name: update.field_id
for f in field_results
if f.field_id is not None and (update := self._updates.get(f.field_id))
Contributor

What is the rationale behind this change? Should we look at all the other places where .field_id is used?

Contributor Author

The update_mapping API doesn't currently support changes to mappings without a field ID since updates is typed with Dict[int, NestedField].
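
The guarded comprehension from the diff can be sketched in isolation as follows. Mapped, Update, and collect_updates are hypothetical names introduced for illustration; the only thing taken from the thread is that the updates dictionary is keyed by int, so entries without a field ID have to be skipped before the lookup.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Mapped:
    """Hypothetical mapping entry; field_id may be None."""

    names: List[str]
    field_id: Optional[int] = None


@dataclass
class Update:
    """Hypothetical update record, always carrying a field ID."""

    name: str
    field_id: int


def collect_updates(results: List[Mapped], updates: Dict[int, Update]) -> Dict[str, int]:
    # Entries without a field ID are filtered out first, so updates.get()
    # is only ever called with an int key.
    return {
        update.name: update.field_id
        for f in results
        if f.field_id is not None and (update := updates.get(f.field_id))
    }


results = [Mapped(["a"], 1), Mapped(["b"]), Mapped(["c"], 3)]
assert collect_updates(results, {1: Update("a_renamed", 1)}) == {"a_renamed": 1}
```

Entries with no field ID are silently ignored here, which matches the point that supporting them would need a change to the API itself.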

Contributor

I see, is this change also due to the type checker?

Contributor Author

@barronw barronw Dec 17, 2024

Yes. Also, changing the update_mapping API probably requires a larger discussion.

Contributor

thanks for the explanation! cc @Fokko / @sungwy for another set of eyes

Contributor

@kevinjqliu kevinjqliu left a comment

LGTM! I'd like another set of eyes to look over the changes w.r.t. NameMapping.

Contributor

@Fokko Fokko left a comment

I've left one nit, but apart from that it looks good, thanks @barronw

Comment on lines 52 to 58
field_id = {"field-id": self.field_id} if self.field_id is not None else {}
fields = {"fields": self.fields} if len(self.fields) > 0 else {}
return {
"field-id": self.field_id,
**field_id,
"names": self.names,
**fields,
}
Contributor

More of a style thing: I think merging all the dicts is a bit awkward. How about:

serialized = {
    "names": self.names
}
if self.field_id is not None:
    serialized['field-id'] = self.field_id
if len(self.fields) > 0:
    serialized['fields'] = self.fields
return serialized

Contributor Author

@barronw barronw Dec 18, 2024

Updated. I also needed to reorder the fields in the tests.
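
A standalone sketch of the incremental style suggested above, which may also show why the expected key order in tests would change. serialize_mapped_field is a hypothetical free function, not the serializer merged in this PR. It builds the dict starting from names, so names comes first in the output and the optional keys follow only when present.

```python
import json
from typing import Any, Dict, List, Optional


def serialize_mapped_field(
    names: List[str],
    field_id: Optional[int] = None,
    fields: Optional[List[Dict[str, Any]]] = None,
) -> Dict[str, Any]:
    # Start with the required key, then add optional keys only when set.
    serialized: Dict[str, Any] = {"names": names}
    if field_id is not None:
        serialized["field-id"] = field_id
    if fields:  # skips both None and an empty list
        serialized["fields"] = fields
    return serialized


# Dicts preserve insertion order, so "names" now precedes "field-id".
assert list(serialize_mapped_field(["id"], field_id=1)) == ["names", "field-id"]
assert json.dumps(serialize_mapped_field([])) == '{"names": []}'
```

Any test that compares serialized JSON as a string would need its expected value reordered to match.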

@Fokko Fokko merged commit 6c1e7cf into apache:main Dec 19, 2024
7 checks passed
Contributor

Fokko commented Dec 19, 2024

Thanks for fixing this @barronw and thanks for the review @kevinjqliu

@barronw barronw deleted the fix-name-mapping branch December 19, 2024 22:31
sungwy pushed a commit to sungwy/iceberg-python that referenced this pull request Dec 24, 2024
Successfully merging this pull request may close these issues.

Make field-id of name-mapping optional

3 participants