Add name-mapping #212

Fokko · 2023-12-13T21:36:26Z

All the things to (de)serialize the name-mapping, and all the necessary visitors and such

pyiceberg/table/name_mapping.py

rdblue · 2023-12-17T19:31:38Z

pyiceberg/table/name_mapping.py

+        try:
+            return self._field_by_id[field_id]
+        except KeyError as e:
+            raise ValueError(f"Could not find field-id: {field_id}") from e


Why raise an exception? In the Java ApplyNameMapping, fields are not updated with an ID if there is no matching mapping. It seems like returning Optional[MappedField] is the equivalent here.

This is considered more idiomatic (although it is still up for debate): https://devblogs.microsoft.com/python/idiomatic-python-eafp-versus-lbyl/

We have the same pattern in find_field:

iceberg-python/pyiceberg/schema.py

Lines 177 to 203 in ba8f9eb

def find_field(self, name_or_id: Union[str, int], case_sensitive: bool = True) -> NestedField:
    """Find a field using a field name or field ID.

    Args:
        name_or_id (Union[str, int]): Either a field name or a field ID.
        case_sensitive (bool, optional): Whether to perform a case-sensitive lookup using a field name. Defaults to True.

    Raises:
        ValueError: When the value cannot be found.

    Returns:
        NestedField: The matched NestedField.
    """
    if isinstance(name_or_id, int):
        if name_or_id not in self._lazy_id_to_field:
            raise ValueError(f"Could not find field with id: {name_or_id}")
        return self._lazy_id_to_field[name_or_id]

    if case_sensitive:
        field_id = self._name_to_id.get(name_or_id)
    else:
        field_id = self._lazy_name_to_id_lower.get(name_or_id.lower())

    if field_id is None:
        raise ValueError(f"Could not find field with name {name_or_id}, case_sensitive={case_sensitive}")

    return self._lazy_id_to_field[field_id]

This avoids endless None checks in downstream code, often in places where you are sure the value can't be None but still have to please the type checker.

Don't you just need to catch ValueError instead? It's normal for fields to not be mapped.
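A minimal sketch of the two styles under discussion, assuming a flat name-to-field index. `MappedField` here is a hypothetical dataclass stand-in (not pyiceberg's Pydantic model), and `find_or_raise`, `find_or_none`, `assign_ids_eafp`, and `assign_ids_lbyl` are illustrative names. It shows that when unmapped fields are expected, as in Java's ApplyNameMapping, the raising variant does not remove the check; it moves it into an `except ValueError` at each call site.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class MappedField:
    # Hypothetical stand-in for pyiceberg's MappedField: one field ID, its aliases, nested fields.
    field_id: int
    names: List[str]
    fields: List["MappedField"] = field(default_factory=list)


def find_or_raise(index: Dict[str, MappedField], name: str) -> MappedField:
    """EAFP style: the lookup raises when the name is not mapped."""
    try:
        return index[name]
    except KeyError as e:
        raise ValueError(f"Could not find field with name: {name}") from e


def find_or_none(index: Dict[str, MappedField], name: str) -> Optional[MappedField]:
    """LBYL style: the lookup returns None and the caller must check."""
    return index.get(name)


def assign_ids_eafp(index: Dict[str, MappedField], names: List[str]) -> Dict[str, Optional[int]]:
    # Unmapped names are normal, so the caller catches ValueError and leaves the ID unset.
    result: Dict[str, Optional[int]] = {}
    for name in names:
        try:
            result[name] = find_or_raise(index, name).field_id
        except ValueError:
            result[name] = None
    return result


def assign_ids_lbyl(index: Dict[str, MappedField], names: List[str]) -> Dict[str, Optional[int]]:
    # Same logic with an explicit None check instead of an exception handler.
    result: Dict[str, Optional[int]] = {}
    for name in names:
        mapped = find_or_none(index, name)
        result[name] = mapped.field_id if mapped is not None else None
    return result
```

Both callers produce the same result; the choice is about where the "not mapped" case is handled, not whether it is handled.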

rdblue · 2023-12-17T19:46:04Z

pyiceberg/table/name_mapping.py

+    def _field_by_name(self) -> Dict[str, MappedField]:
+        return visit_name_mapping(self, _IndexByName())
+
+    def id(self, name: str) -> int:


Looks like this pulls from both NameMapping and MappedFields. The behavior is like NameMapping because it looks up fields, but the method names come from MappedFields. I think this should use the find method names from NameMapping, because the project typically uses find to signal that it isn't a direct lookup.

Also, there isn't a use of the lookup by ID method other than in tests, so I'd skip that. I think the only one that is used is the lookup(name: Tuple[str, ...]) version.

Good catch, missed that one since they are combined in Python!
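A sketch of what an index by name like `_field_by_name` could compute, written as plain recursion instead of pyiceberg's `_IndexByName` visitor. The `MappedField` dataclass and `index_by_name` are hypothetical. The assumption is that every alias of a parent is combined with every dotted path of its children, so a lookup by name path (the only lookup actually used) becomes a single dictionary access.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MappedField:
    # Hypothetical stand-in for pyiceberg's MappedField.
    field_id: int
    names: List[str]
    fields: List["MappedField"] = field(default_factory=list)


def index_by_name(fields: List[MappedField]) -> Dict[str, MappedField]:
    """Flatten a (nested) mapping into {dotted name: MappedField}.

    Every alias of a parent is combined with every dotted name of its children,
    so a renamed struct is still reachable through each of its old names.
    """
    index: Dict[str, MappedField] = {}
    for mapped in fields:
        # Post-order: index the children first, then prefix their keys with each parent alias.
        child_index = index_by_name(mapped.fields)
        for name in mapped.names:
            index[name] = mapped
            for child_name, child in child_index.items():
                index[f"{name}.{child_name}"] = child
    return index
```

A struct with two aliases and a child with two aliases yields four dotted entries for the child, which is the price of making every lookup O(1).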

rdblue · 2023-12-17T19:47:34Z

pyiceberg/table/name_mapping.py

+
+    def id(self, name: str) -> int:
+        try:
+            return self._field_by_name[name].field_id


I think this should behave like the methods elsewhere that accept Union[str, Tuple[str, ...]]. If the caller has a path of separate names to find, then this should join them with . before looking up the mapping.

What do you think of:

def find(self, *names: str) -> MappedField:
    name = '.'.join(names)
    try:
        return self._field_by_name[name]
    except KeyError as e:
        raise ValueError(f"Could not find field with name: {name}") from e

I think this is more Pythonic: it works as find('a.b.c'), find('a', 'b', 'c'), and find(*['a', 'b', 'c']).

Looks great!
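The proposed `find(*names)` can be exercised with a small self-contained sketch. `MappedField`, `_index`, and this trimmed `NameMapping` are hypothetical stand-ins that only reproduce the lookup logic; caching the index with `functools.cached_property` is an assumption about how `_field_by_name` might be memoized.

```python
from dataclasses import dataclass, field
from functools import cached_property
from typing import Dict, List


@dataclass
class MappedField:
    # Hypothetical stand-in for pyiceberg's MappedField.
    field_id: int
    names: List[str]
    fields: List["MappedField"] = field(default_factory=list)


def _index(fields: List[MappedField]) -> Dict[str, MappedField]:
    # Map every dotted alias path to its MappedField (children are indexed first).
    out: Dict[str, MappedField] = {}
    for f in fields:
        children = _index(f.fields)
        for name in f.names:
            out[name] = f
            out.update({f"{name}.{key}": child for key, child in children.items()})
    return out


class NameMapping:
    # Hypothetical minimal container; only find() mirrors the proposal above.
    def __init__(self, fields: List[MappedField]) -> None:
        self.fields = fields

    @cached_property
    def _field_by_name(self) -> Dict[str, MappedField]:
        # Built once on first access, then reused for every lookup.
        return _index(self.fields)

    def find(self, *names: str) -> MappedField:
        name = ".".join(names)
        try:
            return self._field_by_name[name]
        except KeyError as e:
            raise ValueError(f"Could not find field with name: {name}") from e
```

Because the parts are joined textually, `find("a.b", "c")` resolves to the same field as `find("a", "b", "c")`. The flip side is that, in this scheme, a field whose own name contains a literal `.` cannot be distinguished from a nested path.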

rdblue · 2023-12-17T19:53:14Z

pyiceberg/table/name_mapping.py

+        """Visit a MappedField."""
+
+
+class _IndexById(NameMappingVisitor[Dict[int, MappedField]]):


I'm not sure this is actually needed.

Alright, I just copied what's on the Java side, but if you think it isn't needed, let's remove it for now.

rdblue · 2023-12-17T19:57:29Z

pyiceberg/table/name_mapping.py

+    return visitor.fields(fields, results)
+
+
+def load_mapping_from_json(mapping: str) -> NameMapping:


What about parse?

rdblue · 2023-12-17T19:58:34Z

pyiceberg/table/name_mapping.py

+
+@visit_name_mapping.register(list)
+def _(fields: List[MappedField], visitor: NameMappingVisitor[T]) -> T:
+    results = [visitor.field(field, visit_name_mapping(field.fields, visitor)) for field in fields]


What happens if fields is None? Will that return None or raise an exception?

I think this is hidden by returning [] instead of None in the _CreateMapping visitor.

Yes, this is an interesting semantic discussion. When Pydantic deserializes a field and the fields key is missing, it automatically injects an empty list:

iceberg-python/pyiceberg/table/name_mapping.py

Line 40 in 623ad6a

fields: List[MappedField] = Field(default_factory=list)

When serializing to JSON, it omits the fields key from the JSON object.

This way we can always assume there is a list, as annotated, which is also why the type checker isn't complaining. So it boils down to semantics: is there a difference in meaning between fields being None and fields being []?

I also added tests to illustrate the (de)serialization and also capture this behavior.

As long as it works, I'm good with this.

rdblue · 2023-12-17T19:59:43Z

pyiceberg/table/name_mapping.py

+        ]
+
+    def primitive(self, primitive: PrimitiveType) -> List[MappedField]:
+        return []


Shouldn't this be None?

Then all the signatures would have to change to Optional[List[MappedField]], which isn't nice. I've commented above: #212 (comment)

rdblue · 2023-12-17T20:01:51Z

pyiceberg/table/name_mapping.py

+    fields: List[MappedField] = Field(default_factory=list)
+
+    @model_serializer
+    def ser_model(self) -> Dict[str, Any]:


Is this needed if fields is Optional[List[MappedField]] instead? It seems more accurate to me to use None for nested mappings for primitive types.

No, this is the serializer: it leaves out the fields key entirely when there are no nested fields.

What I meant was: is this method still needed if fields is None instead of an empty list?
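The (de)serialization behavior described in this thread can be sketched with the standard library instead of Pydantic. The JSON keys `field-id`, `names`, and `fields` follow the Iceberg spec's name-mapping format; `from_dict`, `to_dict`, `parse_mapping`, and `dump_mapping` are hypothetical helpers, not pyiceberg's API. A missing `fields` key deserializes to `[]` (the role `Field(default_factory=list)` plays), and an empty list is dropped again on output (the role of the custom `model_serializer`), so a mapping round-trips unchanged.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class MappedField:
    # Hypothetical stand-in; pyiceberg itself uses a Pydantic model for this.
    field_id: int
    names: List[str]
    fields: List["MappedField"] = field(default_factory=list)


def from_dict(obj: Dict[str, Any]) -> MappedField:
    # A missing "fields" key becomes an empty list, like Field(default_factory=list).
    return MappedField(
        field_id=obj["field-id"],
        names=list(obj["names"]),
        fields=[from_dict(child) for child in obj.get("fields", [])],
    )


def to_dict(mapped: MappedField) -> Dict[str, Any]:
    # Emit "fields" only when there are nested fields, mirroring the custom serializer.
    out: Dict[str, Any] = {"field-id": mapped.field_id, "names": mapped.names}
    if mapped.fields:
        out["fields"] = [to_dict(child) for child in mapped.fields]
    return out


def parse_mapping(payload: str) -> List[MappedField]:
    return [from_dict(obj) for obj in json.loads(payload)]


def dump_mapping(fields: List[MappedField]) -> str:
    return json.dumps([to_dict(f) for f in fields])
```

With `None` instead of `[]`, the output side would still need an explicit rule to drop the key, so the serializer logic would not disappear; it would only test `is not None` instead of truthiness.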

tests/table/test_name_mapping.py

rdblue

Looks great overall. I have a few minor comments about API and whether to use [] or None when there are no nested fields.

rdblue · 2023-12-19T00:47:46Z

pyiceberg/table/name_mapping.py

+    def _field_by_name(self) -> Dict[str, MappedField]:
+        return visit_name_mapping(self, _IndexByName())
+
+    def find(self, *names: str) -> MappedField:


Minor: Do we need to check that names is not empty?
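One possible answer, sketched under the assumption that an empty path should be rejected explicitly: without a guard, `'.'.join(())` is `''`, so `find()` would fall through to a lookup of the empty string and fail with the uninformative message `Could not find field with name: `. The index here is simplified to a hypothetical `Dict[str, int]` of dotted names to field IDs.

```python
from typing import Dict


def find(index: Dict[str, int], *names: str) -> int:
    """Hypothetical variant of find() that rejects an empty path up front."""
    if not names:
        # Without this guard, ".".join(()) == "" and the error message would name no field.
        raise ValueError("Expected at least one name")
    name = ".".join(names)
    try:
        return index[name]
    except KeyError as e:
        raise ValueError(f"Could not find field with name: {name}") from e
```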

rdblue

This looks correct to me. My only issue is that find will throw a ValueError when a name isn't found, which I think is a bit awkward compared to handling None. That's minor though and will get worked out when we go to write ApplyNameMapping code.

And it's arguably more pythonic to raise exceptions for control flow. 😄

Fokko force-pushed the fd-name-mapping branch from 15bf5f5 to a8aed26 December 13, 2023 21:36

Add name-mapping

0a0e829

All the things to (de)serialize the name-mapping, and all the necessary visitors and such

Fokko force-pushed the fd-name-mapping branch from a8aed26 to 0a0e829 December 13, 2023 21:37

rdblue reviewed Dec 14, 2023


pyiceberg/table/name_mapping.py (outdated, resolved)

Fokko mentioned this pull request Dec 14, 2023

Support 'schema.name-mapping.default' Column Projection property #202

Closed

Fokko added 3 commits December 14, 2023 10:13

Move the names from a set to a list

5a673d0

Move from set to list in tests as well

2c9be7c

make tests happy

c13e3b3

HonahX reviewed Dec 15, 2023


pyiceberg/table/name_mapping.py (outdated, resolved)

sungwy mentioned this pull request Dec 17, 2023

Apply Name mapping #219

Merged


Change to lists, thanks HonahX

623ad6a

rdblue reviewed Dec 17, 2023


pyiceberg/table/name_mapping.py (resolved)

rdblue reviewed Dec 17, 2023


tests/table/test_name_mapping.py (resolved)

rdblue requested changes Dec 17, 2023


Thanks Ryan!

e5982c6

rdblue reviewed Dec 19, 2023


rdblue approved these changes Dec 19, 2023


rdblue merged commit dcc3d9f into apache:main Dec 19, 2023

sungwy pushed a commit to sungwy/iceberg-python that referenced this pull request Dec 19, 2023

Add name-mapping (apache#212)

199fb85

Fokko deleted the fd-name-mapping branch December 19, 2023 08:32

sungwy pushed a commit to sungwy/iceberg-python that referenced this pull request Jan 13, 2024

Add name-mapping (apache#212)

23e510a

jonathanc-n mentioned this pull request Mar 21, 2025

feat: re-export name mapping apache/iceberg-rust#1116

Merged

Fokko · Dec 18, 2023 (edited)


3 participants
