[feature] UpdateSchema.add_column supports both parent and child in the same transaction #1493

@kevinjqliu

Description

Apache Iceberg version

None

Please describe the bug 🐞

Currently, we cannot add a parent field and its nested child field in the same transaction.
For example,

with table.update_schema() as update:
    # In a struct
    update.add_column("details", StructType())
    update.add_column(("details", "confirmed_by"), StringType(), "Name of the exchange")

We should update the API docs as well.

To reproduce:

from pyiceberg.catalog.sql import SqlCatalog
from pyiceberg.schema import Schema
from pyiceberg.types import DoubleType, IntegerType, NestedField, StringType, StructType

warehouse_path = "/tmp/warehouse"
catalog = SqlCatalog(
    "default",
    uri=f"sqlite:///{warehouse_path}/pyiceberg_catalog.db",
    warehouse=f"file://{warehouse_path}",
)

schema = Schema(
    NestedField(1, "city", StringType(), required=False),
    NestedField(2, "lat", DoubleType(), required=False),
    NestedField(3, "long", DoubleType(), required=False),
)
catalog.create_namespace_if_not_exists("default")
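# Drop any table left over from a previous run so the script can be re-run.
try:
    catalog.drop_table("default.locations")
except Exception:
    # The table does not exist yet on a first run; nothing to drop.
    pass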

table = catalog.create_table("default.locations", schema)

with table.update_schema() as update:
    update.add_column("retries", IntegerType(), "Number of retries to place the bid")
    # In a struct
    update.add_column("details", StructType())
    update.add_column(("details", "confirmed_by"), StringType(), "Name of the exchange")

Error:

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/Users/kevinliu/repos/iceberg-python/pyiceberg/table/update/schema.py", line 192, in add_column
    parent_field = self._schema.find_field(parent_full_path, self._case_sensitive)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kevinliu/repos/iceberg-python/pyiceberg/schema.py", line 215, in find_field
    raise ValueError(f"Could not find field with name {name_or_id}, case_sensitive={case_sensitive}")
ValueError: Could not find field with name details, case_sensitive=True
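
The traceback shows the parent lookup going through `self._schema.find_field(...)`, so it presumably consults only the schema as it existed before the transaction started and never sees `details`, which is still pending. Below is a minimal toy model of that behaviour and of one possible fix (also checking columns added earlier in the same transaction). `ToySchemaUpdate`, `look_at_pending`, and the path-tuple representation are hypothetical names for illustration only; this is not pyiceberg's implementation.

```python
class ToySchemaUpdate:
    """Toy stand-in for a schema update; NOT pyiceberg's UpdateSchema."""

    def __init__(self, committed, look_at_pending):
        # Field paths already present in the table schema.
        self._committed = set(committed)
        # Field paths added so far in this transaction.
        self._pending = set()
        # Hypothetical switch: also resolve parents against pending adds.
        self._look_at_pending = look_at_pending

    def _parent_exists(self, parent):
        if parent in self._committed:
            return True
        return self._look_at_pending and parent in self._pending

    def add_column(self, path):
        # Accept either a bare name or a tuple path, like the call in the report.
        path = (path,) if isinstance(path, str) else tuple(path)
        parent = path[:-1]
        if parent and not self._parent_exists(parent):
            raise ValueError(f"Could not find field with name {'.'.join(parent)}")
        self._pending.add(path)
        return self


committed = [("city",), ("lat",), ("long",)]

# Lookup against the committed schema only: the child add fails.
naive = ToySchemaUpdate(committed, look_at_pending=False)
naive.add_column("details")
try:
    naive.add_column(("details", "confirmed_by"))
    naive_error = None
except ValueError as exc:
    naive_error = str(exc)

# Lookup that also sees pending adds: parent and child go in together.
aware = ToySchemaUpdate(committed, look_at_pending=True)
aware.add_column("details").add_column(("details", "confirmed_by"))
```

If this reading of the traceback is right, splitting the two `add_column` calls into separate `with table.update_schema()` blocks should avoid the error in the meantime, since the parent would then be part of the committed schema when the child is added. I have not verified that workaround.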

Willingness to contribute

  • I can contribute a fix for this bug independently
  • I would be willing to contribute a fix for this bug with guidance from the Iceberg community
  • I cannot contribute a fix for this bug at this time
