Description
Apache Iceberg version
0.5.0 (latest release)
Please describe the bug 🐞
When updating the schema of an Iceberg table (for example, by adding a column), the schema_id should be incremented.
The schema_id is incremented during schema evolution in the Java library, but not in the Python library.
From the Iceberg spec:
Evolution applies changes to the table’s current schema to produce a new schema that is identified by a unique schema ID, is added to the table’s list of schemas, and is set as the table’s current schema.
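The three steps in the spec quote (produce a new schema with a unique ID, add it to the table's list of schemas, set it as current) can be sketched in plain Python. This is a minimal illustration, not PyIceberg code: the Schema and TableMetadata classes and the add_column helper are hypothetical, and choosing "highest existing ID + 1" is an assumed way to guarantee uniqueness that matches the 0 to 1 transition in the Java test.

```python
from dataclasses import dataclass
from typing import Tuple

INITIAL_SCHEMA_ID = 0  # same value as Java's TableMetadata.INITIAL_SCHEMA_ID


@dataclass(frozen=True)
class Schema:
    # Hypothetical, simplified schema: an ID plus column names only
    schema_id: int
    column_names: Tuple[str, ...]


@dataclass(frozen=True)
class TableMetadata:
    # Hypothetical, simplified metadata: all known schemas plus the current one
    schemas: Tuple[Schema, ...]
    current_schema_id: int

    def current_schema(self) -> Schema:
        # Look up the schema whose ID matches current_schema_id
        return next(s for s in self.schemas if s.schema_id == self.current_schema_id)


def add_column(metadata: TableMetadata, name: str) -> TableMetadata:
    """Hypothetical helper that evolves the current schema by appending a column."""
    current = metadata.current_schema()
    # 1. Apply the change to the current schema to produce a new schema ...
    new_columns = current.column_names + (name,)
    # 2. ... identified by a unique schema ID (assumed: highest existing ID + 1) ...
    new_id = max(s.schema_id for s in metadata.schemas) + 1
    new_schema = Schema(schema_id=new_id, column_names=new_columns)
    # 3. ... added to the list of schemas and set as the current schema.
    return TableMetadata(schemas=metadata.schemas + (new_schema,), current_schema_id=new_id)


# Usage: a new table starts at schema ID 0; adding a column yields schema ID 1
metadata = TableMetadata(
    schemas=(Schema(INITIAL_SCHEMA_ID, ("id",)),),
    current_schema_id=INITIAL_SCHEMA_ID,
)
evolved = add_column(metadata, "data")
print(evolved.current_schema_id)  # 1
print(len(evolved.schemas))       # 2
```

The old schema stays in the list, so readers of older snapshots can still resolve it by ID. That is why the ID must change rather than being overwritten in place.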
From the Java unit test TestTableMetadata.java:
In particular, the newly created table schema has an ID of 0, i.e. TableMetadata.INITIAL_SCHEMA_ID (L1503).
After calling updateSchema, the evolved table schema has an ID of 1 (L1520).
In comparison, from the Python unit test test_base.py:
The original table schema ID is 0, but even after calling update_schema()...commit(), the schema ID remains 0 (L602 & L616).
Stacktrace:
In Java, the schema_id is incremented during schema evolution (example1, example2).
In Python, this is handled by the assign_fresh_schema_ids function (example1, example2).
However, this function does not increment the schema ID (source).
Note that the _get_and_increment function is used to increment the field ID.
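The distinction above, where a counter advances field IDs but nothing advances the schema ID, can be illustrated with a small hypothetical sketch. FieldIdCounter only mimics the get-and-increment pattern; it is not PyIceberg's _get_and_increment. The add_columns helper and its parameters are assumptions made for illustration. The point is that field IDs and the schema ID are two independent sequences, so bumping one does not bump the other.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Schema:
    schema_id: int
    fields: Tuple[Tuple[int, str], ...]  # (field_id, column_name) pairs


class FieldIdCounter:
    """Hypothetical stand-in for a get-and-increment field-ID counter."""

    def __init__(self, next_id: int) -> None:
        self._next_id = next_id

    def get_and_increment(self) -> int:
        # Hand out the current value, then advance it for the next field
        value = self._next_id
        self._next_id += 1
        return value


def add_columns(schema: Schema, names: Iterable[str], last_column_id: int,
                existing_schema_ids: Iterable[int]) -> Schema:
    """Hypothetical helper: new columns get field IDs from the counter, and the
    schema ID is advanced in a separate, explicit step."""
    counter = FieldIdCounter(last_column_id + 1)
    new_fields = tuple((counter.get_and_increment(), name) for name in names)
    # The field-ID counter never touches the schema ID, so bump it separately
    new_schema_id = max(existing_schema_ids) + 1
    return Schema(schema_id=new_schema_id, fields=schema.fields + new_fields)


original = Schema(schema_id=0, fields=((1, "id"),))
evolved = add_columns(original, ["data", "ts"], last_column_id=1, existing_schema_ids=[0])
print(evolved.fields)     # ((1, 'id'), (2, 'data'), (3, 'ts'))
print(evolved.schema_id)  # 1
```

If the schema-ID step were omitted, the evolved schema would keep ID 0 even though it has new fields. That matches the symptom observed in the Python test.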