schema_id not incremented during schema evolution  #290

@kevinjqliu

Apache Iceberg version

0.5.0 (latest release)

Please describe the bug 🐞

When the schema of an Iceberg table is updated (for example, by adding a column), the schema_id should be incremented.
The Java library increments schema_id during schema evolution, but the Python library does not.

From the Iceberg spec:

Evolution applies changes to the table’s current schema to produce a new schema that is identified by a unique schema ID, is added to the table’s list of schemas, and is set as the table’s current schema.
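
To make the spec wording concrete, here is a minimal, self-contained sketch of the expected behavior. It is a toy model, not the pyiceberg or Java API: `ToySchema`, `ToyTableMetadata`, and `add_column` are hypothetical names, and picking the new id as "highest existing id + 1" is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

INITIAL_SCHEMA_ID = 0  # mirrors the idea of TableMetadata.INITIAL_SCHEMA_ID


@dataclass(frozen=True)
class ToySchema:
    """Hypothetical stand-in for a table schema."""
    schema_id: int
    columns: Tuple[str, ...]


@dataclass
class ToyTableMetadata:
    """Hypothetical stand-in for table metadata."""
    schemas: List[ToySchema]
    current_schema_id: int = INITIAL_SCHEMA_ID

    def current_schema(self) -> ToySchema:
        return next(s for s in self.schemas if s.schema_id == self.current_schema_id)

    def add_column(self, name: str) -> ToySchema:
        # Assumption: the new schema id is the highest existing id plus one.
        new_id = max(s.schema_id for s in self.schemas) + 1
        evolved = ToySchema(new_id, self.current_schema().columns + (name,))
        self.schemas.append(evolved)      # added to the table's list of schemas
        self.current_schema_id = new_id   # set as the table's current schema
        return evolved


table = ToyTableMetadata(schemas=[ToySchema(INITIAL_SCHEMA_ID, ("id",))])
evolved = table.add_column("name")
```

In this sketch the original schema stays in `table.schemas` with id 0, and the evolved schema becomes the current one under a new id.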

From the Java unit test TestTableMetadata.java:
The newly created table schema has an id of 0, i.e. TableMetadata.INITIAL_SCHEMA_ID (L1503).
After updateSchema is called, the evolved table schema has an id of 1 (L1520).

In comparison, from the Python unit test test_base.py:
The original table schema id is 0, and it remains 0 even after calling update_schema()...commit() (L602 & L616).

Stacktrace:
In Java, the schema_id is incremented during schema evolution (example1, example2).

In Python, schema evolution goes through the assign_fresh_schema_ids function (example1, example2).
However, this function does not increment the schema id (source).
Note that the _get_and_increment function is used to increment the field id, not the schema id.
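
The difference between the two counters can be sketched as follows. This is an illustration only and not the actual pyiceberg implementation: `assign_fresh_ids_sketch` and the dict-based schema layout are hypothetical, and the optional `schema_id` argument is one assumed way a caller could supply the incremented id.

```python
from itertools import count
from typing import Dict, Optional


def assign_fresh_ids_sketch(schema: Dict, schema_id: Optional[int] = None) -> Dict:
    """Renumber field ids; change the schema id only if the caller asks for it."""
    next_field_id = count(1)  # plays the role of a field-id counter
    fields = [{"id": next(next_field_id), "name": f["name"]} for f in schema["fields"]]
    # Without an explicit schema_id, the old schema id is carried over unchanged,
    # which is the behavior this issue describes.
    new_schema_id = schema["schema_id"] if schema_id is None else schema_id
    return {"schema_id": new_schema_id, "fields": fields}


original = {
    "schema_id": 0,
    "fields": [{"id": 7, "name": "id"}, {"id": 9, "name": "name"}],
}

# Field ids are reassigned, but the schema id stays at 0.
unchanged = assign_fresh_ids_sketch(original)

# Passing the next schema id explicitly gives the expected result.
bumped = assign_fresh_ids_sketch(original, schema_id=original["schema_id"] + 1)
```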
