Concern about a possible consistency issue in HiveCatalog's `_commit_table` (#588)

### Question

Currently, `HiveCatalog._commit_table` works as follows:

1. Load the current table metadata via `load_table`.
2. Construct the updated metadata.
3. Lock the Hive table.
4. Alter the Hive table.
5. Unlock the Hive table.
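The ordering above can be sketched as follows. This is a minimal illustration, not PyIceberg's actual code: `FakeMetastore` and `commit_current_order` are hypothetical names, the table "metadata" is modeled as a list of committed changes, and the cross-process Hive lock is modeled with an in-process `threading.Lock`.

```python
import threading


class FakeMetastore:
    """Hypothetical in-memory stand-in for the Hive Metastore (not a real API)."""

    def __init__(self):
        self.tables = {"db.t": []}  # table name -> list of committed changes
        self.lock = threading.Lock()  # stands in for the Hive table lock

    def load_table(self, name):
        return list(self.tables[name])

    def alter_table(self, name, new_metadata):
        # No transactional check: blindly replaces whatever is currently stored.
        self.tables[name] = new_metadata


def commit_current_order(ms, name, change):
    """Current ordering: load and build outside the lock, then lock, alter, unlock."""
    base = ms.load_table(name)  # 1. load current metadata (lock NOT held)
    updated = base + [change]  # 2. construct updated metadata
    with ms.lock:  # 3. lock (5. unlock when the block exits)
        ms.alter_table(name, updated)  # 4. alter


ms = FakeMetastore()
commit_current_order(ms, "db.t", "A")
commit_current_order(ms, "db.t", "B")
print(ms.tables["db.t"])  # ['A', 'B'] when the commits run one after another
```

Run sequentially, nothing goes wrong; the problem only appears when another writer slips in between steps 2 and 3.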

Suppose two processes, A and B, try to commit changes to the same Iceberg table. The execution could happen in the following order:

1. Process A loads the current table metadata.
2. Process A constructs the updated metadata.
3. Process B starts and finishes the **whole** `_commit_table`.
4. Process A locks the Hive table.
5. Process A alters the Hive table.
6. Process A unlocks the Hive table.
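The interleaving above can be replayed deterministically in a single process. This is a sketch under the same assumptions as before: `FakeMetastore` and `commit` are hypothetical, metadata is a list of changes, and `threading.Lock` stands in for the Hive lock.

```python
import threading


class FakeMetastore:
    """Hypothetical in-memory metastore; alter_table has no transactional check."""

    def __init__(self):
        self.tables = {"db.t": []}
        self.lock = threading.Lock()

    def load_table(self, name):
        return list(self.tables[name])

    def alter_table(self, name, new_metadata):
        self.tables[name] = new_metadata


def commit(ms, name, change):
    """Current ordering: load and build outside the lock, then lock/alter/unlock."""
    base = ms.load_table(name)
    updated = base + [change]
    with ms.lock:
        ms.alter_table(name, updated)


ms = FakeMetastore()
# Steps 1-2: process A loads the metadata and builds its update, but has not locked yet.
a_updated = ms.load_table("db.t") + ["A"]
# Step 3: process B runs the whole commit and releases the lock.
commit(ms, "db.t", "B")
# Steps 4-6: process A locks, alters using its stale metadata, and unlocks.
with ms.lock:
    ms.alter_table("db.t", a_updated)

print(ms.tables["db.t"])  # ['A']: B's change is gone
```

The lock is never contended, so both "commits" succeed, yet the final state contains only A's change.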

In this scenario, both commits succeed, because process B releases the lock before process A tries to acquire it. However, A built its updated metadata from the state *before* B's commit. Unless `alter_table` supports a [transactional check](https://issues.apache.org/jira/browse/HIVE-26882), A's `alter_table` call will silently overwrite the changes made by process B.
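Conceptually, a transactional check turns `alter_table` into a compare-and-swap: the update is applied only if the table still points at the metadata the writer started from. The sketch below models that idea with hypothetical names (`CheckingMetastore`, `CommitFailedError`, file names like `v0.metadata.json`); it is not the Hive Metastore API or the exact mechanism of HIVE-26882.

```python
import threading


class CommitFailedError(Exception):
    """Hypothetical error: the table changed since the writer loaded it."""


class CheckingMetastore:
    """Hypothetical metastore whose alter_table performs a compare-and-swap."""

    def __init__(self):
        self.metadata_location = "v0.metadata.json"
        self._mutex = threading.Lock()  # makes the check and the swap atomic

    def alter_table(self, expected_location, new_location):
        with self._mutex:
            if self.metadata_location != expected_location:
                raise CommitFailedError(
                    f"expected {expected_location}, found {self.metadata_location}"
                )
            self.metadata_location = new_location


ms = CheckingMetastore()
a_base = ms.metadata_location  # process A loads v0 and builds its update
ms.alter_table("v0.metadata.json", "v1-B.metadata.json")  # process B commits first
a_failed = False
try:
    ms.alter_table(a_base, "v1-A.metadata.json")  # A's commit is based on stale metadata
except CommitFailedError as exc:
    a_failed = True
    print("A must refresh and retry:", exc)
```

With the check in place, A's stale commit is rejected instead of overwriting B's, and A can reload and retry.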

Since the Python client cannot tell which Hive version it is connected to, I wonder whether we should update the code to lock the table before loading the current table metadata, as the [Java implementation](https://github.com/apache/iceberg/blob/main/hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java#L184) does.
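A minimal sketch of the suggested ordering, again using a hypothetical `FakeMetastore` and an in-process `threading.Lock` in place of the cross-process Hive lock. The `time.sleep` is only there to widen the window a concurrent writer could exploit; it is not part of any real commit path.

```python
import threading
import time


class FakeMetastore:
    """Hypothetical in-memory metastore; alter_table has no transactional check."""

    def __init__(self):
        self.tables = {"db.t": []}
        self.lock = threading.Lock()

    def load_table(self, name):
        return list(self.tables[name])

    def alter_table(self, name, new_metadata):
        self.tables[name] = new_metadata


def commit_lock_first(ms, name, change):
    """Suggested ordering: lock, load, build, alter, unlock."""
    with ms.lock:
        base = ms.load_table(name)  # load while holding the lock
        time.sleep(0.01)  # illustrative delay between load and alter
        ms.alter_table(name, base + [change])


ms = FakeMetastore()
threads = [
    threading.Thread(target=commit_lock_first, args=(ms, "db.t", c)) for c in "AB"
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(ms.tables["db.t"]))  # ['A', 'B']
```

Because the load happens under the lock, the second writer always sees the first writer's change, so neither update is lost even without a transactional `alter_table`. The trade-off is that the lock is held for longer.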

BTW, HIVE-26882 itself seems to have some consistency issues as well (https://issues.apache.org/jira/browse/HIVE-26882), and there is an open fix for them: https://github.com/apache/hive/pull/5129

Please correct me if I have misunderstood anything here. Thanks!

cc: @Fokko 
