
@Li0k
Contributor

@Li0k Li0k commented Dec 31, 2024

This PR adds support for the update_table interface to the SQL catalog.

  • support update_table
  • add some unit tests

Other PRs for reference:

Once these PRs are merged, we can use a SQL database as the catalog backend.

@Li0k
Contributor Author

Li0k commented Dec 31, 2024

cc @Xuanwo @liurenjie1024 @ZENOTME

Contributor

@liurenjie1024 liurenjie1024 left a comment


Thanks @Li0k for this PR, but I have concerns about introducing update table at this moment, since many features are still missing, such as conflict detection and commit retry.
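To make the two missing pieces concrete: conflict detection usually means the catalog only swaps the table's metadata pointer if it still points at the metadata the commit was based on, and commit retry means reloading that base and re-applying the updates when the swap fails. The sketch below is a minimal in-memory model under those assumptions; CatalogRow, compare_and_swap, commit_with_retry and the concurrent_writer hook are hypothetical names for illustration, not iceberg-rust APIs.

```rust
/// Hypothetical in-memory stand-in for a SQL catalog row; only the pointer
/// to the current metadata file matters for this sketch.
#[derive(Debug)]
struct CatalogRow {
    metadata_location: String,
}

/// Returned when every attempt hit a conflict.
#[derive(Debug, PartialEq)]
struct RetriesExhausted(u32);

impl CatalogRow {
    /// Conflict detection: only swap if the row still points at `expected`.
    /// Returns `false` when someone else committed in the meantime.
    fn compare_and_swap(&mut self, expected: &str, new_location: String) -> bool {
        if self.metadata_location != expected {
            return false;
        }
        self.metadata_location = new_location;
        true
    }
}

/// Commit retry: on conflict, reload the base, re-apply the update and try
/// the swap again, up to `max_attempts` times. Returns the winning attempt.
/// `concurrent_writer` is a hypothetical hook used only to inject a competing
/// commit between loading the base and swapping.
fn commit_with_retry(
    row: &mut CatalogRow,
    max_attempts: u32,
    mut concurrent_writer: impl FnMut(&mut CatalogRow, u32),
    apply: impl Fn(&str) -> String,
) -> Result<u32, RetriesExhausted> {
    for attempt in 1..=max_attempts {
        // Load the base that the update is computed against.
        let base = row.metadata_location.clone();
        let new_location = apply(&base);
        // Simulate another writer committing before our swap.
        concurrent_writer(&mut *row, attempt);
        if row.compare_and_swap(&base, new_location) {
            return Ok(attempt);
        }
    }
    Err(RetriesExhausted(max_attempts))
}

fn main() {
    let mut row = CatalogRow {
        metadata_location: "v1.metadata.json".to_string(),
    };
    // A competing writer commits once, during our first attempt only.
    let result = commit_with_retry(
        &mut row,
        3,
        |other, attempt| {
            if attempt == 1 {
                other.metadata_location = "v2.metadata.json".to_string();
            }
        },
        |base| format!("{base}+update"),
    );
    // The first attempt conflicts; the second succeeds on top of the new base.
    println!("{:?} -> {}", result, row.metadata_location);
}
```

In a SQL catalog the swap would typically map to an UPDATE whose WHERE clause also compares the expected metadata location, with an affected-row count of zero signalling a conflict; that mapping is an assumption here, not something this PR implements.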


/// Returns snapshot references.
#[inline]
pub fn snapshot_refs(&self) -> &HashMap<String, SnapshotReference> {
Contributor


Why do we add this method? We already have a lookup method for snapshots.

/// TableCommit represents the commit of a table in the catalog.
#[derive(Debug, TypedBuilder)]
-#[builder(build_method(vis = "pub(crate)"))]
+#[builder(build_method(vis = "pub"))]
Contributor


The reason we make the TableCommit builder crate-only is that we don't want users to build it manually; all table commits should be constructed through the transaction API.
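A minimal sketch of the visibility pattern described here, assuming a single crate: pub(crate) lets code inside the crate (here a Transaction type) create commits, while downstream crates cannot. TableCommit below is a simplified, hypothetical stand-in, and it uses a plain constructor instead of the TypedBuilder-generated builder from the diff; vis = "pub(crate)" on the generated build method has an analogous effect.

```rust
mod catalog {
    /// Hypothetical, simplified stand-in for `TableCommit`: just a list of
    /// update descriptions.
    #[derive(Debug)]
    pub struct TableCommit {
        updates: Vec<String>,
    }

    impl TableCommit {
        /// Crate-private constructor: downstream crates cannot call this,
        /// so they cannot assemble a commit by hand.
        pub(crate) fn new(updates: Vec<String>) -> Self {
            Self { updates }
        }

        /// Read access can stay public; only construction is restricted.
        pub fn updates(&self) -> &[String] {
            &self.updates
        }
    }
}

mod transaction {
    use crate::catalog::TableCommit;

    /// Hypothetical transaction type: the only public way to obtain a commit.
    #[derive(Debug, Default)]
    pub struct Transaction {
        updates: Vec<String>,
    }

    impl Transaction {
        pub fn new() -> Self {
            Self::default()
        }

        /// Records an update; a real transaction would validate it here.
        pub fn set_property(mut self, key: &str, value: &str) -> Self {
            self.updates.push(format!("set {key}={value}"));
            self
        }

        /// Turns the accumulated updates into a commit through the
        /// crate-private constructor.
        pub fn commit(self) -> TableCommit {
            TableCommit::new(self.updates)
        }
    }
}

use transaction::Transaction;

fn main() {
    let commit = Transaction::new().set_property("owner", "alice").commit();
    println!("{:?}", commit.updates());
}
```

Because this example is a single crate, main could still call TableCommit::new directly; the restriction only takes effect for users who depend on the library from another crate.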

update_table_metadata_builder = table_update.apply(update_table_metadata_builder)?;
}

for table_requirement in requirements {
Contributor

@DerGut DerGut May 21, 2025


Shouldn't the requirements be checked in a transaction (the same one that executes the update statement)? Otherwise a conflicting concurrent commit could land first and we would end up in a broken table state.

The table metadata that's used to validate the requirements would also need to be loaded within the transaction.
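To illustrate the ordering suggested here: the metadata is loaded, the requirements are checked against it, and the new state is written, all inside one atomic scope. In this hypothetical in-memory sketch a Mutex guard stands in for the database transaction; TableRow, Requirement::SnapshotIdIs and commit are illustrative names, not the catalog's actual types.

```rust
use std::sync::Mutex;

/// Hypothetical, simplified catalog row for one table.
#[derive(Debug, Clone, PartialEq)]
struct TableRow {
    current_snapshot_id: Option<i64>,
    metadata_location: String,
}

/// Simplified, hypothetical stand-in for a commit requirement.
#[derive(Debug)]
enum Requirement {
    /// The table's current snapshot must still be this one.
    SnapshotIdIs(Option<i64>),
}

impl Requirement {
    fn check(&self, row: &TableRow) -> Result<(), String> {
        match self {
            Requirement::SnapshotIdIs(expected) if row.current_snapshot_id == *expected => Ok(()),
            Requirement::SnapshotIdIs(expected) => Err(format!(
                "requirement failed: expected snapshot {:?}, found {:?}",
                expected, row.current_snapshot_id
            )),
        }
    }
}

/// The guard plays the role of the database transaction: the row is loaded,
/// validated and overwritten without releasing the lock in between, so no
/// other commit can slip in after the check.
fn commit(table: &Mutex<TableRow>, requirements: &[Requirement], new_row: TableRow) -> Result<(), String> {
    let mut row = table.lock().map_err(|_| "lock poisoned".to_string())?;
    for requirement in requirements {
        // Validate against the row read inside this same critical section.
        requirement.check(&row)?;
    }
    *row = new_row;
    Ok(())
}

fn main() {
    let table = Mutex::new(TableRow {
        current_snapshot_id: None,
        metadata_location: "v1.metadata.json".to_string(),
    });
    // Two writers both started from the empty table (no current snapshot).
    let first = commit(
        &table,
        &[Requirement::SnapshotIdIs(None)],
        TableRow { current_snapshot_id: Some(1), metadata_location: "v2.metadata.json".to_string() },
    );
    let second = commit(
        &table,
        &[Requirement::SnapshotIdIs(None)],
        TableRow { current_snapshot_id: Some(2), metadata_location: "v2-conflict.metadata.json".to_string() },
    );
    println!("first: {first:?}, second: {second:?}");
}
```

The second writer's requirement is evaluated against the state the first writer left behind, so it is rejected instead of overwriting that commit.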

Contributor

@DerGut DerGut May 21, 2025


I believe it would also make sense to explicitly set the transaction isolation level to repeatable read. Postgres, for example, defaults to read committed, which can similarly get us into a broken table state:

Read committed allows us to see different versions of the same row in the SELECT statement (which we use to validate the commit requirements) and in the UPDATE statement. Effectively, if a concurrent, conflicting update commits between our SELECT and our UPDATE, our UPDATE still succeeds. We only checked the old table requirements and never re-checked them against the new state, so we end up in a broken state.

With repeatable read, on the other hand, the UPDATE should safely fail with a serialization error.
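The interleaving described above can be replayed with a toy single-row store. This is a simplified sketch, not PostgreSQL's implementation: under read committed each statement sees the latest committed row, while under repeatable read the transaction works from a snapshot (in PostgreSQL, taken at its first statement) and an UPDATE of a row changed since then fails with a serialization error. It also assumes the UPDATE matches the row by table identifier only; if its WHERE clause also compared the previously read metadata location, PostgreSQL's read committed mode would re-evaluate that condition and update zero rows instead. Isolation, Row, Tx and race are hypothetical names.

```rust
/// Simplified isolation levels for the toy model.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Isolation {
    ReadCommitted,
    RepeatableRead,
}

/// Hypothetical single catalog row with a version counter.
#[derive(Debug, Clone, PartialEq)]
struct Row {
    version: u64,
    metadata_location: String,
}

/// Toy transaction: remembers the row as it was when the transaction began.
struct Tx {
    isolation: Isolation,
    snapshot: Row,
}

impl Tx {
    fn begin(isolation: Isolation, db: &Row) -> Self {
        Tx { isolation, snapshot: db.clone() }
    }

    /// SELECT: read committed sees the latest committed row, repeatable read
    /// keeps returning the snapshot.
    fn select(&self, db: &Row) -> Row {
        match self.isolation {
            Isolation::ReadCommitted => db.clone(),
            Isolation::RepeatableRead => self.snapshot.clone(),
        }
    }

    /// UPDATE matched by table identifier only. Under repeatable read, a row
    /// changed since the snapshot causes a serialization failure.
    fn update(&self, db: &mut Row, new_location: &str) -> Result<(), String> {
        if self.isolation == Isolation::RepeatableRead && db.version != self.snapshot.version {
            return Err("could not serialize access due to concurrent update".to_string());
        }
        db.version += 1;
        db.metadata_location = new_location.to_string();
        Ok(())
    }
}

/// Replays the interleaving: our SELECT, a concurrent commit, then our UPDATE.
fn race(isolation: Isolation) -> (Result<(), String>, Row) {
    let mut db = Row { version: 1, metadata_location: "v1".to_string() };
    let tx = Tx::begin(isolation, &db);
    // Our SELECT: the requirement "table is still at v1" passes.
    let seen = tx.select(&db);
    assert_eq!(seen.metadata_location, "v1");
    // A conflicting transaction commits between our SELECT and UPDATE.
    db.version += 1;
    db.metadata_location = "v2-other".to_string();
    // Our UPDATE, based on the now-stale check.
    let result = tx.update(&mut db, "v2-ours");
    (result, db)
}

fn main() {
    for isolation in [Isolation::ReadCommitted, Isolation::RepeatableRead] {
        let (result, row) = race(isolation);
        println!("{isolation:?}: {result:?}, final row = {row:?}");
    }
}
```

In this model the concurrent commit is silently overwritten under read committed, while under repeatable read our UPDATE fails and the caller can reload the metadata and retry.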


3 participants