
Conversation

@adnanhemani
Contributor

@adnanhemani adnanhemani commented Jun 10, 2025

As per my proposal on the Dev ML, this is the implementation of persisting Polaris Events to persistence. The overall goal of this PR is to store Events in the Polaris persistence layer, which will later be used to power the Iceberg Events API.

High-level overview of the changes:

  • Created a new persistence table and model for Events (a rough sketch of such a model is shown right after this list). The table schema was published on the Dev ML here.
  • Implemented a new persistence method to flush events to that table
  • Implemented in-memory and file-based buffers for flushing events to the persistence layer
  • Created configurations for enabling all of these features
  • Modified PolarisEventListener and PolarisEvents to optimally achieve this goal
  • Created new sample events for CreateTable (explicitly part of the Iceberg Events API spec) and CreateCatalog (which is not explicitly part of the Iceberg Events API spec and will be served as a CustomOperation).
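
To make the first two bullets concrete, here is a rough sketch of what the event model could look like (field names and types here are illustrative assumptions, not the exact ones in this PR; the persistence table mirrors these columns):

```java
// Illustrative sketch only - field names and types are assumptions, not the PR's exact model.
public class PolarisEvent {
  private final String realmId;    // realm the event belongs to
  private final String requestId;  // correlates the event with the originating request
  private final String eventType;  // e.g. "CreateTable", "CreateCatalog"
  private final long timestampMs;  // when the event was emitted
  private final String payload;    // serialized event body

  public PolarisEvent(
      String realmId, String requestId, String eventType, long timestampMs, String payload) {
    this.realmId = realmId;
    this.requestId = requestId;
    this.eventType = eventType;
    this.timestampMs = timestampMs;
    this.payload = payload;
  }
  // getters omitted for brevity
}
```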

The following items will be added to this PR soon:

  • Testing for the new code paths added

Items that will be tackled in subsequent PRs (excluded here so that this PR contains only the end-to-end MVP):

  • Support for multiple datasources to store the events
  • Allowing multiple event listeners to be set
  • Additional event types and related instrumentation

@snazy
Member

snazy commented Jun 10, 2025

Thanks @adnanhemani for tackling the effort.

It looks like this change introduces another serialization implementation, although we already have Jackson in place. Can you explain why that's needed?

It's a bit unclear why these "buffers" are needed. Would you mind elaborating on this?

I have some concerns around how the approach could be implemented in the NoSQL world, via #1189. There we already have a way to iterate over recent changes (aka: events). Do you have some idea around how both could live together? Would be nice to consider NoSQL in this new approach, because we already agreed that NoSQL will become part of Polaris. I think it's better to have a way that satisfies both persistence worlds.

@adnanhemani
Contributor Author

Thanks for the high-level comments, @snazy!

It looks like this change introduces another serialization implementation, although we already have Jackson in place. Can you explain why that's needed?

My anecdotal experience is that Jackson is a much slower serialization library than Kryo, and given that we want these commands to be fast enough that users do not notice a difference, I went ahead with the quicker implementation. I do understand that this brings additional overhead/maintenance - so I'm okay with shifting the code to use Jackson instead if that's a hard requirement. WDYT?
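
For context, the round trip I had in mind looks roughly like this (illustrative sketch only; it assumes PolarisEvent is Kryo-friendly, e.g. has a no-arg constructor, and the real code may differ):

```java
import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.io.Input;
import com.esotericsoftware.kryo.io.Output;

// Illustrative sketch only; PolarisEvent is the (hypothetical) event class being serialized.
final class EventSerDe {
  // Kryo instances are not thread-safe; a real implementation would pool them per thread.
  private final Kryo kryo = new Kryo();

  EventSerDe() {
    kryo.register(PolarisEvent.class);
  }

  byte[] serialize(PolarisEvent event) {
    try (Output output = new Output(4096, -1)) { // start at 4 KB, grow as needed
      kryo.writeObject(output, event);
      return output.toBytes();
    }
  }

  PolarisEvent deserialize(byte[] bytes) {
    try (Input input = new Input(bytes)) {
      return kryo.readObject(input, PolarisEvent.class);
    }
  }
}
```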

It's a bit unclear why these "buffers" are needed. Would you mind elaborating on this?

This was discussed earlier on the ML thread. TL;DR: if we aim to support some read activities in the near future, flushing to persistence on every call will become quite heavy on the persistence. These buffers ensure we are not hammering the persistence (which in the current case is the same as the metastore) and causing an accidental DDoS.
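
To make the buffering idea concrete, here is a minimal sketch (all names are hypothetical, and the PR's actual implementation differs in detail): events accumulate in an in-memory queue and are flushed to persistence either when the queue reaches a size threshold or when a timer fires, so the request path only does a cheap enqueue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Minimal sketch of the buffering idea; all names here are hypothetical.
class BufferedEventWriter {
  interface EventStore { void writeEvents(List<PolarisEvent> batch); } // stand-in for persistence

  private final ConcurrentLinkedQueue<PolarisEvent> buffer = new ConcurrentLinkedQueue<>();
  private final EventStore store;
  private final int maxBufferSize;

  BufferedEventWriter(EventStore store, int maxBufferSize, long flushIntervalMs) {
    this.store = store;
    this.maxBufferSize = maxBufferSize;
    // Time-based flush so small batches do not sit in memory forever.
    Executors.newSingleThreadScheduledExecutor()
        .scheduleAtFixedRate(this::flush, flushIntervalMs, flushIntervalMs, TimeUnit.MILLISECONDS);
  }

  void onEvent(PolarisEvent event) {
    buffer.add(event); // cheap enqueue on the request path
    // Note: ConcurrentLinkedQueue.size() is O(n), which is why the PR uses a queue
    // that keeps an approximate size counter instead.
    if (buffer.size() >= maxBufferSize) {
      flush();         // size-based flush; the real code hands this off to an executor
    }
  }

  private synchronized void flush() {
    List<PolarisEvent> batch = new ArrayList<>();
    for (PolarisEvent e; (e = buffer.poll()) != null; ) {
      batch.add(e);
    }
    if (!batch.isEmpty()) {
      store.writeEvents(batch); // one batched write instead of one write per request
    }
  }
}
```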

I have some concerns around how the approach could be implemented in the NoSQL world, via #1189. There we already have a way to iterate over recent changes (aka: events). Do you have some idea around how both could live together? Would be nice to consider NoSQL in this new approach, because we already agreed that NoSQL will become part of Polaris. I think it's better to have a way that satisfies both persistence worlds.

I took a brief look at the linked PR for NoSQL - but to be fair, it is a massive PR, so please correct and/or augment my knowledge as required.

Per my understanding, Changes are what get committed to the persistence. I agree that this is very similar to Events in terms of what they represent, but not in terms of how they are used. Events (which I am only modifying in this PR) are an administrator-facing representation of customer-triggered actions, whereas Changes (being introduced in the NoSQL PR) are a way of committing actions to the persistence. Given that Events should not be solely about changes to the persistence, I don't really see how Changes can be used to power and/or replace Events.

The way I imagine things is that Events can (and should) be stored in the NoSQL persistence as well - and any calls to the future Events API should understand which type of persistence layer was used for Events storage and delegate the call to that persistence type. That is why I have introduced an append-only writeEvents method in the BasePersistence interface - NoSQL should implement that as well.
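
Roughly, the addition I have in mind looks like this (the exact signature in the PR may differ):

```java
import java.util.List;

// Illustrative signature only - the exact shape in the PR may differ.
public interface BasePersistence {
  // ... existing metastore methods ...

  /**
   * Append-only write of a batch of events. Each backend (JDBC, NoSQL, ...) decides how the
   * rows/documents are stored; callers never update or delete events through this method.
   */
  void writeEvents(List<PolarisEvent> events);
}
```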

Bottom line: I 100% agree that the Events functionality should exist in NoSQL as well as JDBC-Relational - and I'm happy to help contribute towards this once NoSQL is finalized and merged. But I'm not sure that Changes is helpful on this journey - unless we'd like to evolve that object into something that can represent all events as well.

@snazy
Member

snazy commented Jun 11, 2025

I don't think we're in a rush with this change, because the mandatory support/spec in Iceberg is not there yet. I think it's safer to wait for the Iceberg change before adding something to Polaris. WDYT?

Contributor

@adutra adutra left a comment


@adnanhemani I must say I'm not at all satisfied with the current state of FileBufferPolarisPersistenceEventListener.

In fact, I am not at all convinced that we need file-based buffers here.

Have you thought about the event bus in Quarkus / Vertx? This mechanism allows events to be processed asynchronously in a dedicated thread pool, without hurting the original request latencies.

https://quarkus.io/guides/vertx-reference#eventbus
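
For illustration (the address and types are made up for this example), the wiring could look roughly like this:

```java
import io.quarkus.vertx.ConsumeEvent;
import io.smallrye.common.annotation.Blocking;
import io.vertx.mutiny.core.eventbus.EventBus;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;

@ApplicationScoped
class EventForwarder {
  @Inject EventBus bus;

  // Called on the request path: fire-and-forget, returns immediately.
  void emit(PolarisEvent event) {
    bus.send("polaris-events", event); // "polaris-events" is an arbitrary address for this example
  }

  // Runs on a worker thread, off the request path.
  @ConsumeEvent("polaris-events")
  @Blocking
  void persist(PolarisEvent event) {
    // write the event to persistence here
  }
}
```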

Contributor Author

@adnanhemani adnanhemani left a comment


@adnanhemani I must say I'm not at all satisfied with the current state of FileBufferPolarisPersistenceEventListener.

I would be glad to hear further feedback on this if there is anything beyond the comments you've left on the review so far!

Have you thought about the event bus in Quarkus / Vertx? This mechanism allows events to be processed asynchronously in a dedicated thread pool, without hurting the original request latencies.

I did think about Vertx and similar solutions, but the largest issue is that there are no durability guarantees we can provide when using them. If Polaris crashes at any point after the event is placed on the event bus but before it is processed, the event is lost and there is no way to recover it. Please correct me if I'm wrong here.

@snazy
Member

snazy commented Jun 18, 2025

From yesterday's discussion:
We should go with the minimal viable change.
There are strong concerns about the "buffering/spilling" introduced here.

Query patterns & event payload, considering the Iceberg proposal, are still not set in stone - I have mentioned concerns about the IRC proposal as it stands. Considering especially the event payload size issues in that proposal, I think it is way too early to push this one.

@adutra
Contributor

adutra commented Jun 18, 2025

From yesterday's discussion: We should go with the minimal viable change. There are strong concerns about the "buffering/spilling" introduced here.

Query patterns & event payload, considering the Iceberg proposal, are still not set in stone - I have mentioned concerns about the IRC proposal as it stands. Considering especially the event payload size issues in that proposal, I think it is way too early to push this one.

+1 to the above.

Also, I would like to stress that the need for a commit-log-like structure on disk stems from the design choice to isolate events persistence from catalog persistence. If all writes (catalog and events) were done within the same transaction, we would be leveraging the transactional backend to do the heavy lifting of keeping the two consistent with each other, and we would get "exactly once" semantics for free. For this reason, I think we need to take a step back and rethink the persistence model.
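
To illustrate what I mean (the table, getters, and helper names below are purely hypothetical), the catalog mutation and the event row would be written in one transaction, so either both are persisted or neither is:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import javax.sql.DataSource;

// Purely hypothetical sketch: one JDBC transaction covering the catalog write and the event row.
class TransactionalEventWrite {
  void commitWithEvent(DataSource dataSource, String realmId, PolarisEvent event)
      throws SQLException {
    try (Connection conn = dataSource.getConnection()) {
      conn.setAutoCommit(false);
      try {
        applyCatalogChange(conn);   // hypothetical: the normal metastore write for this request
        try (PreparedStatement ps = conn.prepareStatement(
            "INSERT INTO events (realm_id, event_type, payload) VALUES (?, ?, ?)")) {
          ps.setString(1, realmId);
          ps.setString(2, event.getEventType()); // hypothetical getters
          ps.setString(3, event.getPayload());
          ps.executeUpdate();
        }
        conn.commit();   // both writes become visible atomically -> "exactly once" event recording
      } catch (SQLException e) {
        conn.rollback(); // neither write survives a failure
        throw e;
      }
    }
  }

  private void applyCatalogChange(Connection conn) throws SQLException {
    // placeholder for the regular catalog/metastore mutation
  }
}
```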

Contributor

@adutra adutra left a comment


This is almost ready to go 👍


abstract String getRequestId();

abstract void addToBuffer(org.apache.polaris.core.entity.PolarisEvent event);
Contributor


it should be that they are not processed in any OSS implementation

Why? Is there a compelling reason for excluding such events upfront?

Imho OSS implementations should stay very "general-purpose" and apply the same processing logic to all events.

@adutra adutra previously approved these changes Sep 2, 2025
@jbonofre jbonofre previously approved these changes Sep 2, 2025
Member

@jbonofre jbonofre left a comment


It looks good to me, thanks!

It's a great start and we can always update later.

@jbonofre jbonofre dismissed snazy’s stale review September 2, 2025 13:44

As @snazy is on vacation, I'm dismissing this review in order to unblock this PR.

if (containerRequestContext != null && containerRequestContext.hasProperty(REQUEST_ID_KEY)) {
  return (String) containerRequestContext.getProperty(REQUEST_ID_KEY);
}
return UUID.randomUUID().toString();
Contributor


Minor: having a fake request id may cause issues when users are trying to debug with it, as it doesn't exist anywhere else. I think the containerRequestContext might always have one. At least, it's more the containerRequestContext's responsibility to generate one if it's missing. For events, we could just use whatever containerRequestContext has.

Contributor Author


Changed.

var realmQueue =
    buffer.computeIfAbsent(realmId, k -> new ConcurrentLinkedQueueWithApproximateSize<>());
realmQueue.add(polarisEvent);
if (realmQueue.size() >= maxBufferSize) {
  futures.add(executor.submit(() -> checkAndFlushBufferIfNecessary(realmId, true)));
}
Contributor


It would be a nice improvement if we didn't add duplicate futures into the set.
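
For example, something like this (a hypothetical sketch, not a prescription for the exact implementation) would guarantee at most one pending flush task per realm:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: submit at most one pending flush task per realm.
class FlushScheduler {
  private final Map<String, AtomicBoolean> flushScheduled = new ConcurrentHashMap<>();

  void maybeScheduleFlush(ExecutorService executor, String realmId, Runnable flushTask) {
    AtomicBoolean scheduled =
        flushScheduled.computeIfAbsent(realmId, k -> new AtomicBoolean(false));
    if (scheduled.compareAndSet(false, true)) { // only the first caller schedules a task
      executor.submit(() -> {
        try {
          flushTask.run();
        } finally {
          scheduled.set(false); // allow the next flush to be scheduled afterwards
        }
      });
    }
  }
}
```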

Contributor Author


Added new logic for this.

@adnanhemani adnanhemani dismissed stale reviews from jbonofre and adutra via 28131a7 September 3, 2025 01:04
@adnanhemani
Contributor Author

Hi @adutra @jbonofre @flyrain - please take a look at this PR again. I've updated it to resolve the merge conflicts and to address the minor enhancements requested by Yufei!

@github-project-automation github-project-automation bot moved this from PRs In Progress to Ready to merge in Basic Kanban Board Sep 3, 2025
Contributor

@flyrain flyrain left a comment


Thanks a lot for working on it, @adnanhemani!

@flyrain flyrain merged commit c3f5001 into apache:main Sep 3, 2025
12 checks passed
@github-project-automation github-project-automation bot moved this from Ready to merge to Done in Basic Kanban Board Sep 3, 2025
@adnanhemani adnanhemani deleted the ahemani/add_polaris_events_to_persistence branch September 3, 2025 18:32