@vchag vchag commented Oct 17, 2025

These changes were created to address concerns raised in Issue #2630.

What changes were proposed in this pull request?

  1. Renaming PropertyMapEventListener for clarity:
    The class PropertyMapEventListener was misleadingly named, suggesting a generic event listening capability when its only overridden method was onAfterRefreshTable. To ensure the name accurately reflects its single responsibility, we have renamed it to AfterRefreshTableEventListener. This precise naming clearly communicates its purpose: to only process the AfterRefreshTableEvent.

  2. Streamlining Event-to-JSON Serialization:
    The previous architecture involved a redundant, two-step conversion process: events were transformed into intermediate Maps by an abstract class, and only then was the resulting map serialized into JSON (as seen in AwsCloudWatchEventListener).
    Since Jackson JSON is our chosen serialization format, we've removed this unnecessary intermediate mapping step. We introduced Jackson Mixins to provide native JSON serialization support directly on the event objects. This refactoring simplifies the serialization pipeline from:
    Event -> Map -> JSON
    to
    Event -> JSON

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

CHANGELOG.md

Contributor

@adutra adutra left a comment

Thanks @vchag, the mixin approach looks interesting. There are, however, a few items that need fixing before we can merge.

public class ObjectMapperFactory {
private ObjectMapperFactory() {}

public static ObjectMapper create() {
Contributor

You should not create an ObjectMapper like this, but rather use CDI.

  1. If possible, let's use the main ObjectMapper provided by Quarkus. You can customize it by adding your customizations in the org.apache.polaris.service.config.PolarisIcebergObjectMapperCustomizer bean.
  2. If it's not possible to reuse the main ObjectMapper (for example, because of incompatible configuration), then you should produce a new ObjectMapper bean.

I realize that the old version of AwsCloudWatchEventListener was also doing it wrong, so let's seize the opportunity to fix this.

Author

I agree with the CDI-based approach. I found option 2 to be more feasible.

Let's seize the opportunity!

.registerModule(new Jdk8Module()) // If you never serialize Optional, you can remove the
// .registerModule(new Jdk8Module()) line.
.registerModule(new JavaTimeModule())
.disable(SerializationFeature.WRITE_DATES_AS_TIMESTAMPS)
Contributor

Can you explain why we need to disable these features? I would prefer to customize the mapper as little as possible.

Author

@vchag vchag Oct 26, 2025

This was an oversight on my part. I will remove them.

* Mixins for Iceberg classes we don't control, to keep JSON concise. The @JsonValue marks
* toString() as the value to serialize.
*/
public class IcebergThirdPartyMixins {
Contributor

nit: rename to IcebergMixins and make it final.

Author

Will rename.

public class IcebergThirdPartyMixins {
private IcebergThirdPartyMixins() {}

public abstract static class NamespaceMixin {
Contributor

I wonder if this is the best thing to do when serializing namespaces and table identifiers. Granted, it's very convenient, but it would not correctly handle dots in namespace segments.

For example the following namespaces would produce the same string:

Namespace.of("one.two", "three.four", "ns");
Namespace.of("one", "two", "three", "four", "ns");

I think it's wiser to serialize these two classes as objects, not scalars.
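To make the collision concrete, here is a minimal, stdlib-only sketch. It assumes the scalar form is produced by joining the namespace levels with dots, as the comment above describes. `DotJoinDemo` and `joinLevels` are hypothetical names used only for illustration; they are not Iceberg or Polaris code.

```java
import java.util.List;

// Hypothetical stand-in for a toString()-based, dot-separated representation.
final class DotJoinDemo {
  private DotJoinDemo() {}

  // Joins the namespace levels with '.', losing the level boundaries.
  static String joinLevels(List<String> levels) {
    return String.join(".", levels);
  }

  public static void main(String[] args) {
    List<String> a = List.of("one.two", "three.four", "ns");
    List<String> b = List.of("one", "two", "three", "four", "ns");

    System.out.println(joinLevels(a)); // prints one.two.three.four.ns
    System.out.println(joinLevels(b)); // prints one.two.three.four.ns
    System.out.println(a.equals(b));   // prints false
  }
}
```

Two different level lists collapse to the same string, so a consumer of the JSON cannot recover the original namespace from the scalar form.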

Author

Great catch!

How about we align the serialization behavior with how the Iceberg REST API represents namespaces and table identifiers?

Namespace.of("one", "two", "three", "four", "ns");
would result in:
{"namespace": ["one", "two", "three", "four", "ns"]}

TableIdentifier.of(Namespace.of("one", "two", "three", "four", "ns"), "table")
would result in:
{"namespace": ["one", "two", "three", "four", "ns"], "name": "table"}
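The proposed shapes can be sketched without any JSON library, just to show that the array form keeps level boundaries intact. This is a hand-rolled illustration under stated assumptions: `StructuredIdentifierDemo`, `namespaceJson`, and `tableIdentifierJson` are hypothetical names, the quoting helper only escapes backslashes and double quotes, and the actual change would presumably be expressed through Jackson mixins or serializers instead.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical, library-free rendering of the proposed object representation.
final class StructuredIdentifierDemo {
  private StructuredIdentifierDemo() {}

  // Minimal JSON string quoting: escapes backslashes and double quotes only.
  static String quote(String s) {
    return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
  }

  // Renders the levels as a JSON array, one element per namespace level.
  static String levelsArray(List<String> levels) {
    return levels.stream()
        .map(StructuredIdentifierDemo::quote)
        .collect(Collectors.joining(", ", "[", "]"));
  }

  static String namespaceJson(List<String> levels) {
    return "{\"namespace\": " + levelsArray(levels) + "}";
  }

  static String tableIdentifierJson(List<String> levels, String name) {
    return "{\"namespace\": " + levelsArray(levels) + ", \"name\": " + quote(name) + "}";
  }

  public static void main(String[] args) {
    // Levels that contain dots stay distinguishable from the five-level namespace.
    System.out.println(namespaceJson(List.of("one.two", "three.four", "ns")));
    System.out.println(namespaceJson(List.of("one", "two", "three", "four", "ns")));
    System.out.println(tableIdentifierJson(List.of("one", "two", "three", "four", "ns"), "table"));
  }
}
```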

Contributor

Indeed I think that's preferable. It's slightly verbose for simple cases, but it's the only representation that is compatible with all the corner cases.

import java.io.IOException;
import org.apache.iceberg.catalog.TableIdentifier;

public class TableIdentifierToStringSerializer extends JsonSerializer<TableIdentifier> {
Contributor

Why do we need this serializer? We already have IcebergThirdPartyMixins.TableIdentifierMixin.

Author

This is redundant. I'll go ahead and remove it.

String realmId,
Collection<String> activatedRoles,
String eventType,
@JsonUnwrapped PolarisEvent event // flatten
Contributor

Nice 👍

*/
public abstract class PropertyMapEventListener implements PolarisEventListener {
protected abstract void transformAndSendEvent(HashMap<String, Object> properties);
public abstract class AfterRefreshTableEventListener implements PolarisEventListener {
Contributor

I'm sorry but just renaming this class doesn't make it any more usable.

In its current form, this class does not qualify as a general-purpose component worthy of being distributed with Polaris OSS.

In my opinion this class must be removed.

BTW, AwsCloudWatchEventListener is also extremely problematic, as it only handles after-table-refresh events, and nothing else. In theory, AwsCloudWatchEventListener should either equally handle all 150+ event types, or make the event types to handle configurable.

* under the License.
*/

package org.apache.polaris.service.events.jsonEventListener;
Contributor

We also need to modify this package name: jsonEventListener does not comply with Java naming conventions:

https://docs.oracle.com/javase/tutorial/java/package/namingpkgs.html

Author

How about we rename it to listener?

import org.apache.polaris.service.events.json.serde.TableIdentifierToStringSerializer;

@JsonTypeName("AfterRefreshTableEvent")
public abstract class AfterRefreshTableEventMixin {
Contributor

I don't think you need this mixin, we already have specified @JsonNaming(PropertyNamingStrategies.SnakeCaseStrategy.class) in PolarisEventBaseMixin. That should be enough.

Author

Actually, yes. We don't need this mixin.

properties.put(
"activated_roles", ((PolarisPrincipal) securityContext.getUserPrincipal()).getRoles());
// TODO: Add request ID when it is available
protected void transformAndSendEvent(IcebergRestCatalogEvents.AfterRefreshTableEvent event) {
Contributor

@adutra adutra Oct 21, 2025

As said above, this class must handle all event types, or determine the event types to handle via configuration.

For example, we could introduce the following configuration option:

polaris.event-listener.aws-cloudwatch.event-types=\
  org.apache.polaris.service.events.IcebergRestCatalogEvents.AfterRefreshTableEvent,\
  org.apache.polaris.service.events.IcebergRestCatalogEvents.AfterCommitTableEvent

I would be in favor of doing this change in this PR since this class has the same shortcomings as PropertyMapEventListener.

Author

Oh right. Let's introduce a new config for aws-cloudwatch.event-types.

@JsonValue
public abstract String toString(); // serializes "namespace" as "db.sales"
@JsonIgnore
public abstract String toString();
Contributor

Do we still need to declare this method?

@JsonValue
public abstract String toString(); // serializes "table_identifier" as "db.sales.orders"
@JsonIgnore
public abstract String toString();
Contributor

ditto


/**
* Base class for event listeners that with to generically forward all {@link PolarisEvent
* PolarisEvents} to an external sinks
Contributor

Suggested change
* PolarisEvents} to an external sinks
* PolarisEvents} to an external sink.

import org.slf4j.LoggerFactory;

/** This mapper is isolated and used exclusively for CloudWatch event serializations */
public class PolarisAWSCloudWatchObjectMapperProducer {
Contributor

Why do we need a separate mapper? This one seems fairly identical to the default one, except of course for the mixins. But you could add the mixins to the default mapper instead.

@WithName("event-types")
@WithDefault("org.apache.polaris.service.events.IcebergRestCatalogEvents.AfterRefreshTableEvent")
@Override
Set<String> eventTypes();
Contributor

I think this could be Set<Class<? extends PolarisEvent>>.
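As a rough illustration of what a typed set buys over raw strings, the sketch below resolves class names to `Class<? extends PolarisEvent>` using only the JDK. Everything here is hypothetical: `EventTypeResolverDemo`, its nested `PolarisEvent` interface, and the two event classes are stand-ins, and a real configuration mapping may perform this conversion itself rather than through a helper like `resolve`.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class EventTypeResolverDemo {
  private EventTypeResolverDemo() {}

  // Hypothetical stand-ins for the Polaris event hierarchy.
  interface PolarisEvent {}

  static final class AfterRefreshTableEvent implements PolarisEvent {}

  static final class AfterCommitTableEvent implements PolarisEvent {}

  // Resolves each configured name, rejecting unknown classes and non-event classes.
  static Set<Class<? extends PolarisEvent>> resolve(List<String> classNames) {
    Set<Class<? extends PolarisEvent>> types = new LinkedHashSet<>();
    for (String name : classNames) {
      try {
        // asSubclass throws ClassCastException if the class is not a PolarisEvent.
        types.add(Class.forName(name).asSubclass(PolarisEvent.class));
      } catch (ClassNotFoundException e) {
        throw new IllegalArgumentException("Unknown event type: " + name, e);
      }
    }
    return types;
  }

  public static void main(String[] args) {
    // getName() yields the binary name, which uses '$' for nested classes.
    List<String> configured =
        List.of(AfterRefreshTableEvent.class.getName(), AfterCommitTableEvent.class.getName());
    System.out.println(resolve(configured));
  }
}
```

With the typed form, a misspelled or non-event class name fails when the configuration is read instead of silently never matching. One detail to keep in mind: `Class.forName` expects the binary name, which separates a nested class from its enclosing class with `$` rather than `.`.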

* @return a set of event types
*/
@WithName("event-types")
@WithDefault("org.apache.polaris.service.events.IcebergRestCatalogEvents.AfterRefreshTableEvent")
Contributor

That's not a good default. I'd say the default should be an empty set, and we should treat an empty set as "process all events".

WDYT?

Also, please add the corresponding property to application.properties, that helps users:

# AWS CloudWatch event listener settings
# polaris.event-listener.type=aws-cloudwatch
# polaris.event-listener.aws-cloudwatch.log-group=polaris-cloudwatch-default-group
# polaris.event-listener.aws-cloudwatch.log-stream=polaris-cloudwatch-default-stream
# polaris.event-listener.aws-cloudwatch.region=us-east-1
# polaris.event-listener.aws-cloudwatch.synchronous-mode=false
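
The "empty set means process all events" default suggested above could look roughly like the following. This is only a sketch under assumptions: `EventFilterDemo`, `shouldHandle`, and the nested event types are hypothetical stand-ins, and the check matches exact classes only, not subclasses.

```java
import java.util.Set;

final class EventFilterDemo {
  private EventFilterDemo() {}

  // Hypothetical stand-ins for the Polaris event hierarchy.
  interface PolarisEvent {}

  static final class AfterRefreshTableEvent implements PolarisEvent {}

  static final class AfterCommitTableEvent implements PolarisEvent {}

  // An empty configured set is treated as "no filter": every event is handled.
  static boolean shouldHandle(Set<Class<? extends PolarisEvent>> configured, PolarisEvent event) {
    return configured.isEmpty() || configured.contains(event.getClass());
  }

  public static void main(String[] args) {
    Set<Class<? extends PolarisEvent>> all = Set.of();
    Set<Class<? extends PolarisEvent>> refreshOnly = Set.of(AfterRefreshTableEvent.class);

    System.out.println(shouldHandle(all, new AfterCommitTableEvent()));          // prints true
    System.out.println(shouldHandle(refreshOnly, new AfterRefreshTableEvent())); // prints true
    System.out.println(shouldHandle(refreshOnly, new AfterCommitTableEvent()));  // prints false
  }
}
```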


@Inject
public PolarisAWSCloudWatchObjectMapperProducer(
@ConfigProperty(name = "polaris.aws.cloudwatch.max-body-size", defaultValue = "16M")
Contributor

Where is this config property coming from? It doesn't start with the prefix used for other CloudWatch properties, which is polaris.event-listener.aws-cloudwatch..

| `polaris.tasks.max-concurrent-tasks` | `100` | Define the max number of concurrent tasks. |
| `polaris.tasks.max-queued-tasks` | `1000` | Define the max number of tasks in queue. |
| `polaris.config.rollback.compaction.on-conflicts.enabled` | `false` | When set to true Polaris will apply the deconfliction by rollbacking those REPLACE operations snapshots which have the property of `polaris.internal.rollback.compaction.on-conflict` in their snapshot summary set to `rollback`, to resolve conflicts at the server end. |
| `polaris.event-listener.type` | `no-op` | Define the Polaris event listener type. Supported values are `no-op`, `aws-cloudwatch`. |
Contributor

BTW persistence-in-memory-buffer is also supported.

| `polaris.event-listener.aws-cloudwatch.log-stream` | `polaris-cloudwatch-default-stream`| Define the AWS CloudWatch log stream name for the event listener. Ensure that Polaris' IAM credentials have the following actions: "PutLogEvents", "DescribeLogStreams", and "DescribeLogGroups" on the specified log stream/group. If the specified log stream/group does not exist, then "CreateLogStream" and "CreateLogGroup" will also be required. |
| `polaris.event-listener.aws-cloudwatch.region` | `us-east-1` | Define the AWS region for the CloudWatch event listener. |
| `polaris.event-listener.aws-cloudwatch.synchronous-mode` | `false` | Define whether log events are sent to CloudWatch synchronously. When set to true, events are sent synchronously which may impact performance but ensures immediate delivery. When false (default), events are sent asynchronously for better performance. |

Contributor

Nit: Why remove this line?

@vchag vchag closed this Oct 28, 2025
@github-project-automation github-project-automation bot moved this from PRs In Progress to Done in Basic Kanban Board Oct 28, 2025
Author

vchag commented Oct 28, 2025

Closing in favor of a new PR from a feature branch.
My fork’s main fell behind upstream and a fork sync/force-push unintentionally rewrote the PR head, which auto-closed this PR.
I'll recreate the changes on a proper feature branch, rebased on the latest upstream/main, and address the prior feedback.

Thanks for the reviews so far. I'm carrying over all context to the new PR.

Contributor

adutra commented Oct 30, 2025

@vchag what's the status here? Are you still able to work on this issue?

Author

vchag commented Nov 3, 2025

@adutra Yes, I'm on it. Will release it within the next 4 - 8 hrs.

Author

vchag commented Nov 3, 2025

Here's the new PR #2962.
