@mansehajsingh
Contributor

@mansehajsingh mansehajsingh commented Apr 18, 2025

This PR uses the Apache Iceberg OAuth2 utilities to enable a wider array of authentication flows for the tool. Many of the options now use the same names as the Iceberg OAuth2 properties. Here are a few examples:

  1. For the client_credentials flow, the only change is that the CLI now uses the credential property, formatted as <client_id>:<client_secret>, instead of separate properties. The resulting tokens will now be refreshed periodically.
java -jar cli/build/libs/polaris-synchronizer-cli.jar \
create-omnipotent-principal \
--polaris-api-connection-properties base-url=http://localhost:8181/ \
--polaris-api-connection-properties oauth2-server-uri=http://localhost:8181/api/catalog/v1/oauth/tokens \
--polaris-api-connection-properties credential=<client_id>:<client_secret> \
--polaris-api-connection-properties scope=PRINCIPAL_ROLE:ALL
  2. For regular bearer token authentication, instead of bearer-token the tool will now use token. This initializes a session that does not refresh:
java -jar cli/build/libs/polaris-synchronizer-cli.jar \
create-omnipotent-principal \
--polaris-api-connection-properties base-url=http://localhost:8181/ \
--polaris-api-connection-properties token=<bearer_token> \
--polaris-api-connection-properties scope=PRINCIPAL_ROLE:ALL

  3. Polaris supports exchanging an access token for another access token. For this flow, you can now provide a <subject_token_type>=<subject_token> property pair to use for token exchange. Natively, Polaris only supports urn:ietf:params:oauth:token-type:access_token as the subject token type, but this PR supports all the token types in case external OAuth is used. Polaris also requires the existing token in the Authorization header when calling the token exchange endpoint, so you must also specify the token property to provide a bearer token for the token exchange request. The bearer token type defaults to urn:ietf:params:oauth:token-type:access_token.

java -jar cli/build/libs/polaris-synchronizer-cli.jar \
create-omnipotent-principal \
--polaris-api-connection-properties base-url=http://localhost:8181/ \
--polaris-api-connection-properties oauth2-server-uri=http://localhost:8181/api/catalog/v1/oauth/tokens \
--polaris-api-connection-properties token=<bearer_token> \
--polaris-api-connection-properties urn:ietf:params:oauth:token-type:access_token=<bearer_token> \
--polaris-api-connection-properties scope=PRINCIPAL_ROLE:ALL
  4. Snowflake Open Catalog keypair authentication is natively supported with the client_credentials flow. As we can see in the documentation, we just need to provide the generated JWT to the client_secret field, like so (notice the empty client id):
java -jar cli/build/libs/polaris-synchronizer-cli.jar \
create-omnipotent-principal \
--polaris-api-connection-properties base-url=http://localhost:8181/ \
--polaris-api-connection-properties oauth2-server-uri=http://localhost:8181/api/catalog/v1/oauth/tokens \
--polaris-api-connection-properties credential=:<JWT_BEARER> \
--polaris-api-connection-properties scope=PRINCIPAL_ROLE:ALL
  5. External OAuth. Here is an example of how Snowflake Open Catalog supports external OAuth: https://other-docs.snowflake.com/en/LIMITEDACCESS/opencatalog/external-oauth. These additions support it as well. We just need to specify the oauth2-server-uri of the external OAuth server, and we can pass optional OAuth parameters such as audience through the CLI too:
java -jar cli/build/libs/polaris-synchronizer-cli.jar \
create-omnipotent-principal \
--polaris-api-connection-properties base-url=https://<your_org_name>-<your_open_catalog_account_name>.snowflakecomputing.com/polaris \
--polaris-api-connection-properties oauth2-server-uri=https://<Auth0_domain>/oauth/token \
--polaris-api-connection-properties credential=<client_id>:<client_secret> \
--polaris-api-connection-properties scope=PRINCIPAL_ROLE:ALL \
--polaris-api-connection-properties audience=https://<your_org_name>-<your_open_catalog_account_name>.snowflakecomputing.com
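
The credential format used in the examples above can be illustrated with a small sketch. It assumes the value is split at the first colon, so that an empty client id (as in the keypair example) yields an empty string. The class and method names are hypothetical, and Iceberg's own parsing may differ in its details.

```java
import java.util.Arrays;

class CredentialDemo {
    /**
     * Splits a "<client_id>:<client_secret>" value at the first colon.
     * Hypothetical helper for illustration only; not the Iceberg implementation.
     * Returns a two-element array: {clientId, clientSecret}.
     */
    static String[] parse(String credential) {
        int separator = credential.indexOf(':');
        if (separator < 0) {
            // This sketch rejects values without a colon rather than guessing.
            throw new IllegalArgumentException("expected <client_id>:<client_secret>");
        }
        String clientId = credential.substring(0, separator);
        String clientSecret = credential.substring(separator + 1);
        return new String[] {clientId, clientSecret};
    }

    public static void main(String[] args) {
        // Regular client credentials.
        System.out.println(Arrays.toString(parse("my-id:my-secret")));
        // Empty client id, as in the keypair example where the JWT is the secret.
        System.out.println(Arrays.toString(parse(":header.payload.signature")));
    }
}
```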

OAuth2Properties.ACCESS_TOKEN_TYPE,
OAuth2Properties.JWT_TOKEN_TYPE,
OAuth2Properties.SAML2_TOKEN_TYPE,
OAuth2Properties.SAML1_TOKEN_TYPE);

Contributor

This line is not working as intended. You need to use a LinkedHashSet if you want a Set that preserves insertion order. In this case, a regular List would also work just fine.

Here is an example that demonstrates that Set.of(...) does not preserve insertion order.

jshell> Set<Integer> s = Set.of(1, 5, 2, 4, 3)
s ==> [5, 4, 3, 2, 1]
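
For comparison, here is a minimal sketch of the two alternatives mentioned above. The string constants are stand-ins for the OAuth2Properties token type constants, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class TokenTypeOrderDemo {
    // Stand-ins for the OAuth2Properties token type constants.
    static final String ACCESS = "urn:ietf:params:oauth:token-type:access_token";
    static final String JWT = "urn:ietf:params:oauth:token-type:jwt";
    static final String SAML2 = "urn:ietf:params:oauth:token-type:saml2";
    static final String SAML1 = "urn:ietf:params:oauth:token-type:saml1";

    // List.of keeps the elements in the order they were given.
    static List<String> asList() {
        return List.of(ACCESS, JWT, SAML2, SAML1);
    }

    // LinkedHashSet iterates in insertion order and still rejects duplicates.
    static Set<String> asLinkedHashSet() {
        Set<String> types = new LinkedHashSet<>();
        types.add(ACCESS);
        types.add(JWT);
        types.add(SAML2);
        types.add(SAML1);
        return types;
    }

    public static void main(String[] args) {
        System.out.println(asList());
        // Both collections iterate in the same, declared order.
        System.out.println(new ArrayList<>(asLinkedHashSet()).equals(asList()));
    }
}
```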

Contributor Author

Changed it to a List! Good catch

if (properties.containsKey(OAuth2Properties.CREDENTIAL)) {
return OAuth2Util.AuthSession.fromCredential(
restClient,
ThreadPools.newScheduledPool(UUID.randomUUID() + "-token-refresh", 1),

Contributor

nit: It could be useful to add a comment here (and below) for code maintainability. It looks like the thread pool will never be shut down, and therefore that the application can never terminate. But the Iceberg documentation states that threads created by ThreadPools.newScheduledPool(...) are daemon threads, so they do not block JVM exit.
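
To illustrate the daemon-thread point, here is a sketch that uses only the JDK rather than Iceberg's ThreadPools. The factory method and class name are hypothetical. It builds a single-threaded scheduled pool whose worker is a daemon thread and checks that flag from inside a scheduled task.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;

class DaemonPoolDemo {
    // Hypothetical JDK-only equivalent of a scheduled pool with daemon workers.
    static ScheduledExecutorService newDaemonScheduledPool(String threadName) {
        ThreadFactory factory = runnable -> {
            Thread thread = new Thread(runnable, threadName);
            // Daemon threads do not keep the JVM alive once main() returns.
            thread.setDaemon(true);
            return thread;
        };
        return Executors.newScheduledThreadPool(1, factory);
    }

    // Runs a task on the pool and reports whether its worker is a daemon thread.
    static boolean workerIsDaemon() {
        ScheduledExecutorService pool = newDaemonScheduledPool("token-refresh");
        Callable<Boolean> probe = () -> Thread.currentThread().isDaemon();
        try {
            return pool.schedule(probe, 0, TimeUnit.MILLISECONDS).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            // Shutting down explicitly is still good practice when the owner is closed.
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("worker is daemon: " + workerIsDaemon());
    }
}
```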

Contributor Author

I've added a comment clarifying this.

@adutra

adutra commented Apr 21, 2025

Hi @mansehajsingh I have a few questions about the features being introduced here:

Polaris supports exchanging an access token for another access token.

In Polaris, the token exchange grant type is meant primarily for token refreshes. What is the use case here? Where is the subject token expected to come from?

so you need to specify the token property as well to provide an actor token to the token exchange request.

What is the use case for the actor token, and where is it supposed to come from? Asking because the actor token is not honored by OSS Polaris; only vendor-specific products make use of actor tokens, e.g. Tabular. Which makes me realize that we might be "sneaking in" some vendor-specific features here. I don't mind doing so, but I think they should be more clearly flagged as Snowflake-specific features.

External OAuth. Here is an example of how Snowflake Open Catalog does external OAuth support: https://other-docs.snowflake.com/en/LIMITEDACCESS/opencatalog/external-oauth.

That's interesting, thanks for the link 👍 I didn't know that Open Catalog already had support for external authentication, since OSS Polaris doesn't – but hopefully not for a long time: apache/polaris#1397. We should probably make sure that whatever support is added here for external auth also works with OSS Polaris with an external IDP like Keycloak or Auth0.

@mansehajsingh
Contributor Author

Hi @adutra!

In Polaris, the token exchange grant type is meant primarily for token refreshes. What is the use case here? Where is the subject token expected to come from?

To be honest, when writing this part of the implementation I was trying to keep a level of parity with the way authentication was implemented in RESTSessionCatalog#newSession in Iceberg, but I see you've merged some changes recently to revamp that behaviour. If we foresee no use case, even when external OAuth does land in OSS Polaris, for supporting flexible token exchange outside of token refresh with Polaris, then I have no qualms about removing this part from AuthenticationSessionWrapper and just supporting client_credentials and a preset bearer token.

What is the use case for the actor token, and where is it supposed to come from? Asking because the actor token is not honored by OSS Polaris; only vendor-specific products make use of actor tokens, e.g. Tabular. Which makes me realize that we might be "sneaking in" some vendor-specific features here. I don't mind doing so, but I think they should be more clearly flagged as Snowflake-specific features.

You're right, I had mislabelled this. I've updated the PR description and the comments. I was confused by the implementation of Iceberg's OAuth2Util#fromTokenExchange, which passes the parent AuthSession's token as an actorToken to OAuth2Util#exchangeToken. Taking another look, the real reason the token is needed is that Polaris requires token exchange to be performed with the existing token in the Authorization header, so the parent session has to carry it; the headers are then inherited from the parent session in OAuth2Util#exchangeToken. If we end up removing token exchange, this may not even be relevant anymore. There are no vendor-specific implementations I was trying to support here.

That's interesting, thanks for the link 👍 I didn't know that Open Catalog already had support for external authentication, since OSS Polaris doesn't – but hopefully not for a long time: apache/polaris#1397. We should probably make sure that whatever support is added here for external auth also works with OSS Polaris with an external IDP like Keycloak or Auth0.

Agreed. Are we okay to merge this PR once it has gone through review, and open an issue to make sure this external OAuth support is compatible once external OAuth is finalized in Polaris?

@mansehajsingh
Contributor Author

cc: @collado-mike

@adutra

adutra commented Apr 22, 2025

If we foresee no use case, even when external OAuth does land in OSS Polaris, for supporting flexible token exchange outside of token refresh with Polaris, then I have no qualms about removing this part from AuthenticationSessionWrapper and just supporting client_credentials and a preset bearer token.

I cannot see a use case where token exchange is going to be necessary in the context of a catalog synchronization. I would suggest refraining from including support for it initially, and waiting until someone actually comes up with a valid use case.

I had been confused looking at the implementation of Iceberg's OAuth2Util#fromTokenExchange

You are not the only one 😄 Here is some context:

The only situation in Iceberg where a token exchange happens outside of a token refresh is when the server "vends" an OAuth token to the client as part of a LoadTableResponse and the client creates a "table session" for it:

https://github.com/apache/iceberg/blob/9587a2e3d5ff658ed1427d17ea2d351029012e7e/core/src/main/java/org/apache/iceberg/rest/auth/OAuth2Manager.java#L208-L211

In that scenario, immediately after the vended token is received, a token exchange happens and the vended token becomes the subject token and the client's current OAuth2 token becomes the actor token.

I do not know of any catalog server that uses this feature. Polaris OSS and Nessie do not support it. I suspect Tabular was making use of it. And I would argue that vending OAuth tokens is not a good practice anyways.

All of this to say: I would again suggest holding off on implementing support for this in the context of this catalog synchronization tool.

are we okay to merge this PR once it has gone through review, and open an issue to make sure this external OAuth support is compatible once external OAuth is finalized in Polaris?

Yes, that's fine 👍

@mansehajsingh
Contributor Author

@adutra I've gone ahead and removed the token exchange flow. Thanks for the context!

Comment on lines 38 to 46
.credential(properties.get(OAuth2Properties.CREDENTIAL))
.scope(properties.get(OAuth2Properties.SCOPE))
.oauth2ServerUri(properties.get(OAuth2Properties.OAUTH2_SERVER_URI))
.token(properties.get(OAuth2Properties.TOKEN))
.tokenType(OAuth2Properties.ACCESS_TOKEN_TYPE)
.optionalOAuthParams(OAuth2Util.buildOptionalParam(properties))
.build()

I would suggest creating the parent session without token and credential, since you are going to create a child session with these right after.

Suggested change
- .credential(properties.get(OAuth2Properties.CREDENTIAL))
- .scope(properties.get(OAuth2Properties.SCOPE))
- .oauth2ServerUri(properties.get(OAuth2Properties.OAUTH2_SERVER_URI))
- .token(properties.get(OAuth2Properties.TOKEN))
- .tokenType(OAuth2Properties.ACCESS_TOKEN_TYPE)
- .optionalOAuthParams(OAuth2Util.buildOptionalParam(properties))
- .build()
+ .scope(properties.get(OAuth2Properties.SCOPE))
+ .oauth2ServerUri(properties.get(OAuth2Properties.OAUTH2_SERVER_URI))
+ .optionalOAuthParams(OAuth2Util.buildOptionalParam(properties))
+ .build()

Contributor Author

I have taken them out 👍

*/
private OAuth2Util.AuthSession newAuthSession(Map<String, String> properties) {

RESTClient restClient = HTTPClient.builder(Map.of())

This client holds resources and must be closed when the application closes. Since we already have a problem with the thread pools created below, as @pingtimeout pointed out, I think that it would be worth spending some time to make this class implement AutoCloseable, then properly implement the close() method and close all the resources.

But then you'd need to make sure that the AuthenticationSessionWrapper.close() method is called. I see that this class is used in two places: PolarisCatalog and PolarisApiService. For PolarisCatalog it's easy because it already has a close() method. However PolarisApiService doesn't, so you might need to investigate how to properly close its AuthenticationSessionWrapper.

Contributor Author

I've gone ahead and made some of the core classes implement Closeable. I've tried to add explicit and appropriate calls to close wherever possible, and the CLI now registers runtime hooks so that resources are closed appropriately in the event of unexpected failure. I've tried to scope the explicit closing to the sections that call the methods that create the closeable resources. Now:

  • PolarisSynchronizer closes any IcebergCatalogService it creates; PolarisIcebergCatalogService in turn closes the underlying PolarisCatalog, which closes its own AuthenticationSessionWrapper.
  • PolarisApiService is also closed explicitly by the CLI on program termination, closing its AuthenticationSessionWrapper.

@mansehajsingh mansehajsingh force-pushed the client-credentials-token-refresh branch from 1b7b5bf to a53e33b Compare April 23, 2025 00:26
@mansehajsingh mansehajsingh force-pushed the client-credentials-token-refresh branch from a53e33b to 69a8b56 Compare April 23, 2025 00:29
@travis-bowen

The changes here seem good to me, though I'd defer to those who are already commenting and have greater context on the auth flows. Thanks for these changes, as they make this tool more powerful!

@mansehajsingh mansehajsingh requested a review from adutra April 23, 2025 05:56
Comment on lines 22 to 29
private final OAuth2Util.AuthSession authSession;

public AuthenticationSessionWrapper(Map<String, String> properties) {
this.restClient = HTTPClient.builder(Map.of())
.uri(properties.get(OAuth2Properties.OAUTH2_SERVER_URI))
.build();
this.authSession = this.newAuthSession(this.restClient, properties);
}

While at it let's tackle the executor issue as well:

Suggested change
- private final OAuth2Util.AuthSession authSession;
- public AuthenticationSessionWrapper(Map<String, String> properties) {
- this.restClient = HTTPClient.builder(Map.of())
- .uri(properties.get(OAuth2Properties.OAUTH2_SERVER_URI))
- .build();
- this.authSession = this.newAuthSession(this.restClient, properties);
- }
+ private final OAuth2Util.AuthSession authSession;
+ private final ScheduledExecutorService executor;
+ public AuthenticationSessionWrapper(Map<String, String> properties) {
+ this.restClient = HTTPClient.builder(Map.of())
+ .uri(properties.get(OAuth2Properties.OAUTH2_SERVER_URI))
+ .build();
+ this.authSession = this.newAuthSession(this.restClient, properties);
+ executor = ThreadPools.newScheduledPool(UUID.randomUUID() + "-token-refresh", 1);
+ }

Comment on lines 83 to 85
if (this.restClient != null) {
this.restClient.close();
}

Suggested change
- if (this.restClient != null) {
- this.restClient.close();
- }
+ try (restClient; executor){}
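
The empty try-with-resources form suggested above relies on Java 9+ accepting effectively final variables as resources; a resource that is null is skipped, so no explicit null check is needed. Note also that ExecutorService only implements AutoCloseable from Java 19 onwards. The sketch below uses a hypothetical Tracked resource in place of the real client and executor to show the order in which both are closed.

```java
import java.util.ArrayList;
import java.util.List;

class CloseOrderDemo {
    // Hypothetical resource that records its name when it is closed.
    static final class Tracked implements AutoCloseable {
        private final String name;
        private final List<String> log;

        Tracked(String name, List<String> log) {
            this.name = name;
            this.log = log;
        }

        @Override
        public void close() {
            log.add(name);
        }
    }

    // Closes both resources with an empty try-with-resources block.
    static List<String> closeBoth() {
        List<String> log = new ArrayList<>();
        Tracked restClient = new Tracked("restClient", log);
        Tracked executor = new Tracked("executor", log);
        try (restClient; executor) {
            // Intentionally empty: the block exists only to close the resources.
        }
        return log;
    }

    public static void main(String[] args) {
        // Resources are closed in reverse order of their listing.
        System.out.println(closeBoth()); // prints [executor, restClient]
    }
}
```

If the first close() throws, the second resource is still closed and the later exception is attached as a suppressed exception, which is harder to get right with hand-written if/close chains.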

Comment on lines 164 to 172
if (httpClient != null) {
httpClient.close();
}

if (this.authenticationSession != null) {
this.authenticationSession.close();
}

super.close();

Suggested change
- if (httpClient != null) {
- httpClient.close();
- }
- if (this.authenticationSession != null) {
- this.authenticationSession.close();
- }
- super.close();
+ AuthenticationSessionWrapper session = authenticationSession;
+ HttpClient httpClient = this.httpClient;
+ try (session; httpClient) {
+ super.close();
+ } finally {
+ this.authenticationSession = null;
+ this.httpClient = null;
+ this.objectMapper = null;
+ this.resourcePaths = null;
+ }

Comment on lines 166 to 168
if (this.catalog != null) {
this.catalog.close();
}

Suggested change
- if (this.catalog != null) {
- this.catalog.close();
- }
+ this.catalog.close();

"\n\t- oauth2-server-uri: the uri of the OAuth2 server to authenticate to. (eg. http://localhost:8181/api/catalog/v1/oauth/tokens)" +
"\n\t- credential: the client credentials to use to authenticate against the Polaris instance (eg. <client_id>:<client_secret>)" +
"\n\t- scope: the scope to authenticate with for the service_admin (eg. PRINCIPAL_ROLE:ALL)" +
"\n\t- <token_type>=<token>: for token exchange authentication, the token type (eg. urn:ietf:params:oauth:token-type:access_token) with a provided token";

Does this still hold true?

* the program reaches the call to {@link Closeable#close()}.
* @param closeable the resource to close
*/
public static void closeResourceOnTermination(final Closeable closeable) {

I don't think this is needed, it's possible to close all resources using try-with-resources blocks.

polarisApiConnectionProperties.putIfAbsent(PolarisApiService.ICEBERG_WRITE_ACCESS_PROPERTY,
String.valueOf(withWriteAccess));

PolarisService polaris = PolarisServiceFactory.createPolarisService(

Use try-with-resources instead, it's much more reliable:

    try (PolarisService polaris = PolarisServiceFactory.createPolarisService(
        PolarisServiceFactory.ServiceType.API, polarisApiConnectionProperties)) {
        ...
    }

PolarisServiceFactory.createPolarisService(PolarisServiceFactory.ServiceType.API, sourceProperties);
PolarisService target =
PolarisServiceFactory.createPolarisService(PolarisServiceFactory.ServiceType.API, targetProperties);


Use try-with-resources again:

    try (
        PolarisService source =
            PolarisServiceFactory.createPolarisService(PolarisServiceFactory.ServiceType.API, sourceProperties);
        PolarisService target = PolarisServiceFactory.createPolarisService(
            PolarisServiceFactory.ServiceType.API, targetProperties)) {
        ...
    }

}
}
}));
if (etagService instanceof Closeable closeableETagService) {

Suggestion: make ETagManager extend AutoCloseable:

public interface ETagManager extends AutoCloseable {
   ...
  @Override
  default void close() throws Exception {
  }
}

Then here:

    try (
        PolarisService source =
            PolarisServiceFactory.createPolarisService(PolarisServiceFactory.ServiceType.API, sourceProperties);
        PolarisService target = PolarisServiceFactory.createPolarisService(
            PolarisServiceFactory.ServiceType.API, targetProperties);
        ETagManager etagService = ETagManagerFactory.createETagManager(etagManagerType,
          etagManagerProperties)) {

      PolarisSynchronizer synchronizer =
          new PolarisSynchronizer(
              consoleLog,
              haltOnFailure,
              accessControlAwarePlanner,
              source,
              target,
              etagService);
      synchronizer.syncPrincipalRoles();
      if (shouldSyncPrincipals) {
        consoleLog.warn(
            "Principal migration will reset credentials on the target Polaris instance. " +
            "Principal migration will log the new target Principal credentials to stdout.");
        synchronizer.syncPrincipals();
      }
      synchronizer.syncCatalogs();

    }
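
The default close() idea can be sketched in isolation. The interface below is a hypothetical stand-in for ETagManager (its getETag method and both implementations are assumptions for illustration); it shows that implementations without resources need no changes, while ones that hold resources override close() and are closed by the same try-with-resources block.

```java
import java.util.HashMap;
import java.util.Map;

class ETagManagerCloseDemo {
    // Hypothetical stand-in for the ETagManager interface.
    interface ETagManager extends AutoCloseable {
        String getETag(String key);

        // No-op by default, so implementations without resources stay unchanged.
        @Override
        default void close() throws Exception {}
    }

    // Holds nothing that needs closing; inherits the no-op close().
    static final class InMemoryManager implements ETagManager {
        private final Map<String, String> etags = new HashMap<>();

        @Override
        public String getETag(String key) {
            return etags.get(key);
        }
    }

    // Pretends to hold a resource and records that it was released.
    static final class ResourceBackedManager implements ETagManager {
        boolean closed = false;

        @Override
        public String getETag(String key) {
            return null;
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    // Uses the manager through the interface and reports whether it was closed.
    static boolean closesResourceBackedManager() {
        ResourceBackedManager backing = new ResourceBackedManager();
        try (ETagManager manager = backing) {
            manager.getETag("some-table");
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return backing.closed;
    }

    public static void main(String[] args) throws Exception {
        try (ETagManager manager = new InMemoryManager()) {
            System.out.println(manager.getETag("missing"));
        }
        System.out.println(closesResourceBackedManager());
    }
}
```

This avoids the instanceof Closeable check at the call site, because every manager can be declared as a resource regardless of whether it owns anything.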

@mansehajsingh mansehajsingh requested a review from adutra April 23, 2025 18:32
@adutra adutra left a comment

@mansehajsingh this is almost ready to go, please fix the issue with catalogs improperly closed in PolarisSynchronizer.

syncNamespaces(
catalog.getName(), Namespace.empty(), sourceIcebergCatalogService, targetIcebergCatalogService);

try {

@adutra adutra Apr 25, 2025

This looks suspicious; the catalogs won't be properly closed if an error is thrown above.

Suggestion:

      try(IcebergCatalogService sourceIcebergCatalogService = source.initializeIcebergCatalogService(catalog.getName())) {
        
        clientLogger.info(
            "Initialized Iceberg REST catalog for Polaris catalog {} on source.",
            catalog.getName());
        
        try(IcebergCatalogService targetIcebergCatalogService = target.initializeIcebergCatalogService(catalog.getName())) {

          clientLogger.info(
              "Initialized Iceberg REST catalog for Polaris catalog {} on target.",
              catalog.getName());

          syncNamespaces(
              catalog.getName(), Namespace.empty(), sourceIcebergCatalogService, targetIcebergCatalogService);

        }

      } catch (Exception e) {
        if (haltOnFailure) throw new RuntimeException(e);
        clientLogger.error(
            "Failed to synchronize Iceberg REST catalog for Polaris catalog {}.",
            catalog.getName(),
            e);
        continue;
      }
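
The behaviour this suggestion relies on can be checked with a small sketch. The Service class and open method are hypothetical stand-ins for the catalog services; the point is that when opening the target fails, the already opened source is closed before the catch block runs.

```java
import java.util.ArrayList;
import java.util.List;

class NestedCloseDemo {
    // Hypothetical stand-in for a closeable catalog service.
    static final class Service implements AutoCloseable {
        private final String name;
        private final List<String> log;

        Service(String name, List<String> log) {
            this.name = name;
            this.log = log;
        }

        @Override
        public void close() {
            log.add("close " + name);
        }
    }

    // Opens a service, optionally failing to simulate an initialization error.
    static Service open(String name, boolean fail, List<String> log) {
        if (fail) {
            throw new IllegalStateException("cannot open " + name);
        }
        log.add("open " + name);
        return new Service(name, log);
    }

    // Mirrors the nested try-with-resources shape of the suggestion above.
    static List<String> sync(boolean failTarget) {
        List<String> log = new ArrayList<>();
        try (Service source = open("source", false, log)) {
            try (Service target = open("target", failTarget, log)) {
                log.add("sync");
            }
        } catch (Exception e) {
            // Runs only after every opened resource has been closed.
            log.add("error: " + e.getMessage());
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(sync(false));
        // prints [open source, open target, sync, close target, close source]
        System.out.println(sync(true));
        // prints [open source, close source, error: cannot open target]
    }
}
```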

@mansehajsingh mansehajsingh requested a review from adutra April 25, 2025 16:55
@mansehajsingh
Contributor Author

@adutra Thanks for taking a look! The catalogs should be closed properly now!

@adutra adutra left a comment

LGTM thanks @mansehajsingh !

Contributor

@eric-maynard eric-maynard left a comment

LGTM, thanks for the review @pingtimeout & @dimas-b!

@eric-maynard eric-maynard merged commit ab310d5 into apache:main Apr 25, 2025