Skip to content

Conversation

@XJDKC
Copy link
Member

@XJDKC XJDKC commented Oct 4, 2025

Milestones

This is Part 4 of the [Splitting] Initial SigV4 Auth Support for Catalog Federation. Upcoming parts will build on this system:

Introduction

This PR introduces a flexible credential management system for Polaris. Building on Part 3's service identity management, this system combines Polaris service identities with user-provided authentication parameters to generate credentials for remote catalog access.
The core of this PR is the new ConnectionCredentialVendor interface, which:

  • Generates connection credentials by combining service identity with user auth parameters
  • Supports different authentication types (AWS SIGV4, AZURE Entra, GCP IAM) through CDI, currently only supports SigV4.
  • Provides on-demand credential generation
  • Enables easy extension for new authentication types

In the long term, we should move the storage credential management logic out of PolarisMetastoreManager, PolarisMetastoreManager should only provide persistence interfaces.

Key Components

  1. ConnectionCredentialVendor
    The central interface that defines how connection credentials are generated. It combines Polaris service identities with user-provided authentication parameters to produce the final credentials needed for remote catalog access. This interface serves as the foundation for all authentication type-specific implementations.

  2. DefaultPolarisCredentialManager
    The primary implementation that orchestrates credential generation. It uses CDI to automatically select the appropriate vendor based on authentication type, delegates the credential generation process, and provides consistent error handling. This class integrates with the ServiceIdentityProvider from Part 3 to resolve Polaris service identities.

  3. SigV4ConnectionCredentialVendor
    A reference implementation for AWS SigV4 authentication. It handles the complex process of resolving AWS IAM credentials from service identities, performing AWS STS AssumeRole operations, and generating temporary credentials for catalog access. This implementation demonstrates best practices for vendor development.

  4. @SupportsAuthType
    A CDI qualifier annotation that marks vendor implementations with their supported authentication types. This enables automatic vendor selection at runtime while maintaining type safety. The qualifier is essential for the CDI-based pluggable architecture.

Future Improvements

  • Add credential caching support: provide a unified cache for storage creds and connection creds
  • Migrate storage creds to this new framework

@XJDKC XJDKC changed the title [DRAFT] SigV4 Auth Support for Catalog Federation - Part 4: Connection Credential Manager SigV4 Auth Support for Catalog Federation - Part 4: Connection Credential Manager Oct 6, 2025
@XJDKC XJDKC marked this pull request as ready for review October 6, 2025 15:28
@HonahX HonahX self-requested a review October 6, 2025 15:43

@Override
public @Nonnull ConnectionCredentials getConnectionCredentials(
@Nonnull ConnectionConfigInfoDpo connectionConfig) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the future, we may embed creds cache in the credential manager, if we can't find a valid entry in the cache, we will select the correct vendor and delegate the credential retrieval to the credential vendor (for both storage and connection).

In this way, we can deprecate the old flow (using metastore to get the subscoped creds).

Since we need to generate cache key and may need realm id and the catalog id, wandering if we should pass in the whole catalog entity here. But we can refactor this later once we add cache support for credential manager.

cc: @dimas-b @dennishuo @HonahX

Copy link
Contributor

@dimas-b dimas-b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall 👍 Thanks @XJDKC ! Some comments below... mostly about CDI.

Copy link
Contributor

@HonahX HonahX left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great!

AssumeRoleRequest.Builder requestBuilder =
AssumeRoleRequest.builder()
.roleArn(sigv4Params.getRoleArn())
.roleSessionName(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though this is probably fine for most use cases since the assumeRole session credential doesn't ever (normally) leave the Catalog server itself (in contrast to StorageIntegration ones where we need to vend it out), we should probably still find a way to provide a hook for adding a sessionPolicy here.

Maybe at least a TODO. A "simple" approach would be to allow configuring a whole service-level scoping policy string that would rely on wildcards to share across catalogs/realms.

More advanced would be to derive the policyString from a combination of the Type in the connectionConfig, and the service-level configuration.

If we ever do refactor to merge this for Catalog and Storage we at least would need to abstract out the sessionPolicy generation

Copy link
Member Author

@XJDKC XJDKC Oct 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, since we don't vend / return this credentials to the client, so it should be fine.

For the simple approach, we need to use different service-level scoping policy for different services (based on the sigv4 signing name). e.g., if the service is

  • glue we need to get glue:ListTables, ``glue:ListDatabases`, e.t.c,
    • We may also need some privileges on Lake Formation so that we can get the vended credentials, if we want to use the vended credentials to load table metadata rather than using the storage config.
  • execute-api (api gateway), we need to use execute-api:Invoke
  • s3tables: we need another set of privileges.

I will add a TODO for now since I need to spend some time to figure out the correct policy string for each service. We may need a refactor on the policy generation.

String.class, AwsProperties.REST_SECRET_ACCESS_KEY, "the aws access key secret", true),
AWS_SESSION_TOKEN(
String.class, AwsProperties.REST_SESSION_TOKEN, "the aws scoped access token", true),
EXPIRATION_TIME(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we actually need this to be an enum value? I do not see any real usage of it 🤔

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just follow the StorageAccessProperty. If we want to remove CatalogAccessProperty, we should also deprecate StorageAccessProperty later.

I think initially, the reason we add the StorageAccessProperty is because we want all the storage credentials related properties must be explicitly defined in the num class, otherwise, AccessConfig is just an open-end property map.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StorageAccessProperty.EXPIRATION_TIME is actually referenced in code, while this one is not 🤔

Copy link
Member Author

@XJDKC XJDKC Oct 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StorageAccessProperty.EXPIRATION_TIME is also not referenced in the callee, we just use it to add expiration_time property to the AccessConfig in AzureCredentialsStorageIntegration and AwsCredentialStorageIntegration (the equivalent of ConnectionCredentialVendor)

Copy link
Member Author

@XJDKC XJDKC Oct 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another usage of this enum class is:
It's a bridge between polaris and iceberg sdk. Polaris can understand this enum class, but we need to convert it to the property that iceberg sdk can understands.
Considering this, I prefer to keep it as it's.

i.e., Inside polaris, we will always use this enum class to refer the property, when we need to use iceberg sdk, like RESTCatalog and FileIO, we will convert the property to string-based property that iceberg sdk can understand.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would we end up using the same expiration-time between different auth types in the future?

I recall StorageAccessProperty.EXPIRATION_TIME being more of the deprecated field, where we prefer AWS_SESSION_TOKEN_EXPIRES_AT_MS to be more specific about what it is. That one also has the s3 prefix: s3.session-token-expires-at-ms.

Is there ever a situation where we might need to put an S3 credential into the same config map as a sigv4 catalog-connection credential?

In that case we'd need separate expiration times as well if they're not in separate tuples. Following the same convention of the rest. prefix used in AwsProperties, we'd then want this to be rest.session-token-expires-at-ms.

}

// Delegate to the vendor to generate credentials
return selectedVendor.get().getConnectionCredentials(connectionConfig);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since DefaultPolarisCredentialManager multiplexes over type-specific lower level implementations, having them under the same java interface (ConnectionCredentialVendor) does not sound right to me.

If a ConnectionCredentialVendor is impl. it type-specific, then DefaultPolarisCredentialManager should not be a sub-type of ConnectionCredentialVendor because it is not type-specific.

All calls are made via the PolarisCredentialManager interface.

WDYT about detaching PolarisCredentialManager from ConnectionCredentialVendor but keeping current DefaultPolarisCredentialManager code?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it might look a bit confusing right now since PolarisCredentialManager doesn't define additional methods and everything still lives under ConnectionCredentialVendor.

The main reason I introduced the PolarisCredentialManager interface early is to establish a clear separation of concerns before we add StorageCredentialVendor and potentially other vendor types. The goal is for PolarisCredentialManager to act as the central entry point for all credential management (including caching and multiplexing), while ConnectionCredentialVendor represents just one of several possible vendor implementations. i.e., The PolarisCredentialManager is also a Connection Credential Vendor.

So even though it looks slightly redundant in this PR, it helps set up a cleaner, more extensible structure for the upcoming changes and avoids unnecessary refactoring later. I'm happy to add a short comment or javadoc note to clarify that intent if it helps.

I'd prefer to keep PolarisCredentialManager implement ConnectionCredentialVendor for now.
But if you feel strongly about separating it, I can also duplicate the method in PolarisCredentialManager and make it a standalone interface.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you suggesting that in the future PolarisCredentialManager will extend ConnectionCredentialVendor and some other AbcCredentialVendor and DefCredentialVendor?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, if we want to migrate the storage credential related to this new framework, we need to make PolarisCredentialManager extends StorageCredentialVendor as well.

interface PolarisCredentialManager implements ConnectionCredentialVendor, StorageCredentialVendor {}

class DefaultPolarisCredentialManager implements PolarisCredentialManager {
  private AccessCredsCache accessCredsCache;
  private final Instance<ConnectionCredentialVendor> connectionCredentialVendors;
  private final Instance<StorageCredentialVendor> storageCredentialVendors;

  @Override
  public @Nonnull ConnectionCredentials getConnectionCredentials(
      @Nonnull ConnectionConfigInfoDpo connectionConfig) {
    // 1. generate cache key
    // 2. check if there is a valid connection credential in the cache
    // 3. if yes, get the credential from cache and return it directly
    // 4. If not, select the credential vendor based on Auth Type
    // 5. Delegate to the connection credential vendor to fetch the connection credentials
    // 6. Store the connection credentials in the cache
    // 7. return the connection credentials
  }

  @Override
  public @Nonnull StorageAccessConfig getSubscopedStorageCreds(
      boolean allowListOperation,
      @Nonnull Set<String> allowedReadLocations,
      @Nonnull Set<String> allowedWriteLocations,
      Optional<String> refreshCredentialsEndpoint) {
    // 1. generate cache key
    // 2. check if there is a valid storage credential in the cache
    // 3. if yes, get the credential from cache and return it directly
    // 4. If not, select the credential vendor based on Storage Type
    // 5. Delegate to the storage credential vendor to fetch the storage credentials
    // 6. Store the storage credentials in the cache
    // 7. return the storage credentials
  }
}

Copy link
Contributor

@dimas-b dimas-b Oct 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that approach is fine from my POV, except that I do not think PolarisCredentialManager is necessary as an interface... Why not implement all of those "leaf" interfaces in the DefaultPolarisCredentialManager class?

Copy link
Member Author

@XJDKC XJDKC Oct 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, but that's why there are some ambiguities: take a look into the code I posted above, if we remove PolarisCredentialManager and define DefaultPolarisCredentialManager like that, we have SigV4ConnectionCredentialVendor and DefaultPolarisCredentialManager and both of them inherit ConnectionCredentialVendor. If the caller uses ConnectionCredentialVendor as the variable type, which object the caller will get. DefaultPolarisCredentialManager can get the correct one because it will select the candidate based on the authentication type, but if we just define @Inject ConnectionCredentialVendor credentialVendor, the CDI framework can't tell which one should be used.

PolarisCredentialManager is the multiplexing layer, the idea is that, for the caller, if you need a connection credential, or a storage credential, just use CDI to get a PolarisCredentialManager, pass in the needed info, then PolarisCredentialManager will handle all the things for you, e.g., forwarding the calls to the actual credential vendor, cache the credentials. So, we can't directly inject ConnectionCredentialVendor, we should use a separate interface, and PolarisCredentialManager is the one we are looking for.

Copy link
Contributor

@dimas-b dimas-b Oct 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the caller uses ConnectionCredentialVendor as the variable type, which object the caller will get [...]

This can be resolved in CDI with the help of "qualifiers". Simple callers get the "default" impl., e.g. the cache / multiplexer, while the cache can further qualify the next level of implementation via @AuthType.

Copy link
Member Author

@XJDKC XJDKC Oct 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this can't be resolved in CDI, but why do we make things very complicated.

With this approach, when we want to get the DefaultPolarisCredentialManager, we need to filter based on Qualifier.
When we want to get the specific vendor, we need to filter based on @AuthType.

  @Produces
  @RequestScoped
  public ConnectionCredentialVendor connectionCredentialVendor(
      PolarisCredentialManagerConfiguration config,
      @Any Instance<ConnectionCredentialVendor> credentialManagers) {
    return (ConnectionCredentialVendor) credentialManagers.select(Identifier.Literal.of(config.type())).get();
  }

  @Produces
  @RequestScoped
  public StorageCredentialVendor storageCredentialVendor(
      PolarisCredentialManagerConfiguration config,
      @Any Instance<StorageCredentialVendor> credentialManagers) {
    return (StorageCredentialVendor) credentialManagers.select(Identifier.Literal.of(config.type())).get();
  }

Why not just add a new interface that implements all the credential vendors?
That way, whenever we need credentials, we can simply use PolarisCredentialManager. It's cleaner and doesn't introduce any downside.

If vendors want to provide their own DefaultPolarisCredentialManager, they don't need to understand the underlying mechanism and also follow this pattern. (i.e., providing two producers and return different vendor type).

And the callers also don't need to understand that the one they are using is not the specific ConnectionCredentialVendor but the DefaultPolarisCredentialManager which has the cache logic and the multiplexing logic.

Copy link
Contributor

@dimas-b dimas-b Oct 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's revisit later :) I think current code is functional and clear enough for maintenance.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, happy to discuss this further if needed!

@dimas-b
Copy link
Contributor

dimas-b commented Oct 7, 2025

It feels like we're getting into somewhat hypothetical and a bit too abstract discussions here 🙂 I do not want to block progress on the Catalog Federation feature, so I'd be ok with merging this PR "as is" if other reviewers more familiar with the feature and intended final state of the code approve.

@XJDKC
Copy link
Member Author

XJDKC commented Oct 7, 2025

It feels like we're getting into somewhat hypothetical and a bit too abstract discussions here 🙂 I do not want to block progress on the Catalog Federation feature, so I'd be ok with merging this PR "as is" if other reviewers more familiar with the feature and intended final state of the code approve.

Thanks Dmitri, the runtime logic can always be evolved and refined later. There are still many things we need to do for catalog federation, so we can continue the discussion about the interface in future PRs!

cc: @dennishuo @HonahX

HonahX
HonahX previously approved these changes Oct 7, 2025
Copy link
Contributor

@HonahX HonahX left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @dimas-b for the thorough review on this! I think we've had a lot constructive and thoughtful discussions. The remaining points, as you mentioned, are more forward-looking points could be addressed when we actually add those features/refactorings.

With this and the fact that catalog federation is well-guarded by a param, I vote +1 on this to support federate to Glue IRC

@github-project-automation github-project-automation bot moved this from PRs In Progress to Ready to merge in Basic Kanban Board Oct 7, 2025
Copy link
Contributor

@dennishuo dennishuo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally LGTM, just a comment about the naming of the expiration-time property. We could fixup in a followup if needed

String.class, AwsProperties.REST_SECRET_ACCESS_KEY, "the aws access key secret", true),
AWS_SESSION_TOKEN(
String.class, AwsProperties.REST_SESSION_TOKEN, "the aws scoped access token", true),
EXPIRATION_TIME(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would we end up using the same expiration-time between different auth types in the future?

I recall StorageAccessProperty.EXPIRATION_TIME being more of the deprecated field, where we prefer AWS_SESSION_TOKEN_EXPIRES_AT_MS to be more specific about what it is. That one also has the s3 prefix: s3.session-token-expires-at-ms.

Is there ever a situation where we might need to put an S3 credential into the same config map as a sigv4 catalog-connection credential?

In that case we'd need separate expiration times as well if they're not in separate tuples. Following the same convention of the rest. prefix used in AwsProperties, we'd then want this to be rest.session-token-expires-at-ms.

@HonahX
Copy link
Contributor

HonahX commented Oct 8, 2025

Thanks @XJDKC @dennishuo @dimas-b for reviewing! Given the size of the PR I propose to do non-urgent small changes in follow-up for easier reviewing process.

@HonahX HonahX merged commit 519e127 into apache:main Oct 8, 2025
16 checks passed
@github-project-automation github-project-automation bot moved this from Ready to merge to Done in Basic Kanban Board Oct 8, 2025
@XJDKC
Copy link
Member Author

XJDKC commented Oct 8, 2025

Generally LGTM, just a comment about the naming of the expiration-time property. We could fixup in a followup if needed

Good catch! I also looked into the Iceberg SDK, unfortunately, there isn't currently a property that indicates the expiration time of the token/credential for remote catalog access. So for now, I reused the expiration-time property, which is already used by StorageAccessProperty.

I can modify this in a follow up PR!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants