-
Notifications
You must be signed in to change notification settings - Fork 330
add refresh credentials property to loadTableResult #2341
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add refresh credentials property to loadTableResult #2341
Conversation
bb55a80 to
eb3e360
Compare
.../service/src/main/java/org/apache/polaris/service/catalog/iceberg/IcebergCatalogAdapter.java
Outdated
Show resolved
Hide resolved
.../service/src/main/java/org/apache/polaris/service/catalog/iceberg/IcebergCatalogAdapter.java
Outdated
Show resolved
Hide resolved
5c92bdd to
99075da
Compare
.../service/src/main/java/org/apache/polaris/service/catalog/iceberg/IcebergCatalogAdapter.java
Outdated
Show resolved
Hide resolved
|
Please address CI failures :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM overall 👍 Thanks for working on this @jasonf20 !
Do we have existing integration tests for the credential refresh endpoint? If not, could you add a test in RestCatalogMinIOSpecialIT (probably). I suppose we could use the test REST Client to validate refresh credential responses. I know this is specific to the S3 storage integration, but it will hit the common call path and we can validate Azure AccessConfig in unit tests. WDYT?
| public String credentialsPath(TableIdentifier ident) { | ||
| return SLASH.join( | ||
| "v1", | ||
| prefix, | ||
| "namespaces", | ||
| RESTUtil.encodeNamespace(ident.namespace()), | ||
| "tables", | ||
| RESTUtil.encodeString(ident.name()), | ||
| "credentials"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to confirm: relative URIs are correct in this case, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, from what I understand the recent version of the client automatically appends the base URI if a relative path is returned from the server.
.../service/src/main/java/org/apache/polaris/service/catalog/iceberg/IcebergCatalogAdapter.java
Outdated
Show resolved
Hide resolved
polaris-core/src/main/java/org/apache/polaris/core/storage/cache/StorageCredentialCache.java
Outdated
Show resolved
Hide resolved
...core/src/main/java/org/apache/polaris/core/storage/gcp/GcpCredentialsStorageIntegration.java
Outdated
Show resolved
Hide resolved
| Boolean.class, | ||
| AwsClientProperties.REFRESH_CREDENTIALS_ENABLED, | ||
| "whether to enable automatic refresh of credentials", | ||
| true, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not think this property is a credential property (logically)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my understanding of the client code:
- The
AwsClientFactoryis initialized ahead of time with generic properties retrieved from the "configuration" endpoint. This doesn't include specific endpoints for credential refresh per table and/or access keys. - Later on when loading the fileIO per table, the base properties + additional table properties provided from the
List<Credential>are added here.
For the property to be available in Credential.config() down the line this needs to be set to true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the code pointer! In this case, it does make sense to mark these properties as "credentials", but please add a code comment for future readers to be aware of what Iceberg java clients expect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had another look at this. I think I was wrong since it also passes in tableConf which if set initializes a new fileIO with those properties, so this can be false. I'll update those.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your diligence! 🙂
polaris-core/src/main/java/org/apache/polaris/core/storage/StorageAccessProperty.java
Outdated
Show resolved
Hide resolved
polaris-core/src/main/java/org/apache/polaris/core/storage/StorageAccessProperty.java
Outdated
Show resolved
Hide resolved
polaris-core/src/main/java/org/apache/polaris/core/storage/StorageAccessProperty.java
Outdated
Show resolved
Hide resolved
polaris-core/src/main/java/org/apache/polaris/core/storage/PolarisStorageIntegration.java
Outdated
Show resolved
Hide resolved
.../test/java/org/apache/polaris/core/storage/azure/AzureCredentialsStorageIntegrationTest.java
Outdated
Show resolved
Hide resolved
|
@dimas-b I've added the integration test and unit test for azure as you suggested and made other fixes based on your comments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR LGTM in its current state 👍 ... with some minor comments 🙂
However, I believe it is worth sending a dev email about the new feature before merging (as commented below).
polaris-core/src/main/java/org/apache/polaris/core/storage/PolarisCredentialVendor.java
Show resolved
Hide resolved
polaris-core/src/main/java/org/apache/polaris/core/storage/PolarisStorageIntegration.java
Show resolved
Hide resolved
polaris-core/src/main/java/org/apache/polaris/core/storage/cache/StorageCredentialCache.java
Outdated
Show resolved
Hide resolved
| EnumSet<AccessDelegationMode> delegationModes, | ||
| String prefix, | ||
| TableIdentifier tableIdentifier) { | ||
| return delegationModes.contains(AccessDelegationMode.VENDED_CREDENTIALS) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic is ok from my POV, but I wonder if other community members prefer a feature flag to switch the new behaviour on/off.
Please send an email about this to the dev ML.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I'll send the dev email myself :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
singhpk234
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly LGTM as well 👍 ! we are very close :) !
| AWS_REFRESH_CREDENTIALS_ENDPOINT( | ||
| String.class, | ||
| AwsClientProperties.REFRESH_CREDENTIALS_ENDPOINT, | ||
| "the endpoint to use for refreshing credentials", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ideally we are not refreshing the credentials, we are getting brand new one how about we say something like
https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml#L119
|
@jasonf20 : Could you add an entry about this change to |
…ndpoint is included
431de42 to
2d6dd03
Compare
dimas-b
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
singhpk234
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| * @param allowedWriteLocations a set of allowed to write locations | ||
| * @param refreshCredentialsEndpoint an optional endpoint to use for refreshing credentials. If | ||
| * supported by the storage type it will be returned to the client in the appropriate | ||
| * properties. The endpoint may be relative to the base URI and the client is responsible for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we return anything except relative path ? my understanding is base uri is always catalog uri https://github.com/apache/iceberg/blob/main/aws/src/main/java/org/apache/iceberg/aws/s3/VendedCredentialsProvider.java#L92 is this statement in intentionally open ended since spec is not explicit ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At one point the client required full paths, now it works with partial paths. I'm not sure if the spec identifies one as the correct method over the other so I left it like this. If one is definitely the "correct" way I can update the comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jasonf20 : do you know the Iceberg version / PR where relative paths started working?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was linked in the previous PR. #1164 (comment)
Seems like it's since 1.8.0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh no this made a lot of things super tricky, i checked iceberg python for refresh keyword but i didn't any prs for that so i am assuming that it doesn't support, taking only the iceberg java sdk reference
if we always send relative url it only works for >= 1.8
fails for < 1.8
I know there is a header that the java sdk sends which has SDK version X-Client-Version header we may want to salvage this now on when to return the absolute vs relative, https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/rest/HTTPClient.java#L77
seems like the only broken release would be 1.7
Its a bit surprising to me, but i think for now we can park this by just documenting this gotcha, WDYT @dimas-b ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Getting full URI is a bit trickier - as discussed in #1164.
From my POV, let's merge this "as is" and open a GH issue that it does not work properly for Iceberg < 1.8. When someone has time, let's improve.
Getting this feature enabled for Iceberg clients >= 1.8 is valuable, I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since old client version behave differently, I think we ought to have a feature flag now... just in case someone runs into client incompatibility issues and wants to disable credential refresh properties completely (i.e. revert to old Polaris behaviour). WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be ideal IMHO (specially now), but i think the jobs without us sending the credentials endpoint would still have failed because my understanding is we would have not made the /credentials call unless the creds in loadTable expires https://github.com/apache/iceberg/blob/main/aws/src/main/java/org/apache/iceberg/aws/s3/VendedCredentialsProvider.java#L63, its just we might see an HTTP error rather than the creds expired exception.
But agree, we might want to have a feature flag to fallback to the old behaviour if we unexpectedly fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think a feature flag is needed.
- The client is already broken, if they reach the need to refresh credentials it already fails today.
- A feature flag will not solve the problem. Enabling it will fix the issue for some and create a different issue for others, so you'll never be able to enable it anymore than you can without the flag.
- Clients can simply upgrade to 1.8.0 to get a working version. Their current version doesn't work anyway so they can upgrade at their own time when this becomes important to them.
- If you would like we can remove the
refresh_credentials_enabledproperty and let the client set that one in his catalog settings. That way he can control if this gets used or not. This would be better than a flag and is controlled client side. not server side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added a commit that does 4.
|
Looks like we already have @dimas-b ;) review, we were parallely reviewing, we good to go from my end ! |
|
Oops, CI failure 😅 |
fixed |
dimas-b
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dimas-b
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After re-reading @jasonf20 comment, I think removing of the "enabled" properties from AccessConfig is a sound approach.
CHANGELOG.md
Outdated
| - Added refresh credentials endpoint configuration to LoadTableResponse for AWS, Azure, and GCP. Enabling | ||
| automatic storage credential refresh per table on the client side. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets call out the client version limitation here and the suggestion to the caller to set the client side override to enable this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good points!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since the release is imminent, if the round trip cycle is slow, I suppose we can still merge this PR and later adjust the changelog between @singhpk234 and myself :) Let's give it a few more hours.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated the changelog. Seems a little verbose now for a changelog but It's probably fine.
p.s. I believe you should be able to push changes to my branch yourself since I have the "Allow edits and access to secrets by maintainers" checkbox enabled. But I never tried using that myself.
CHANGELOG.md
Outdated
| automatic storage credential refresh per table on the client side. | ||
| automatic storage credential refresh per table on the client side. Java client version >= 1.8.0 is required. | ||
| The endpoint path is always returned when using vended credentials, but the client must configure | ||
| REFRESH\_ENDPOINT\_ENABLED = "true" in the catalog properties for the client to use the endpoint. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the properties are unfortunately different for the all the cloud providers, not sure if we wanna call this out, just saying enabled never the less refresh-credentials-enabled should be endpoint
how about we say that clients must enable refresh-credentials flag for their respective cloud provider ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea that's less verbose. I changed it to that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm 👍
singhpk234
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks @jasonf20 for all the effort !
Appreciate your contribution :), much needed capability !
|
Thanks for your reviews @dimas-b @singhpk234 ! |
| - Added refresh credentials endpoint configuration to LoadTableResponse for AWS, Azure, and GCP. Enabling | ||
| automatic storage credential refresh per table on the client side. Java client version >= 1.8.0 is required. | ||
| The endpoint path is always returned when using vended credentials, but clients must enable the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will need to document this. Unfortunately, the best place to doc this is a page for storage setting and credential vending, which doesn't exist yet. Here is the issue filed long time ago, #1325. Not a blocker for this PR though.
Rebased version of #1164