-
Notifications
You must be signed in to change notification settings - Fork 331
Add support for S3 request signing #2280
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Thank you so much for this @adutra ! when ready please let us know (would be really helpful if you can share some description to walk us through your thought process), happy to help with reviews :) ! |
snazy
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like good 1st step.
As all signing requests will have to access the backend for every little data file chunk being accessed by clients, this implementation will cause quite a slow experience for users. Until we don't have a faster implementation (signed access-rules), I think we should label this feature as "experimental".
.../service/src/main/java/org/apache/polaris/service/catalog/iceberg/IcebergCatalogAdapter.java
Outdated
Show resolved
Hide resolved
...service/src/main/java/org/apache/polaris/service/storage/aws/signer/S3RequestSignerImpl.java
Outdated
Show resolved
Hide resolved
...service/src/main/java/org/apache/polaris/service/storage/aws/signer/S3RequestSignerImpl.java
Outdated
Show resolved
Hide resolved
| .method(method) | ||
| .headers(signingRequest.headers()); | ||
|
|
||
| // FIXME is this correct? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nope.
See ListObjects + ListObjectsV2 REST spec.
Be careful: Not all GET /{bucket/ requests are list-objects requests!
The "v1" list-objects has no required parameter, so you cannot just check for the presence of a req param. It could also be a get-bucket-cors or get-bucket-encryption or ...-policy or ...-website or ......
Nice, no?
Maybe it's okay to assume that every client uses the v2 api, which has a unique, required request param.
...service/src/main/java/org/apache/polaris/service/storage/aws/signer/S3RequestSignerImpl.java
Outdated
Show resolved
Hide resolved
...service/src/main/java/org/apache/polaris/service/storage/aws/signer/S3RequestSignerImpl.java
Outdated
Show resolved
Hide resolved
...service/src/main/java/org/apache/polaris/service/storage/aws/signer/S3RequestSignerImpl.java
Outdated
Show resolved
Hide resolved
6bf40a9 to
8d1f354
Compare
df10af7 to
6a26337
Compare
|
Ready for review. The test failures are unrelated to this change. |
| mapOf( | ||
| "IcebergErrorResponse" to "org.apache.iceberg.rest.responses.ErrorResponse", | ||
| "S3SignRequest" to "org.apache.polaris.service.aws.sign.model.PolarisS3SignRequest", | ||
| "SignS3Request200Response" to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI The name of the logical response type is a bit weird because it is inlined, instead of being a $ref in the OpanAPI spec.
| private String currentCatalogName; | ||
| private Map<String, String> restCatalogConfig; | ||
| private URI externalCatalogBase; | ||
| private URI externalCatalogBaseLocation; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI I refactored this class a bit, notably to facilitate subclassing, but there is no functional change in this class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor request here, but you could pull this and the new IntegerationTest Helpers out into a separate PR and reduce the size of this PR by a bit. I think that would also be a very fast review
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup good idea!
.../src/main/java/org/apache/polaris/service/it/test/PolarisS3RemoteSigningIntegrationTest.java
Show resolved
Hide resolved
runtime/server/build.gradle.kts
Outdated
| } | ||
|
|
||
| dependencies { | ||
| implementation(project(":polaris-core")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not directly referenced in this module.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is another change I would try to pull out to another PR, it's not related to this changeset correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct:
| } | ||
| } | ||
|
|
||
| private static boolean isStaticFacade(CatalogEntity catalog) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refactored to become a method of CatalogEntity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@RussellSpitzer FYI:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! It definitely helps to keep this sort of refactor out of the larger functional change
| * <p>The returned provider is not meant to be vended directly to clients, but rather used with | ||
| * STS, unless credential subscoping is disabled. | ||
| */ | ||
| default AwsCredentialsProvider awsSystemCredentials() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed to awsSystemCredentials because these are not "just" credentials for STS, they are server credentials.
| prefix, | ||
| catalog -> { | ||
| S3SignResponse response = catalog.signS3Request(s3SignRequest, tableIdentifier); | ||
| return Response.status(Response.Status.OK).entity(response).build(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In theory the response should include Cache-Control headers. This is left for a later improvement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... but that would require clients to respect those 🤷
10b51bc to
11d2254
Compare
|
This is now ready again for review! \cc @snazy @singhpk234 @flyrain @dimas-b This PR is equivalent to Milestone 1 in the original design doc. Let's try to work incrementally and get this in, we are already working on getting a new PR for Milestone 2 and especially the authz checks that are not yet implemented. Thanks! |
|
Hey Alex, If yes, I am bit confused, if we know M2 is the one we eventually wanna make and M2 is a bit orthogonal to M1 why are making / investing on M1, a couple questions that would be really helpful to get your take :
|
It's not orthogonal, M2 builds on top of M1.
It depends if we want all the features in M2 or not. I would actually personally break M2 into smaller pieces. The most important piece is "Access checks based on the S3 location", which can be done in a few man-days. The "Nessie style" signed parameters would require a bit more than that, imo.
For the same reason we released the Events feature in incomplete state, with "beta" status: to get users feedback, and to work incrementally.
These are not "flavors" of remote signing. It's just one same feature, in two levels of maturity M1: experimental, M2: production-ready.
Yes, if we adopt Nessie's strategy for fast remote signing as proposed in the doc (it's optional). Note: this isn't a breaking change, as it's not user facing. Happy to discuss this topic more in depth in a Slack channel or during the next sync! |
11d2254 to
f33646b
Compare
|
The proposed phased approach (M1/M2) SGTM 👍 |
dimas-b
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lots of comments 🙂 but the PR overall looks good 👍 Thanks for implementing this, @adutra !
| .addAll(PolarisEndpoints.getSupportedGenericTableEndpoints(realmConfig)) | ||
| .addAll(PolarisEndpoints.getSupportedPolicyEndpoints(realmConfig)) | ||
| .addAll( | ||
| PolarisEndpoints.getSupportedRemoteSigningEndpoints( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure it's worth advertising in ConfigResponse because it's not a standard endpoint, for which clients might want to check the endpoints list. Clients learn about it via loadTable responses, I guess 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK let's remove.
| } | ||
|
|
||
| String prefix = prefixParser.catalogNameToPrefix(callContext.getRealmContext(), catalogName); | ||
| URI signerUri = uriInfo.getBaseUri().resolve("api/"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about the api/ constant... It related to our @Path annotations in the generated REST API classes, so it's usage is a bit obscure here. Could we make it a constant in the s3-sign-service module?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, but I'm not sure a constant in s3-sign-service is the best solution. The api/ path segment doesn't seem to be a first-class citizen in our APIs, rather a bi-product. It's only declared as a server variable in the OpenAPI generator config, for example:
| serverVariables.put("basePath", "api/catalog") |
| serverVariables.put("basePath", "api/catalog") |
The management APi is actually a bit different, there is not basePath variable in polaris-management-service.yml, so the base path is hard-coded:
| - url: "{scheme}://{host}/api/management/v1" |
But the common characteristic is that api/is the first path segment of all Polaris APIs, but that is not enforced by any constraint.
I would say the right place for this segment would be in PolarisResourcePaths, but you will notice again that none of the paths include the api/ segment. Same for Iceberg's ResourcePaths. So it's really implicit 😅
All of this to say, let's declare a constant in PolarisResourcePaths and call it a day, wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to PolarisResourcePaths
| RESTUtil.encodeString(ident.name())); | ||
| } | ||
|
|
||
| public String s3RemoteSigning(TableIdentifier ident) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a core concern? I suppose it is only relevant to the REST layer 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well the same could be said for all the methods in this class 😄
The package is org.apache.polaris.core.rest btw - so I guess REST is a core concern 🤷♂️
I think PolarisResourcePaths and PolarisEndpoints should not live in core ideally. But that's the way it is today.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops, I did not actually look at the whole class, only at the diff 😅
Let's keep it for now and maybe refactor later.... In general, I'd like to untangle core from REST 😅
| AWS_REMOTE_SIGNER_URI( | ||
| String.class, | ||
| S3V4RestSignerClient.S3_SIGNER_URI, | ||
| "the base URI for the remote signer service, used for signing S3 requests", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this have a reference Iceberg version number where the property becomes respected?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Iceberg 1.2.0. That's more than 2 years old, I'm not sure we need to mention it here. Wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's fine... I was worries it might have been more recent ;)
.../service/src/main/java/org/apache/polaris/service/catalog/iceberg/IcebergCatalogHandler.java
Outdated
Show resolved
Hide resolved
f33646b to
cb7bd80
Compare
|
Hey Alex, |
| identifier, PolarisEntityType.TABLE_LIKE, PolarisEntitySubType.ICEBERG_TABLE, true); | ||
|
|
||
| // If the table doesn't exist, we still need to check allowed locations from the parent | ||
| // namespace. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[doubt] is this a valid state ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. When creating a new table, engines write the manifest files first, then commit the table. The manifest writes must be signed too. Thus, a request to sign can arrive at the server before the table is created.
CHANGELOG.md
Outdated
| 2. Grant the `TABLE_REMOTE_SIGN` privilege to a catalog role. The role must also be granted the `TABLE_READ_DATA` | ||
| and `TABLE_WRITE_DATA` privileges. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[doubt] is there an assertion when even TABLE_REMOTE_SIGN is granted these two privieldges would be there ?
nit :
| 2. Grant the `TABLE_REMOTE_SIGN` privilege to a catalog role. The role must also be granted the `TABLE_READ_DATA` | |
| and `TABLE_WRITE_DATA` privileges. | |
| 2. Grant the `TABLE_REMOTE_SIGN` privilege to a catalog role. The catalog role must also be granted the `TABLE_READ_DATA` | |
| and `TABLE_WRITE_DATA` privileges. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there an assertion when even TABLE_REMOTE_SIGN is granted these two privieldges would be there ?
Currently no.
| } | ||
|
|
||
| String prefix = prefixParser.catalogNameToPrefix(callContext.getRealmContext(), catalogName); | ||
| URI signerUri = uriInfo.getBaseUriBuilder().path(PolarisResourcePaths.API_PATH_SEGMENT).build(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we add a TODO in code to handle proxies ?
| realmConfig.getConfig(FeatureConfiguration.SKIP_CREDENTIAL_SUBSCOPING_INDIRECTION); | ||
|
|
||
| boolean credentialSubscopingAllowed = | ||
| baseCatalog instanceof IcebergCatalog |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if its an IcebergCatalog but doesn't support STS ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good question!
It's complex to check this at this level because you need to find the AwsStorageConfigurationInfo corresponding to the resolved path in order to know if STS is unavailable.
I think we shouldn't make this logic too complex, because it's not common for a client to request more than one access delegation mode anyways.
| private AccessDelegationMode selectAccessDelegationMode( | ||
| Set<AccessDelegationMode> delegationModes) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does linking this part of spec https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml#L1859 helps in putting some context that server is allowed to decide which one to choose ?
| public PolarisS3SignResponse signS3Request( | ||
| PolarisS3SignRequest s3SignRequest, TableIdentifier tableIdentifier) { | ||
|
|
||
| LOGGER.debug("Requesting s3 signing for {}: {}", tableIdentifier, s3SignRequest); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need to explicity log it here or quarkus supports Access log, asking because i don't see such logging in Iceberg Catalog Handler
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quarkus indeed supports access logs. We must be careful with logging here because this endpoint will be called very, very frequently.
| // table creation is committed. In this case, we still need to check allowed locations from | ||
| // the parent entities. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this the case of staged table ? i wondering if we should block on this case ? becasue it may happen the table is different from namespace location ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a very good question, let me check if we have tests for staged table creation + remote signing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After investigating a bit, it seems that staged vs direct doesn't affect remote signing.
However I found a few feature flags that could interfere with remote signing:
ALLOW_UNSTRUCTURED_TABLE_LOCATIONALLOW_EXTERNAL_TABLE_LOCATIONALLOW_EXTERNAL_METADATA_FILE_LOCATIONDEFAULT_LOCATION_OBJECT_STORAGE_PREFIX_ENABLED
I will try to add tests for these flags + remote signing.
| if (catalogEntity.isExternal()) { | ||
| throw new ForbiddenException("Remote signing is not enabled for external catalogs."); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand we discussed this before for federated catalog we should disable remote signing but during that time we didn;t support cred vending for them either, i wonder if its a good time to reconsider that remote signing would be possible too (ofc in a new PR)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Totally agree, I think we could discuss this possibility later on.
| if (baseCatalog.tableExists(tableIdentifier)) { | ||
|
|
||
| // If the table exists, get allowed locations from the table metadata | ||
| Table table = baseCatalog.loadTable(tableIdentifier); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for each SIGN request a loadTable call is super expensive, if someone enables this in prod by accident for a table million of files the service might become unresponsive ?
[suggestion / optional to address] wondering for now if we should add a prod readiness check to keep this disabled for prod ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for each SIGN request a loadTable call is super expensive
Indeed, that's where M2 will help because it will save us this loadTable call.
I will add a FIXME for now.
[suggestion / optional to address] wondering for now if we should add a prod readiness check to keep this disabled for prod ?
Good idea, will do 👍
| } | ||
|
|
||
| private Set<PolarisStorageActions> getStorageActions(PolarisS3SignRequest s3SignRequest) { | ||
| // TODO M2: better handling of DELETE and LIST |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[for my understanding] can you please elaborate this case more ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By just looking at the request URI, it's not easy to know if the request is a write or a delete, or if it is a read or a list.
For example: the ListObjects operation is conceptually a LIST operation, and GetObject is conceptually a READ. But both requests use the GET method, so it's not easy to disambiguate.
Another example: the DeleteObject operation uses the DELETE method, but the DeleteObjects operation uses... the POST method 🤷♂️
I will add some comments to clarify.
cc444a1 to
f3961b0
Compare
dbee61b to
6019f32
Compare
Fixes #32.
Design doc:
https://docs.google.com/document/d/1ygdia7u4bUHUt6n8XhZo48aKoIyyrCvKqan3XP25iB8/edit?usp=sharing