-
Notifications
You must be signed in to change notification settings - Fork 332
Service: Validate allowed locations from Catalog Storage Config #2473
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| () -> { | ||
| List<String> allowedStorageTypes = | ||
| realmConfig.getConfig(FeatureConfiguration.SUPPORTED_CATALOG_STORAGE_TYPES); | ||
| if (!allowedStorageTypes.contains(StorageConfigInfo.StorageTypeEnum.FILE.name())) { | ||
| List<String> invalidLocations = | ||
| locations.stream() | ||
| .filter( | ||
| location -> location.startsWith("file:") || location.startsWith("http")) | ||
| .collect(Collectors.toList()); | ||
| if (!invalidLocations.isEmpty()) { | ||
| throw new ForbiddenException( | ||
| "Invalid locations '%s' for identifier '%s': File locations are not allowed", | ||
| invalidLocations, identifier); | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part is unchanged.
| validateLocations(realmConfig, List.of(parentLocation), locations, identifier); | ||
| } | ||
|
|
||
| validateLocations(realmConfig, allowedLocations, locations, identifier); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ensure to validate the allowed locations regardless of ALLOW_UNSTRUCTURED_TABLE_LOCATION is true or false
| StorageUtil.getLocationsAllowedToBeAccessed( | ||
| null, entityPathReversed.get(0).getPropertiesAsMap()); | ||
| return new StorageConfigurationOverride( | ||
| return new LocationRestrictions( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the TODO on line 160 added in https://github.com/apache/polaris/pull/1320/files#diff-607cdd8c6ea78443988359e42ffec3091a4c84bc9151575772ee3b90164e03f3 was correctly pointing out that the addition of these locations from StorageUtil.getLocationsAllowedToBeAccessed (formerly userSpecifiedWriteLocations) msimatched the intent of the validation in ways that were problematic for Views.
We should remove all this location-augmentation here and simply return the basic restrictions directly from configInfo in this else branch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test createViewWithCustomMetadataLocationUsingPolaris() failed once we remove it, which was introduced by #1320. Do we need to support view with custom location(write.metadata.path)? I don't think it's necessary. We can probably revert all changes in #1320 if needed. Also, we can fix it in a followup PR. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need to worry too much about edge cases like a custom write.metadata.path for views. In fact, even for tables, we all know that tables can contain data files outside of write.metadata.path U write.data.path U location right? We need to capture the majority of use layouts with credential vending, but not necessarily the long tail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, regardless of whether this allows or disallows setting write.metadata.path for Views, that test case is currently documenting the incorrect behavior that also applies to Tables.
Probably better to fix-forward the test; even #1320 itself indicated it was just documenting current observed behavior, but we can update the createViewWithCustomMetadataLocationUsingPolaris to ensure it also fails with ForbiddenException even if the custom path is in the parent.
In this case we also want to make sure even with ALLOW_UNSTRUCTURED_TABLE_LOCATION=true for the CreateTable scenario (not CreateView), the creation will fail if the "custom location" isn't part of the Catalog's allowedLocations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note, if we want a "success" case for Views calling super.createViewWithCustomMetadataLocation(); we can just construct the original Catalog's StorageConfig to include the custom path in allowedLocations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
polaris-core/src/main/java/org/apache/polaris/core/storage/LocationRestrictions.java
Outdated
Show resolved
Hide resolved
|
|
||
| public LocationRestrictions( | ||
| @Nonnull PolarisStorageConfigurationInfo storageConfigurationInfo, | ||
| List<String> allowedLocations, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can remove the allowedLocations input here based on what I commente in PolarisStorageConfigurationInfo. The intended use of storageConfigurationInfo.getAllowedLocations() should be implicit.
If we really wanted to allow a UNION of configInfo.getAllowedLocations() with some additional alternativeAllowedLocations we should probably name it explicitly as such to make it very clear that we're changing the total set of allowedLocations rather than paring down a smaller subset of allowedLocations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as #2473 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
polaris-core/src/main/java/org/apache/polaris/core/storage/LocationRestrictions.java
Outdated
Show resolved
Hide resolved
|
|
||
| // TODO: figure out the purpose of adding `userSpecifiedWriteLocations` | ||
| Set<String> locations = | ||
| StorageUtil.getLocationsAllowedToBeAccessed( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While we're at it, maybe we should rename all the StorageUtil.getLocationsAllowedToBeAccessed methods to make it clear that it's not really telling what's "allowed", but really what's declared to be used by the table.
Maybe rename it to StorageUtil.getLocationsDeclaredForTable or StorageUtil.getLocationsRequestedForTable or StorageUtil.getAllLocationsUsedByTable or something similar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
| .createNamespace( | ||
| identifier.namespace(), | ||
| ImmutableMap.of( | ||
| IcebergTableLikeEntity.USER_SPECIFIED_WRITE_METADATA_LOCATION_KEY, location)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't make sense to have a write.metadata.path for namespace, also we won't check the locations used by parent after the change in method forEntityPath
| public void createViewWithCustomMetadataLocationUsingPolaris(@TempDir Path tempDir) { | ||
| TableIdentifier identifier = TableIdentifier.of("ns", "view"); | ||
|
|
||
| String location = Paths.get(tempDir.toUri().toString()).toString(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This location still escape from the storage config. We will need more fixes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The base class does add the @TempDir to allowedLocations in
Line 43 in f555f30
| protected StorageConfigInfo getStorageConfigInfo() { |
Is this a different tempdir that escapes? If we want to exercise a failure we probably need to try to set the location to something else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, they are different tmpDir, the paths generated are different. So the namespace location should not be allowed, not sure why it still passed. I will debug it more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Weird. It works as expected when I moved the integration test to unit test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It turns out the Polaris added a structured location to the view in #1966. When a rest client creates view with location A, the Polaris will automatically adds the prefix like /root/namespace/A, related code
polaris/runtime/service/src/main/java/org/apache/polaris/service/catalog/iceberg/IcebergCatalog.java
Line 1244 in 17a359b
| return super.withLocation(transformTableLikeLocation(identifier, newLocation)); |
The unit tests I added don't have the issue as it doesn't go through that route.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In short, the location checking is fine. I will merge this PR, and file another to fix the view issue.
dennishuo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
* Integration tests for Catalog Federation (apache#2344) Adds a Junit5 integration test for catalog federation. * Fix merge conflict in CatalogFederationIntegrationTest (apache#2420) apache#2344 added a new test for catalog federation, but it looks like an undetected conflict with concurrent changes related to authentication have broken the test in main. * chore(deps): update registry.access.redhat.com/ubi9/openjdk-21-runtime docker tag to v1.23-6.1755674729 (apache#2416) * 2334 (apache#2427) * Fix TableIdentifier in TaskFileIOSupplier (apache#2304) we cant just convert a `TaskEntity` to a `IcebergTableLikeEntity` as the `getTableIdentifier()` method will not return a correct value by using the name of the task and its parent namespace (which is empty?). task handlers instead need to pass in the `TableIdentifier` that they already inferred via `TaskEntity.readData`. * Fix NPE in CreateCatalog (apache#2435) * Doc fix: Access control page update (apache#2424) * 2418 * 2418 * fix(deps): update dependency software.amazon.awssdk:bom to v2.32.29 (apache#2443) * Optimize PolicyCatalog.listPolicies (apache#2370) this is a follow-up to apache#2290 the optimization is to use `listEntities` instead of `loadEntities` when there is `policyType` filter to apply * Add PolarisDiagnostics field to BaseMetaStoreManager (apache#2381) * Add PolarisDiagnostics field to BaseMetaStoreManager the ultimate goal is removing the `PolarisCallContext` parameter from every `PolarisMetaStoreManager` interface method, so we make steps towards reducing its usage first. * Add feature flag to disallow custom S3 endpoints (apache#2442) * Add new realm-level flag: `ALLOW_SETTING_S3_ENDPOINTS` (default: true) * Enforce in `PolarisServiceImpl.validateStorageConfig()` Fixes apache#2436 * Deprecate ActiveRolesProvider for removal (apache#2404) * Client: fix openapi verbose output, remove doc generate, and skip test generations (apache#2439) * Fix various issue in client code generation * Use logger instead of print * Add back exclude on __pycache__ as CI is not via Makefile * Add back exclude on __pycache__ as CI is not via Makefile * Add user principal tag in metrics (apache#2445) * Added API change to enable tag * Added test * Added production readiness check * fix(deps): update dependency io.opentelemetry.semconv:opentelemetry-semconv to v1.36.0 (apache#2454) * fix(deps): update dependency com.google.cloud:google-cloud-storage-bom to v2.56.0 (apache#2447) * fix(deps): update dependency gradle.plugin.org.jetbrains.gradle.plugin.idea-ext:gradle-idea-ext to v1.3 (apache#2428) * Build: Make jandex dependency used for index generation managed (apache#2431) Also allows specifying the jandex index version for the build. This is a preparation step contributing to apache#2204, once a jandex fix for reproducible builds is available. Co-authored-by: Alexandre Dutra <[email protected]> * Built: improve reproducible archive files (apache#2432) As part of the effort for apache#2204, this change fixes a few aspects around reproducible builds: Some Gradle projects produce archive files, but don't get the necessary Gradle archive-tasks settings applied: one not-published project but also the tarball&zip of the distribution. This change moves the logic to the new build-plugin `polaris-reproducible`. Another change is to have some Quarkus generated jar files adhere to the same conventions, which are constant timestamps for the zip entries and a deterministic order of the entries. That's sadly not a full fix, as the classes that are generated or instumented by Quarkus differ in each build. Contributes to apache#2204 * Remove commons-lang3 dependency (apache#2456) outside of tests we can replace the functionality with jdk11 and guava. also stop using `org.assertj.core.util` as its a non-public api. * add refresh credentials property to loadTableResult (apache#2341) * add refresh credentials property to loadTableResult * IcebergCatalogAdapterTest: Added test to ensure refresh credentials endpoint is included * delegate refresh credential endpoint configuration to storage integration * GCP: Add refresh credential properties * fix(deps): update dependency io.opentelemetry.semconv:opentelemetry-semconv to v1.37.0 (apache#2458) * Add Delegator to all API Implementations (apache#2434) Per the Dev ML, implements the Delegator pattern to add Events instrumentation to all Polaris APIs. * Prefer java.util.Base64 over commons-codec (apache#2463) `java.util.Base64` is available since java8 and we are already using it in a few other spots. in a follow-up we might be able to get rid of our `commons-codec` dependency completely. * Service: Move tests to the right package (apache#2469) * Update versions in runtime LICENSE and NOTICE (apache#2468) * fix(deps): update dependency com.adobe.testing:s3mock-testcontainers to v4.8.0 (apache#2475) * fix(deps): update dependency com.gradleup.shadow:shadow-gradle-plugin to v9.1.0 (apache#2476) * Service: Remove hadoop-common from polaris-runtime-service (apache#2462) * Service: Always validate allowed locations from Storage Config (apache#2473) * Add Community Sync Meeting 20250828 (apache#2477) * Update dependency software.amazon.awssdk:bom to v2.33.0 (apache#2483) * Remove PolarisCallContext.getDiagServices (apache#2415) * Remove PolarisCallContext.getDiagServices usage * Remove diagnostics from PolarisCallContext * Feature: Expose resetCredentials via a new reset api to allow root user to reset credentials for an existing principal with custom values (apache#2197) * Add type-check to PolarisEntity subclass ctors (apache#2302) currently one can freely "cast" any `PolarisEntity` to a more specific type via their constructors. this can lead to subtle bugs like we fixed in a29f800 by adding type checks we discover a few more places where we need to be more careful about how we construct new or handle existing entities. note that we can add a check for `PolarisEntitySubType` in a followup, but it requires more fixes currently. * Fix CI (apache#2489) Fix undetected merge conflict after apache#2197 + apache#2415 + apache#2434 * Use local diagnostics in TransactionWorkspaceMetaStoreManager * Add resetCredentials to PolarisPrincipalsEventServiceDelegator * Core: Prevent AIOOBE for negative codes in PolarisEntityType, PolarisPrivilege, ReturnStatus (apache#2490) * feat(idgen): Start Implementation of NoSQL with the ID Generation Framework (apache#2131) Create an ID Generation Framework. Related to apache#650 & apache#844 Co-authored-by: Robert Stupp <[email protected]> Co-authored-by: Dmitri Bourlatchkov <[email protected]> * perf(refactor): optimizing JdbcBasePersistenceImpl.listEntities (apache#2465) - Reduced Column Selection: Only 6 columns instead of 16 - Eliminated Object Creation Overhead: Direct conversion to EntityNameLookupRecord without intermediate PolarisBaseEntity * Add Polaris Events to Persistence (apache#1844) * AWS CloudWatch Event Sink Implementation (apache#1965) * Fix failing CI (apache#2498) * Update actions/stale digest to 3a9db7e (apache#2499) * Core: Prevent AIOOBE for negative policy codes in PredefinedPolicyType (apache#2486) * Service: Add location tests for views (apache#2496) * Update docker.io/jaegertracing/all-in-one Docker tag to v1.73.0 (apache#2500) * Update dependency io.netty:netty-codec-http2 to v4.2.5.Final (apache#2495) * Update actions/setup-python action to v6 (apache#2502) * Update the Release Guide about the Helm Chart package (apache#2179) * Update the Release Guide about the Helm Chart package * Update release-guide.md Co-authored-by: Pierre Laporte <[email protected]> * Add missing commit message * Whitespace * Use Helm GPG plugin to sign the Helm chart * Fix directories during Helm chart copy to SVN * Add Helm index to SVN * Use long name for svn checkout * Ensure the Helm index is updated after the chart is moved to SVN dist release * Do not publish any Docker image before the vote succeeds * Typos * Revert "Do not publish any Docker image before the vote succeeds" This reverts commit 5617e65. * Don't mention Helm values.yaml in the release guide as it doesn't contain version details --------- Co-authored-by: Pierre Laporte <[email protected]> * Update dependency com.azure:azure-sdk-bom to v1.2.38 (apache#2503) * Update registry.access.redhat.com/ubi9/openjdk-21-runtime Docker tag to v1.23-6.1756793420 (apache#2504) * Remove commons-codec dependency (apache#2474) follow-up to f8ad77a we can simply use guava instead and eliminate the extra dependency * CLI: Remove SCRIPT_DIR and default config location to user home (apache#2448) * Remove readInternalProperties helpers (apache#2506) the functionality is already provided by the `PrincipalEntity` * Add Events for Generic Table APIs (apache#2481) This PR adds the Events instrumentation for the Generic Tables Service APIs, surrounding the default delegated call to the business logic APIs. * Disable custom namespace locations (apache#2422) When we create a namespace or alter its location, we must confirm that this location is within the parent location. This PR introduces introduces a check similar to the one we have for tables, where custom locations are prohibited by default. This functionality is gated behind a new behavior change flag `ALLOW_NAMESPACE_CUSTOM_LOCATION`. In addition to allowing us to revert to the old behavior, this flag allows some tests relying on arbitrarily-located namespaces to pass (such as those from upstream Iceberg). Fixes: apache#2417 * fix for IcebergAllowedLocationTest (apache#2511) * Remove unused config from SparkSessionBuilder (apache#2512) Tests pass without it. * Add Events for Policy Service APIs (apache#2479) * Remove PolarisTestMetaStoreManager.jsonNode helper (apache#2513) * Update dependency software.amazon.awssdk:bom to v2.33.4 (apache#2517) * Update dependency com.nimbusds:nimbus-jose-jwt to v10.5 (apache#2514) * Update dependency io.opentelemetry:opentelemetry-bom to v1.54.0 (apache#2515) * Update dependency io.micrometer:micrometer-bom to v1.15.4 (apache#2519) * Port missed OSS change * NoSQL: adopt to updated test packages * NoSQL: adapt to removed PolarisDiagnostics param * NoSQL: fix libs.versions.toml * NoSQL: include jandex plugin related changes from OSS * NoSQL: changes for delete/set principal client-ID+secret * Last merged commit c6176dc --------- Co-authored-by: Pooja Nilangekar <[email protected]> Co-authored-by: Eric Maynard <[email protected]> Co-authored-by: Mend Renovate <[email protected]> Co-authored-by: Yong Zheng <[email protected]> Co-authored-by: Christopher Lambert <[email protected]> Co-authored-by: Honah (Jonas) J. <[email protected]> Co-authored-by: Dmitri Bourlatchkov <[email protected]> Co-authored-by: Alexandre Dutra <[email protected]> Co-authored-by: fivetran-kostaszoumpatianos <[email protected]> Co-authored-by: Jason <[email protected]> Co-authored-by: Adnan Hemani <[email protected]> Co-authored-by: Yufei Gu <[email protected]> Co-authored-by: JB Onofré <[email protected]> Co-authored-by: fivetran-arunsuri <[email protected]> Co-authored-by: Adam Christian <105929021+adam-christian-software@users.noreply.github.com> Co-authored-by: Artur Rakhmatulin <[email protected]> Co-authored-by: Pierre Laporte <[email protected]>
We will need to validate allowed locations from Catalog Storage Config in case of structured table location path.