Add support for custom write.data.path and write.metadata.path with test for object store location provider #193
Conversation
We had to tackle the same problem (object-storage layout) in Nessie, but also wanted to retain the ability to control the table's location for data security reasons and for generating IAM policies. Technically, there's sadly not only one "right" way to do this. The solution in Nessie to the "object-storage layout problem" is: if …
By "right" A user should be able to set the |
@snazy I added an extra test verifying that object-store layout already works under the table's default location. I think this should work for anyone who really wants credential vending to be specifically scoped to the table location, while the extra configuration allows a less strict setup for use cases that want to trade credential vending for performance/scale.
This is good, but I wonder if we need a way to add even more locations. The basic issue is that this only represents the "latest" location that data or metadata could be written to, and a user could have set a different value for these properties in the past.
This is the purpose of the allowedLocations field of the storageConfiguration. Currently, we can't set table-level StorageConfiguration, but part of the reason for implementing the StorageConfigurationOverride is to help pave the way toward that.
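For illustration, a minimal sketch of the idea discussed above: treat any user-specified write locations as additions to the allowed locations before building the storage configuration override. The class and method names below are made up for the sketch and are not the actual Polaris API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: the real StorageConfigurationOverride / allowedLocations
// plumbing in Polaris looks different; this only shows the shape of the idea.
final class AllowedLocationsSketch {

  static List<String> effectiveAllowedLocations(
      List<String> configuredAllowedLocations,    // from the storage configuration
      List<String> userSpecifiedWriteLocations) { // write.data.path / write.metadata.path
    List<String> allowed = new ArrayList<>(configuredAllowedLocations);
    allowed.addAll(userSpecifiedWriteLocations);
    return allowed;
  }
}
```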
computeIfPresent may work here?
Maybe if Map had an ifPresent(Consumer<T>) method...
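For reference, the two standard-library idioms being weighed here; this is plain java.util code, not project code, and the map contents are placeholders.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

class MapIdioms {
  public static void main(String[] args) {
    Map<String, String> props = new HashMap<>();
    props.put("write.data.path", "s3://bucket/custom/data");

    // Map.computeIfPresent remaps the value only when the key is present.
    props.computeIfPresent("write.data.path", (k, v) -> v.trim());

    // Map has no ifPresent(Consumer), but Optional.ofNullable over get()
    // gives the "run this side effect only if the key is set" shape.
    Optional.ofNullable(props.get("write.data.path"))
        .ifPresent(path -> System.out.println("data path: " + path));
  }
}
```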
polaris-service/src/main/java/io/polaris/service/catalog/BasePolarisCatalog.java (outdated)
Check Optional.ofNullable to simplify this ternary
scratch that, l is a list here, ... hmm
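For what it's worth, the suggestion and the reason it was dropped look roughly like this; the variable names are invented for the example.

```java
import java.util.List;
import java.util.Optional;

class TernaryVsOptional {

  // A null-vs-default ternary maps cleanly onto Optional.ofNullable:
  static String dataPath(String maybePath, String defaultPath) {
    // equivalent to: maybePath != null ? maybePath : defaultPath
    return Optional.ofNullable(maybePath).orElse(defaultPath);
  }

  // With a possibly-empty list there is no null to wrap, so a plain
  // emptiness check usually reads better than an Optional detour.
  static List<String> locations(List<String> userLocations, List<String> defaults) {
    return userLocations.isEmpty() ? defaults : userLocations;
  }
}
```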
polaris-core/src/main/java/io/polaris/core/storage/PolarisStorageConfigurationInfo.java (outdated)
It looks like, to avoid always doing a storage configuration override, we are using the Optional path below. It just feels like it may be simpler to always add the locations to the config?
Or we could just do:

if (userSpecifiedWriteLocations.notEmpty)
  return storageOverride
else
  return configInfo
I'm not sure the functional style here makes it much more readable; it's one of those things that I think would be much cleaner in Scala.
Hmm, yeah, I think we can just always add the StorageConfigurationOverride with empty locations if not set.
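A minimal sketch of the two shapes discussed in this thread, using illustrative types rather than the real Polaris classes:

```java
import java.util.List;

// Stand-ins for the real config types; shapes are invented for the example.
interface ConfigInfo {}

record ConfigOverride(ConfigInfo base, List<String> extraAllowedLocations) implements ConfigInfo {}

class OverrideStyles {

  // Explicit if/else, as suggested in review:
  static ConfigInfo conditional(ConfigInfo configInfo, List<String> userSpecifiedWriteLocations) {
    if (!userSpecifiedWriteLocations.isEmpty()) {
      return new ConfigOverride(configInfo, userSpecifiedWriteLocations);
    }
    return configInfo;
  }

  // The resolution above: always wrap, passing an empty list when nothing was set.
  static ConfigInfo always(ConfigInfo configInfo, List<String> userSpecifiedWriteLocations) {
    return new ConfigOverride(configInfo, userSpecifiedWriteLocations);
  }
}
```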
RussellSpitzer left a comment
At a high level I think this is a great idea. I do think we should probably reduce some of the functional approach here, which I think is a little complicated for what we are trying to accomplish. If we were in Scala I feel like it would be very simple, but in Java we have to do the List -> Option swap, which I don't think is very easy to read.
I don't have a strong feeling about that, though, if other engineers want to go with this style everywhere in the project.
There are some references to "snowman" in the test file which I think we may need to change now, but I see there are other references there already so this isn't a deal breaker for me.
Also, for future problems, we probably will need to consider arbitrary additional paths, but I'm not sure how we can do that at the moment.
Force-pushed from 64e0624 to 0c32a7a.
Description
Custom write.data.path and write.metadata.path are currently blocked, preventing users from following the guidelines in https://iceberg.apache.org/docs/1.5.1/aws/#object-store-file-layout. This updates the blocking to allow customized write paths with the existing ALLOW_UNSTRUCTURED_TABLE_LOCATION configuration flag. This also adds a test using the object store location provider feature.

Fixes #113
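As a usage example, these are the Iceberg table properties this change unblocks (property keys per the linked Iceberg docs; the bucket names and the way the Table handle is obtained are placeholders):

```java
import org.apache.iceberg.Table;

class ObjectStoreLayoutExample {

  // Enables the object store file layout with custom data/metadata prefixes.
  // The catalog must permit these locations (e.g. via the
  // ALLOW_UNSTRUCTURED_TABLE_LOCATION flag described above).
  static void configure(Table table) {
    table.updateProperties()
        .set("write.object-storage.enabled", "true")
        .set("write.data.path", "s3://my-data-bucket/prefix")
        .set("write.metadata.path", "s3://my-metadata-bucket/prefix")
        .commit();
  }
}
```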
How Has This Been Tested?
New pytest for S3