Adjustable limit on the number of locations per storage config #1068
Conversation
...-core/src/main/java/org/apache/polaris/core/storage/azure/AzureStorageConfigurationInfo.java
PolarisConfiguration.<Integer>builder()
    .key("STORAGE_CONFIGURATION_MAX_LOCATIONS")
    .description("How many locations can be associated with a storage configuration")
    .defaultValue(20)
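For illustration, a limit config like the one above would typically feed a small validation helper. The following is a hypothetical sketch, not the actual Polaris code: the helper name, the `-1` "no limit" sentinel, and the exception type are all assumptions here.

```java
import java.util.List;

public class LocationLimitSketch {
  // Assumption: a negative value disables the check entirely.
  static final int NO_LIMIT = -1;

  // Rejects a storage configuration whose allowed-locations list exceeds the
  // configured maximum; a NO_LIMIT maximum accepts any size.
  static void validateMaxAllowedLocations(List<String> allowedLocations, int maxAllowedLocations) {
    if (maxAllowedLocations != NO_LIMIT && allowedLocations.size() > maxAllowedLocations) {
      throw new IllegalArgumentException(
          "Number of allowed locations ("
              + allowedLocations.size()
              + ") exceeds the limit of "
              + maxAllowedLocations);
    }
  }

  public static void main(String[] args) {
    // Within the limit: no exception.
    validateMaxAllowedLocations(List.of("s3://bucket/a/", "s3://bucket/b/"), 20);

    // Over the limit: the storage config would be rejected.
    boolean rejected = false;
    try {
      validateMaxAllowedLocations(List.of("s3://a/", "s3://b/", "s3://c/"), 2);
    } catch (IllegalArgumentException e) {
      rejected = true;
    }
    System.out.println(rejected ? "rejected" : "accepted");
  }
}
```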
Do we need a limit at the catalog level if a table can only use the three locations specified by the properties location, write.data.path, and write.metadata.path?
Currently we don't use this value for those locations; this is a limit on the allowed-locations a catalog can have
Sorry, I didn't express it clearly originally. My question is about why we need a limit on the allowed-locations of a catalog. Per our offline discussion, IIUC, it's needed for the table-level policy used by credential vending, so that the policy text won't exceed a certain length limit. Given that table locations are limited even when the catalog has a large number of allowed-locations, this doesn't seem to be an issue, and there's no reason to impose a limit. Am I understanding correctly?
That's right. I would be okay with removing a limit totally, but on the other hand @collado-mike is mentioning the possibility of having 3 limits. I think preserving at least one limit, so we have the concept of a limit, may be helpful in the future especially if we do push storage configs down to the table level.
Even with storage configs pushed down to the table level, the chance of an unbounded number of table locations is quite small.
- For the majority of Iceberg use cases within Polaris, writers can only use the three locations specified by the properties location, write.data.path, and write.metadata.path.
- For the migration use case, admittedly it is possible that there are more than the 3 locations mentioned above. However, users should be aware of the number of locations during migration, and add them to the table-level storage configs. At that point, we can enforce it by saying "that's too many locations; credential vending won't work." The limit seems better placed at the table level, as locations from different tables may not overlap.

In short, a limit at the catalog level doesn't seem necessary now, and may not be effective in the future. I'd consider removing it, but I'm open to being convinced by other use cases.
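The argument above rests on a single table's writable locations being bounded by its base location plus two optional properties. A minimal sketch of that derivation (the helper name and shape are illustrative, not actual Polaris code):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class TableLocationsSketch {
  // A table's candidate write locations: its base location, plus the optional
  // write.data.path and write.metadata.path property overrides. The result
  // therefore always has between 1 and 3 entries.
  static List<String> candidateLocations(String tableLocation, Map<String, String> properties) {
    List<String> locations = new ArrayList<>();
    locations.add(tableLocation);
    String dataPath = properties.get("write.data.path");
    if (dataPath != null) {
      locations.add(dataPath);
    }
    String metadataPath = properties.get("write.metadata.path");
    if (metadataPath != null) {
      locations.add(metadataPath);
    }
    return locations;
  }

  public static void main(String[] args) {
    List<String> locs =
        candidateLocations(
            "s3://bucket/warehouse/db/t",
            Map.of("write.data.path", "s3://other-bucket/data"));
    System.out.println(locs.size()); // 2: base location plus the data path override
  }
}
```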
I will go with no limit. I think current users should be fine, since doing so relaxes the limit; nothing should break. If users start to ask for a limit for whatever reason, then we can think about adding a limit config at that time. WDYT?
Updated the PR to simply have no config for now 👍
As a matter of best practice, it is very nearly always a good idea to make behavior changes behind a config flag. It's very easy to add a config that allows replicating the exact behavior we see today and then to remove that config if users are happy with the proposed change. What's not easy to do is to bring that behavior back to existing deployments once the code is ripped out. Making small, incremental changes to ensure we understand the unintended side effects seems pretty uncontroversial to me. I think Yufei's suggestion to keep the config but mark it deprecated is reasonable and allows us the opportunity to make changes carefully. Why is that controversial?
Okay, so is everyone okay with 1 config now? If so, I will go ahead with restoring the PR to its state before this commit.
I'm not sure what exactly it means to mark a config as "deprecated", but we cannot easily remove a config once we've added one.
I'm ok to 'disagree and commit' to supporting one config, but I think we ought to at least have one config that defaults to the current behavior and allows users to increase it as they see fit.
flyrain
left a comment
+1. As a follow-up, we can provide an error message when the policy/rule exceeds the length limit; we may need to change the method getSubscopedCreds() a bit.
Looks like this unit test failed,

@eric-maynard Thanks for the changes. I have a question with respect to skipping the configuration of allowed locations and prefix validation. Is this in scope for this PR, and if so, does this not require any change in the prefix validation logic? If I am not mistaken, we need handling in InMemoryStorageIntegration:validateSubpathsOfAllowedLocations

Holding this for a moment to potentially rebase onto #1124

@eric-maynard are you ok to merge this one now, then rebase on #1124 after we have approval from the other reviewers?

Unfortunately a rebase onto #1124 would change the flag from a feature flag to a behavior flag
polaris-core/src/main/java/org/apache/polaris/core/PolarisConfigurationStore.java
validateMaxAllowedLocations(MAX_ALLOWED_LOCATIONS);
validateMaxAllowedLocations(
    PolarisConfigurationStore.getConfiguration(
        PolarisConfiguration.STORAGE_CONFIGURATION_MAX_LOCATIONS));
If validation fails, it will cause a runtime error, making this storage configuration unusable... This seems risky to me because the storage config is persisted, but the PolarisConfigurationStore.getConfiguration() value can change from run to run (it may even differ between servers in the same cluster). WDYT?
Yeah, this concern was raised in the discussion above, where @flyrain advocated for not having a config at all, but @collado-mike felt strongly that we needed one. In the case that the flag is set to e.g. 100, someone creates a storage config with 99 paths, and then the flag is lowered to 10, then yes, runtime exceptions will start happening on the service as soon as the storage config is created in-memory.
This fragility is one reason why I'm trying (through #1124) to make this a behavior-change flag rather than a fully-supported feature flag.
I believe this kind of enforcement is valuable on storage config creation (and changes), but not on load (regardless of how we present the config flag itself).
I agree that the behavior here with a limit set is far from ideal.
I think this will be more clear once we close the discussion on #1124, but in essence this code should never be hit unless the user has touched a flag that's described there as:
... These flags control subtle behavior adjustments and bug fixes, not user-facing catalog settings. They are intended for internal use only, are inherently unstable, ...
Going forward with these flags, I think we should consider that sometimes the behavior you get with the flag set to a non-default value is a poor or even broken experience. The only reason the problematic code (this entire check, in this case) is left in place is due to some perceived risk or potential regression associated with removing it entirely. Eventually, we will want to remove it entirely. #1124 says of this:
Flags here are generally short-lived and should either be removed or promoted to stable feature flags before the next release.
In accordance with this guidance, this entire check should go away before the next release at which point in time I think the concern you raise above would be addressed. In the interim, we will be able to optionally switch back to the exact semantics in place before this change (the intent of a behavior change flag). WDYT?
In this context, my concern is not about how configuration works or how configuration is described. My concern is with runtime behaviour in case the new config settings make storage config validation fail.
Whether the admin user had enough warning about risks is irrelevant, IMHO, because the storage config in question is present in the database. It was stored some time ago under different rules. I believe Polaris has to honour its persisted config more than transient validation settings, because Polaris' primary function as a catalog is to manifest its persisted data to clients in order for clients to gain access to tables. In this case the validation flags are not necessary to ensure correctness of operation. They are rather arbitrary limits from the perspective of accessing tables.
I can foresee how the new flag can cause a Polaris Server to fail to provide access to tables that used to be accessible before setting the flag. However, I do not see how this code change can help Polaris Server in dealing with configurations that have too many locations (for example).
If the concern is that having too many locations is detrimental to performance, then the user will have to adjust configuration somehow, but again in that case, what is the value of failing to make unadjusted storage config available for processing? It can cause hard failures compared to hypothetical performance problems... unless I'm missing something :)
Feature flags on top of validation config/flags sounds like an overkill to me... but I will not object to that :)
Yeah that is the idea -- separate out actual features from things like this where we want to be able to "revert" easily.
Anyway, if you still think this check is problematic I don't totally disagree, and we can try to move this to the API layer so there's no issue with deserializing a config that was created with a higher limit
It looks to me that the issues I mentioned earlier are still present in the latest state of this PR (unless I'm missing something).
Advocating for the OSS user of Polaris, I believe it is preferable to move this validation to the API layer.
I will not block this PR because of that, though :)
Sounds good! I moved the check into the builder in CatalogEntity, which I think is the best narrow waist for this
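To illustrate the approach of enforcing the limit at entity creation rather than on load: the following is a hedged sketch under assumptions of my own (the class, field, and method names are illustrative, not the actual CatalogEntity code), showing a builder that validates only in build(), so already-persisted configs are never re-checked against a limit that may have changed.

```java
import java.util.ArrayList;
import java.util.List;

public class CatalogEntitySketch {
  final List<String> allowedLocations;

  private CatalogEntitySketch(List<String> allowedLocations) {
    this.allowedLocations = List.copyOf(allowedLocations);
  }

  static class Builder {
    private final List<String> allowedLocations = new ArrayList<>();
    private int maxAllowedLocations = -1; // assumption: -1 means no limit

    Builder addAllowedLocation(String location) {
      allowedLocations.add(location);
      return this;
    }

    Builder maxAllowedLocations(int max) {
      this.maxAllowedLocations = max;
      return this;
    }

    // The limit is enforced only here, when a new entity is built; an
    // already-persisted entity being deserialized never passes through this.
    CatalogEntitySketch build() {
      if (maxAllowedLocations >= 0 && allowedLocations.size() > maxAllowedLocations) {
        throw new IllegalArgumentException(
            "Too many allowed locations: " + allowedLocations.size());
      }
      return new CatalogEntitySketch(allowedLocations);
    }
  }

  public static void main(String[] args) {
    CatalogEntitySketch ok =
        new Builder().addAllowedLocation("s3://a/").maxAllowedLocations(5).build();
    System.out.println(ok.allowedLocations.size()); // 1
  }
}
```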
The latest validation approach LGTM :) thx! Posting a couple of minor comments separately.
polaris-core/src/main/java/org/apache/polaris/core/entity/CatalogEntity.java
polaris-core/src/main/java/org/apache/polaris/core/entity/CatalogEntity.java
dimas-b
left a comment
I'm ok merging without my minor comments if you disagree :)
…e#1068) * initial commit * autolint * change the config store access * autolint * add support for 01 * autolint * fix test * autolint * retest * rebase * autolint * change the config store access * autolint * add support for 01 * autolint * fix test * autolint * retest * fix a test * autolint * fix another test * autolint * remove catalog config for now as it's not used * changes * autolint * update test to reflect -1 default * autolint * autolint * move the check * changes per review * ready * autolint
…rage config" (apache#26) * b93e97b * Adjustable limit on the number of locations per storage config (apache#1068) * initial commit * autolint * change the config store access * autolint * add support for 01 * autolint * fix test * autolint * retest * rebase * autolint * change the config store access * autolint * add support for 01 * autolint * fix test * autolint * retest * fix a test * autolint * fix another test * autolint * remove catalog config for now as it's not used * changes * autolint * update test to reflect -1 default * autolint * autolint * move the check * changes per review * ready * autolint * spotless --------- Co-authored-by: Eric Maynard <[email protected]>
Currently, the number of locations allowed in a storage configuration is hard-coded per cloud provider. In fact, Polaris avoids building a policy that uses every location at once, so there isn't a need to limit this number. Only if a table spanned N locations would a request to (e.g.) STS actually include a policy with N locations.
This PR introduces a config to adjust that limit, and also raises the default.