
Conversation

@eric-maynard (Contributor) commented Feb 25, 2025

Currently, the number of locations allowed in a storage configuration is hard-coded per cloud provider. In practice, Polaris avoids building a policy that uses every location at once, so there is little need to cap this number: only if a single table spanned N locations would a request to (e.g.) STS actually include a policy with N locations.

This PR introduces a config to adjust that limit, and also raises the default.

PolarisConfiguration.<Integer>builder()
.key("STORAGE_CONFIGURATION_MAX_LOCATIONS")
.description("How many locations can be associated with a storage configuration")
.defaultValue(20)
Contributor

Do we need a limit at the catalog level if a table can only use the three locations specified by the properties location, write.data.path, and write.metadata.path?
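
For context, these are the three Iceberg table properties being referred to; a minimal, self-contained Java illustration with made-up paths (the property names are the standard Iceberg ones, the values are hypothetical):

import java.util.Map;

// Illustration only: the three location-related properties an Iceberg table can carry.
// Property names are standard Iceberg; all paths below are made up.
public class TableLocationProperties {
  public static void main(String[] args) {
    Map<String, String> props = Map.of(
        "location", "s3://bucket/warehouse/db/tbl",             // table root location
        "write.data.path", "s3://other-bucket/data/db/tbl",     // where data files are written
        "write.metadata.path", "s3://bucket/metadata/db/tbl");  // where metadata files are written
    props.forEach((k, v) -> System.out.println(k + " -> " + v));
  }
}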

Contributor Author

Currently we don't use this value for those locations; this is a limit on the allowed locations a catalog can have.

@flyrain (Contributor) Feb 26, 2025

Sorry, I didn't express it clearly originally. My question is why we need a limit on the allowed locations of a catalog at all. Per our offline discussion, IIUC, it's needed for the table-level policy used by credential vending, so that the policy text won't grow long enough to exceed a certain limit. Given that a table's locations are limited even when its catalog has a large number of allowed locations, this doesn't seem to be an issue, and there's no reason to impose a limit. Am I understanding correctly?
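
To make the policy-length concern concrete, here is a rough, self-contained Java sketch (not the actual Polaris credential-vending code) of why the number of locations matters: each allowed location contributes its own resource entries to the session policy sent to STS, and AWS caps the plaintext size of inline session policies at 2,048 characters.

import java.util.List;
import java.util.stream.Collectors;

// Illustration only -- not the Polaris implementation. Shows how the policy text grows
// linearly with the number of locations included in a scoped-down credential request.
public class PolicySizeSketch {

  static String buildSessionPolicy(List<String> allowedLocations) {
    // One Resource ARN per location; more locations => longer policy text.
    String resources = allowedLocations.stream()
        .map(loc -> "\"arn:aws:s3:::" + loc.replaceFirst("^s3://", "") + "/*\"")
        .collect(Collectors.joining(","));
    return "{\"Version\":\"2012-10-17\",\"Statement\":[{"
        + "\"Effect\":\"Allow\","
        + "\"Action\":[\"s3:GetObject\",\"s3:PutObject\"],"
        + "\"Resource\":[" + resources + "]}]}";
  }

  public static void main(String[] args) {
    String policy = buildSessionPolicy(
        List.of("s3://bucket/warehouse/db/tbl/data",
                "s3://bucket/warehouse/db/tbl/metadata"));
    System.out.println(policy.length() + " characters: " + policy);
  }
}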

Contributor Author

That's right. I would be okay with removing the limit entirely, but on the other hand @collado-mike has mentioned the possibility of having 3 limits. I think preserving at least one limit, so we keep the concept of a limit, may be helpful in the future, especially if we do push storage configs down to the table level.

Contributor

Even with storage configs pushed down to the table level, the chance of an unbounded number of table locations is quite small.

  1. For the majority of Iceberg use cases within Polaris, writers can only use the three locations specified by the properties location, write.data.path, and write.metadata.path.
  2. For the migration use case, admittedly there may be more than the 3 locations mentioned above. However, users should be aware of the number of locations during migration and add them to the table-level storage configs. At that point, we can enforce it by saying "that's too many locations, credential vending won't work." The limit seems better suited to the table level, as locations from different tables may not overlap.

In short, a limit at the catalog level doesn't seem necessary now, and may not be effective in the future. I'd consider removing it, but I'm open to being convinced by other use cases.

Contributor

I would go with no limit. Current users should be fine, since this only relaxes the limit; nothing should break. If users start to ask for a limit for whatever reason, we can think about adding a limit config at that time. WDYT?

Contributor Author

Updated the PR to simply have no config for now 👍

Contributor

As a matter of best practice, it is very nearly always a good idea to make behavior changes behind a config flag. It's very easy to add a config that allows replicating the exact behavior we see today and then to remove that config if users are happy with the proposed change. What's not easy to do is to bring that behavior back to existing deployments once the code is ripped out. Making small, incremental changes to ensure we understand the unintended side effects seems pretty uncontroversial to me. I think Yufei's suggestion to keep the config but mark it deprecated is reasonable and allows us the opportunity to make changes carefully. Why is that controversial?

Contributor Author

Okay, so is everyone okay with 1 config now? If so, I will go ahead with restoring the PR to its state before this commit.

I'm not sure what exactly it means to mark a config as "deprecated", but we cannot easily remove a config once we've added one.

Contributor

I'm OK to 'disagree and commit' to supporting one config, but I think we ought to at least have one config that defaults to the current behavior and allows users to increase the limit as they see fit.

@flyrain (Contributor) left a comment

+1. As a follow-up, we can provide an error message when the policy/rule exceeds the length limit; we may need to change the method getSubscopedCreds() a bit.
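
A hypothetical sketch of that follow-up: fail fast with a clear message when the generated session policy would exceed the provider limit, instead of letting the STS call fail opaquely. The helper name and constant below are illustrative and are not the actual getSubscopedCreds() signature.

// Illustration only: a pre-check that could run before requesting subscoped credentials.
final class PolicyLengthCheck {

  // AWS documents a 2,048-character plaintext limit for inline session policies.
  private static final int MAX_SESSION_POLICY_CHARS = 2048;

  static void validatePolicyLength(String sessionPolicyJson) {
    if (sessionPolicyJson.length() > MAX_SESSION_POLICY_CHARS) {
      throw new IllegalArgumentException(String.format(
          "Cannot vend credentials: the generated session policy is %d characters, "
              + "which exceeds the provider limit of %d; reduce the number of allowed "
              + "locations or shorten their paths",
          sessionPolicyJson.length(), MAX_SESSION_POLICY_CHARS));
    }
  }
}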

@flyrain (Contributor) commented Mar 4, 2025

Looks like the unit test testExceedMaxAllowedLocations failed. We need to update it.

@eric-maynard eric-maynard changed the title Adjustable limit on the number of locations per storage config Remove the limit on the number of locations per storage config Mar 4, 2025
@pavibhai (Contributor) commented Mar 5, 2025

@eric-maynard Thanks for the changes.

I have a question with respect to skipping the configuration of allowed locations and prefix validation. Is this in scope for this PR, and if so, doesn't it require a change in the prefix-validation logic?

If I am not mistaken, we need handling in InMemoryStorageIntegration:validateSubpathsOfAllowedLocations.
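
To illustrate the question (purely a sketch, not the actual InMemoryStorageIntegration code): if the allowed-locations list can now be empty or unbounded, the subpath check needs an explicit decision for that case.

import java.util.List;

// Illustration only of the prefix-validation concern; names and semantics are assumptions.
final class PrefixValidationSketch {

  static boolean isSubpathOfAllowedLocations(String path, List<String> allowedLocations) {
    if (allowedLocations == null || allowedLocations.isEmpty()) {
      // Open question from the review: should "no allowed locations configured"
      // mean "allow everything" or "deny everything"?
      return false;
    }
    return allowedLocations.stream()
        .anyMatch(allowed ->
            path.equals(allowed)
                || path.startsWith(allowed.endsWith("/") ? allowed : allowed + "/"));
  }
}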

@eric-maynard eric-maynard disabled auto-merge March 5, 2025 18:57
@eric-maynard (Contributor Author)

Holding this for a moment to potentially rebase onto #1124

@collado-mike (Contributor)

@eric-maynard are you ok to merge this one now, then rebase on #1124 after we have approval from the other reviewers?

@eric-maynard (Contributor Author)

Unfortunately a rebase onto #1124 would change the flag from a feature flag to a behavior flag

validateMaxAllowedLocations(MAX_ALLOWED_LOCATIONS);
validateMaxAllowedLocations(
PolarisConfigurationStore.getConfiguration(
PolarisConfiguration.STORAGE_CONFIGURATION_MAX_LOCATIONS));
Contributor

If validation fails, it will cause a runtime error, making this storage configuration unusable... This seems risky to me because the storage config is persisted, but the PolarisConfigurationStore.getConfiguration() value can change from run to run (it may even differ between servers in the same cluster). WDYT?

Contributor Author

Yeah, this concern was raised in the discussion above, where @flyrain advocated for not having a config at all but @collado-mike felt strongly that we needed one. If the flag is set to e.g. 100, someone creates a storage config with 99 paths, and the flag is then lowered to 10, then yes, runtime exceptions will start happening on the service as soon as the storage config is instantiated in memory.

This fragility is one reason why I'm trying (through #1124) to make this a behavior-change flag rather than a fully-supported feature flag.

Contributor

I believe this kind of enforcement is valuable on storage config creation (and changes), but not on load (regardless of how we present the config flag itself).

@eric-maynard (Contributor Author) Mar 12, 2025

I agree that the behavior here with a limit set is far from ideal.

I think this will be more clear once we close the discussion on #1124, but in essence this code should never be hit unless the user has touched a flag that's described there as:

... These flags control subtle behavior adjustments and bug fixes, not user-facing catalog settings. They are intended for internal use only, are inherently unstable, ...

Going forward with these flags, I think we should consider that sometimes the behavior you get with the flag set to a non-default value is a poor or even broken experience. The only reason the problematic code (this entire check, in this case) is left in place is some perceived risk or potential regression associated with removing it entirely. Eventually, we will want to remove it entirely. #1124 says of this:

Flags here are generally short-lived and should either be removed or promoted to stable feature flags before the next release.

In accordance with this guidance, this entire check should go away before the next release, at which point I think the concern you raise above would be addressed. In the interim, we will be able to optionally switch back to the exact semantics in place before this change (the intent of a behavior-change flag). WDYT?

@dimas-b (Contributor) Mar 12, 2025

In this context, my concern is not about how configuration works or how configuration is described. My concern is with runtime behaviour in case the new config settings make storage config validation fail.

Whether the admin user had enough warning about risks is irrelevant, IMHO, because the storage config in question is present in the database. It was stored some time ago under different rules. I believe Polaris has to honour its persisted config more than transient validation settings, because Polaris' primary function as a catalog is to manifest its persisted data to clients in order for clients to gain access to tables. In this case the validation flags are not necessary to ensure correctness of operation. They are rather arbitrary limits from the perspective of accessing tables.

I can foresee how the new flag can cause a Polaris Server to fail to provide access to tables that used to be accessible before setting the flag. However, I do not see how this code change can help Polaris Server in dealing with configurations that have too many locations (for example).

If the concern is that having too many locations is detrimental to performance, then the user will have to adjust the configuration somehow, but in that case, what is the value of refusing to make the unadjusted storage config available for processing? That trades hypothetical performance problems for hard failures... unless I'm missing something :)

Contributor

Feature flags on top of validation config/flags sound like overkill to me... but I will not object to that :)

Contributor Author

Yeah that is the idea -- separate out actual features from things like this where we want to be able to "revert" easily.

Anyway, if you still think this check is problematic, I don't totally disagree, and we can try to move this to the API layer so there's no issue with deserializing a config that was created under a higher limit.

Contributor

It looks to me that the issues I mentioned earlier are still present in the latest state of this PR (unless I'm missing something).

Advocating for the OSS user of Polaris, I believe it is preferable to move this validation to the API layer.

I will not block this PR because of that, though :)

Contributor Author

Sounds good! I moved the check into the builder in CatalogEntity, which I think is the best narrow waist for this
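
As a rough sketch of the shape of that check (assumed names, not the exact CatalogEntity code merged here):

import java.util.List;

// Illustration only: validate the allowed-location count when the catalog entity is built,
// so an over-long storage config is rejected at creation time rather than at load time.
final class CatalogBuilderCheckSketch {

  static void checkAllowedLocations(List<String> allowedLocations, int maxAllowedLocations) {
    // Assumption for this sketch: a negative limit means "unbounded".
    if (maxAllowedLocations >= 0 && allowedLocations.size() > maxAllowedLocations) {
      throw new IllegalArgumentException(
          "Storage configuration declares " + allowedLocations.size()
              + " allowed locations, but at most " + maxAllowedLocations + " are permitted");
    }
  }
}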

Contributor

The latest validation approach LGTM :) thx! Posting a couple of minor comments separately.

@dimas-b (Contributor) left a comment

I'm OK merging without addressing my minor comments if you disagree with them :)

@eric-maynard eric-maynard enabled auto-merge (squash) March 13, 2025 17:44
@eric-maynard eric-maynard merged commit 21efdf6 into apache:main Mar 13, 2025
5 checks passed
rakesh-das08 pushed a commit to rakesh-das08/polaris that referenced this pull request Mar 22, 2025
travis-bowen pushed a commit to travis-bowen/polaris that referenced this pull request Jun 20, 2025
…rage config" (apache#26)

* b93e97b

* Adjustable limit on the number of locations per storage config (apache#1068)

* initial commit

* autolint

* change the config store access

* autolint

* add support for 01

* autolint

* fix test

* autolint

* retest

* rebase

* autolint

* change the config store access

* autolint

* add support for 01

* autolint

* fix test

* autolint

* retest

* fix a test

* autolint

* fix another test

* autolint

* remove catalog config for now as it's not used

* changes

* autolint

* update test to reflect -1 default

* autolint

* autolint

* move the check

* changes per review

* ready

* autolint

* spotless

---------

Co-authored-by: Eric Maynard <[email protected]>