@ywangd
Member

@ywangd ywangd commented Dec 21, 2020

Phase 2 of operator privileges is to support operator-only settings. This includes:

  • Prevent operator-only settings from being updated with the cluster settings API unless the user is an operator
  • Skip operator-only settings during restore unless the authenticating user is an operator.

For the restore process, even if the user is not an operator, the restore does not fail; instead it succeeds by skipping any operator-only settings. This means that if the current cluster state has a value for an operator-only setting, it keeps the same value after the restore. When the user is an operator, the restore behaviour is essentially the same as of today.
(EDIT: after discussions, the restore behaviour for operator-only settings will be the same for both operator and non-operator users. When operator privileges are enabled, operator-only settings will not be restored. Otherwise (if the feature is disabled), the behaviour is the same as of today.)
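A minimal sketch of the final restore rule for persistent settings, under stated assumptions: settings are modelled as plain String maps, the helper name settingsToApply is hypothetical, and, as the description implies, without filtering the snapshot's persistent settings simply replace the current ones. With operator privileges enabled, regular settings come from the snapshot while operator-only settings keep the current cluster's values (and operator-only values found only in the snapshot are dropped).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.function.Predicate;

final class RestoreSettingsSketch {

    // Hypothetical helper: computes the persistent settings a restore would apply.
    static Map<String, String> settingsToApply(Map<String, String> current,
                                               Map<String, String> fromSnapshot,
                                               Predicate<String> isOperatorOnly,
                                               boolean operatorPrivilegesEnabled) {
        if (operatorPrivilegesEnabled == false) {
            // Feature disabled: the snapshot's settings are applied unchanged.
            return new HashMap<>(fromSnapshot);
        }
        Map<String, String> result = new HashMap<>();
        // Non-operator settings are taken from the snapshot...
        fromSnapshot.forEach((key, value) -> {
            if (isOperatorOnly.test(key) == false) {
                result.put(key, value);
            }
        });
        // ...while operator-only settings keep the current cluster's values.
        current.forEach((key, value) -> {
            if (isOperatorOnly.test(key)) {
                result.put(key, value);
            }
        });
        return result;
    }

    public static void main(String[] args) {
        Predicate<String> isOperatorOnly = Set.of("xpack.ml.max_lazy_ml_nodes")::contains;
        Map<String, String> current = Map.of(
            "xpack.ml.max_lazy_ml_nodes", "2",
            "cluster.routing.allocation.enable", "all");
        Map<String, String> fromSnapshot = Map.of(
            "xpack.ml.max_lazy_ml_nodes", "5",
            "cluster.routing.allocation.enable", "primaries");

        // TreeMap only to print the keys in a stable order.
        // {cluster.routing.allocation.enable=primaries, xpack.ml.max_lazy_ml_nodes=2}
        System.out.println(new TreeMap<>(settingsToApply(current, fromSnapshot, isOperatorOnly, true)));
        // {cluster.routing.allocation.enable=primaries, xpack.ml.max_lazy_ml_nodes=5}
        System.out.println(new TreeMap<>(settingsToApply(current, fromSnapshot, isOperatorOnly, false)));
    }
}
```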

Operator and non-operator settings are differentiated with a new DynamicOperator property. The Setting constructor enforces that Dynamic and DynamicOperator cannot both be specified.
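To illustrate the property-based labelling, here is a simplified, hypothetical stand-in for Setting; SimpleSetting and this cut-down Property enum are assumptions, not the real Elasticsearch classes. It uses the OperatorDynamic spelling from the merged commit message rather than DynamicOperator, treats operator-only settings as still dynamic, and shows the constructor rejecting a setting that declares both properties.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical, cut-down property enum: only the constants needed here.
enum Property { NodeScope, Dynamic, OperatorDynamic }

// Hypothetical, simplified setting that records its key and properties.
final class SimpleSetting {
    private final String key;
    private final Set<Property> properties;

    SimpleSetting(String key, Property... properties) {
        EnumSet<Property> props = EnumSet.noneOf(Property.class);
        for (Property p : properties) {
            props.add(p);
        }
        // Dynamic and OperatorDynamic are mutually exclusive.
        if (props.contains(Property.Dynamic) && props.contains(Property.OperatorDynamic)) {
            throw new IllegalArgumentException(
                "setting [" + key + "] cannot be both dynamic and operator dynamic");
        }
        this.key = key;
        this.properties = props;
    }

    String getKey() {
        return key;
    }

    // Operator-only settings are still dynamic; only who may update them differs.
    boolean isDynamic() {
        return properties.contains(Property.Dynamic) || properties.contains(Property.OperatorDynamic);
    }

    boolean isOperatorOnly() {
        return properties.contains(Property.OperatorDynamic);
    }
}

class SettingPropertyDemo {
    public static void main(String[] args) {
        SimpleSetting regular =
            new SimpleSetting("cluster.routing.allocation.enable", Property.Dynamic, Property.NodeScope);
        SimpleSetting operatorOnly =
            new SimpleSetting("xpack.ml.max_lazy_ml_nodes", Property.OperatorDynamic, Property.NodeScope);
        System.out.println(regular.getKey() + " operator-only: " + regular.isOperatorOnly());
        System.out.println(operatorOnly.getKey() + " operator-only: " + operatorOnly.isOperatorOnly());
        try {
            new SimpleSetting("example.bad_setting", Property.Dynamic, Property.OperatorDynamic);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```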

I spent quite some time exploring how the restore code should be made aware of operator privileges. The challenge is that they are located in different packages. There are a few options, but in the end I decided to let the RestoreSnapshotRequest carry a flag indicating whether or not to filter operator-only settings. This flag is set in AuthorizationService, where operator privileges are enforced. This is similar to how RequestInterceptor works, except that it is for a cluster action instead of index actions. I think it is a reasonable, lightweight approach that does the job of separating the concerns of security and snapshot/restore. We could choose a more formal, heavier approach by introducing a new Plugin method (or even interface) that lets the server code pull settings filters from plugins, but I think that is overkill just for this purpose.
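A minimal sketch of the request-flag approach, assuming simplified stand-ins: RestoreRequestSketch models only the new flag, Object replaces the transport request type, and the enabled service sets the flag to true unconditionally, reflecting the final behaviour where skipping depends only on whether the feature is enabled (the actual enabled implementation calls shouldProcess(), whose details are not shown here). Both implementations always overwrite the flag, so a caller-supplied value never survives authorization.

```java
// Hypothetical stand-in for RestoreSnapshotRequest; only the new flag is modelled.
final class RestoreRequestSketch {
    // Not serialised: always recomputed locally by the authorizing node.
    private boolean skipOperatorOnly = false;

    void skipOperatorOnly(boolean skip) {
        this.skipOperatorOnly = skip;
    }

    boolean skipOperatorOnly() {
        return skipOperatorOnly;
    }
}

// Hypothetical interceptor hook; Object stands in for a transport request type.
interface OperatorPrivilegesSketch {
    void maybeInterceptRequest(Object request);
}

// Used when operator privileges are enabled.
final class EnabledOperatorPrivileges implements OperatorPrivilegesSketch {
    @Override
    public void maybeInterceptRequest(Object request) {
        if (request instanceof RestoreRequestSketch) {
            ((RestoreRequestSketch) request).skipOperatorOnly(true);
        }
    }
}

// Used when the feature is disabled: still sets the flag, to the default value.
final class NoopOperatorPrivileges implements OperatorPrivilegesSketch {
    @Override
    public void maybeInterceptRequest(Object request) {
        if (request instanceof RestoreRequestSketch) {
            ((RestoreRequestSketch) request).skipOperatorOnly(false);
        }
    }
}

class InterceptorDemo {
    public static void main(String[] args) {
        RestoreRequestSketch request = new RestoreRequestSketch();
        // Whatever a caller set beforehand is overwritten during authorization.
        request.skipOperatorOnly(true);
        new NoopOperatorPrivileges().maybeInterceptRequest(request);
        System.out.println("feature disabled -> skip: " + request.skipOperatorOnly());
        new EnabledOperatorPrivileges().maybeInterceptRequest(request);
        System.out.println("feature enabled  -> skip: " + request.skipOperatorOnly());
    }
}
```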

The PR does have javaRestTest tests covering the overall features, but otherwise lacks unit tests. I'll add them once the overall approach is approved.

@ywangd
Member Author

ywangd commented Dec 23, 2020

@tvernum As discussed during the meeting, for this draft PR I am mainly seeking feedback on the approach that makes the snapshot code aware of the information provided by the operator privileges check. As described in the PR description, this is done by having the security code set a new flag on the RestoreSnapshotRequest object. This is one of the two major code-level design decisions we need to make. The other is how we label operator-only and non-operator settings, for which I chose to use a new Setting Property as suggested by @henningandersen.

Of course besides the above main issue, any other comments are always welcome. Thanks!

@henningandersen
Contributor

When the user is an operator, the restore behaviour is essentially the same as of today.

I am curious about whether this is ever needed? If the operator controls some settings, I would think we never want to restore those. For instance, if the operator restores an env from production to dev every weekend, I would think the production settings may not be valid for the dev environment (like rate-limiter or autoscaling policies)? Are there examples where this is desired behavior?

@ywangd
Member Author

ywangd commented Dec 31, 2020

When the user is an operator, the restore behaviour is essentially the same as of today.

I am curious about whether this is ever needed? If the operator controls some settings, I would think we never want to restore those. For instance, if the operator restores an env from production to dev every weekend, I would think the production settings may not be valid for the dev environment (like rate-limiter or autoscaling policies)? Are there examples where this is desired behavior?

I think this behaviour is useful when moving a cluster to a different zone or provisioning (replicating) a cluster from an existing one. This behaviour is also backward compatible when operator privileges are enabled on Cloud, i.e. no new logic is needed (on the Cloud side) for any downstream or subsequent actions after a restore. That being said, I'll forward this question to the Cloud folks to get a more definitive answer.

@tvernum
Contributor

tvernum commented Dec 31, 2020

I'd start from the question of what should the behaviour be when operator privileges are not enabled (remembering that it's an Enterprise feature, so many on-prem customers will not have it).

I think it would be surprising (to the point of being incorrect) if restoring a cluster from a snapshot did not restore IP Filters. For example, you attempt a version upgrade, it goes wrong, so you wipe your cluster and restore from a snapshot, but your IP Filters aren't restored and your cluster is now open. That seems wrong.

I would argue that the same is true if I restore a cluster as an operator, when operator privileges are enabled, because

  1. The same reasoning applies (though not as strongly because the cluster admin has opted-in to operator privileges, so we can argue that they should be aware of these consequences)
  2. Consistency is better. If operator privileges are about stopping non-operators from doing operator-things, then what's the argument for stopping operators from doing things that are possible on a regular cluster?

@ywangd ywangd marked this pull request as ready for review February 1, 2021 06:03
@elasticmachine elasticmachine added the Team:Security label (Meta label for security team) Feb 1, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-security (Team:Security)


public void maybeInterceptRequest(ThreadContext threadContext, TransportRequest request) {
    if (request instanceof RestoreSnapshotRequest) {
        ((RestoreSnapshotRequest) request).skipOperatorOnly(shouldProcess());
    }
}
Member Author

The code always sets the value of skipOperatorOnly because:

  1. Similar to other authorization decisions, the value is computed locally on every node
  2. The value is always set explicitly, so a transport client cannot override this field (it is not exposed to REST)

@Override
public void maybeInterceptRequest(ThreadContext threadContext, TransportRequest request) {
    if (request instanceof RestoreSnapshotRequest) {
        ((RestoreSnapshotRequest) request).skipOperatorOnly(false);
    }
}
Member Author

Similarly, this field is always set, even when operator privileges are not enabled, to retain the default behaviour. See also above.

@ywangd
Member Author

ywangd commented Feb 1, 2021

@tvernum @henningandersen This PR is now ready for review.
@droberts195 Could you please confirm that the ML settings that have been labelled as "operator-only" correctly reflect what the ML team has in mind?

Thanks!

@droberts195

@droberts195 Could you please confirm that the ML settings that have been labelled as "operator-only" correctly reflect what the ML team has in mind?

Yes, the 6 that are changed in this PR are good - thanks!

Contributor

@tvernum tvernum left a comment

LGTM

}

public void testOutcomeOfSuperuserPerformingOperatorOnlyActionWillDependOnWhetherFeatureIsEnabled() {
public void testOperatorOnlyActionOrSettingWillNotBeActionableByNormalSuperuser() {
Contributor

What value do we get from combining these into 1 test case? It seems more natural to have 2 separate cases for action and settings.

Member Author

No real value. I split it, along with the corresponding test for the operator user.

/**
* Returns <code>true</code> if this setting is dynamically updateable by operators, otherwise <code>false</code>
*/
public final boolean isDynamicOperator() {
Contributor

I wonder whether this should just be isOperatorOnly()
I think it would read better in OperatorOnlyRegistry.checkClusterUpdateSettings (because that code is really focused on "is this an operator setting" rather than "is this dynamic, but only updateable by operators")
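A sketch of how a check like OperatorOnlyRegistry.checkClusterUpdateSettings could read with an isOperatorOnly-style predicate. The signature, the Optional return type and the error text are hypothetical, and the check is assumed to be consulted only when operator privileges are enabled.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.function.Predicate;

final class OperatorOnlyCheckSketch {

    // Hypothetical check: returns an error naming the offending keys, or empty if allowed.
    static Optional<String> checkClusterUpdateSettings(Collection<String> requestedKeys,
                                                       Predicate<String> isOperatorOnly,
                                                       boolean isOperator) {
        if (isOperator) {
            // Operators may update any dynamic setting.
            return Optional.empty();
        }
        List<String> violations = new ArrayList<>();
        for (String key : requestedKeys) {
            if (isOperatorOnly.test(key)) {
                violations.add(key);
            }
        }
        violations.sort(null); // natural order, for a stable message
        return violations.isEmpty()
            ? Optional.empty()
            : Optional.of("Operator settings " + violations);
    }

    public static void main(String[] args) {
        Predicate<String> isOperatorOnly =
            Set.of("xpack.ml.max_lazy_ml_nodes", "xpack.ml.max_ml_node_size")::contains;
        List<String> request = List.of("cluster.routing.allocation.enable", "xpack.ml.max_lazy_ml_nodes");
        System.out.println(checkClusterUpdateSettings(request, isOperatorOnly, false));
        System.out.println(checkClusterUpdateSettings(request, isOperatorOnly, true));
    }
}
```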

Member Author

It makes sense, since operator settings must be dynamic in the first place, though this assumes some understanding of the concept. Overall I agree it reads better at the call sites.

Contributor

@henningandersen henningandersen left a comment

LGTM, thanks @ywangd , I added a few minor things to address.

}
}

@SuppressWarnings("unchecked")
Contributor

I did not spot the place where this is needed, perhaps it can be removed or alternatively moved to the specific line?

Member Author

Good catch! It is indeed a copy/paste artifact ~~

}

@SuppressWarnings("unchecked")
public void testSnapshotRestoreBehaviourOfOperatorSettings() throws IOException {
Contributor

I think we may want the opposite test too: checking that, without operator privileges enabled, we do restore settings marked as operator-only?

Member Author

I thought these would already be covered by existing tests, but I was wrong: the settings come from xpack, while most snapshot-related tests are in core. I added a new yml test to cover this as well as PUT _cluster/settings.

assertTrue((boolean) operatorPrivileges.get("enabled"));
}

public void testUpdateOperatorSettings() throws IOException {
Contributor

I think we want a test to validate that we can set operator settings when operator privileges are disabled too. Could also be in the single node test rather than rest test, if easier.

Member Author

Added a yml test for it. See also above.

private boolean includeAliases = true;
private Settings indexSettings = EMPTY_SETTINGS;
private String[] ignoreIndexSettings = Strings.EMPTY_ARRAY;
private boolean skipOperatorOnly = false; // this field does not get serialised because it is always set locally by authz
Contributor

nit: this is not included in toString, would be nice to add it for debugging purposes.

Member Author

The toString method relies on toXContent, which intentionally leaves out this new field. To support toString, I extracted the internals of the toXContent method into a new toXContentFragment method and used it for building the string.
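A minimal sketch of that refactoring, with a StringBuilder standing in for XContentBuilder and only two ordinary request fields modelled; the class name, the snapshot/includeGlobalState fields and the JSON shape are hypothetical. The shared fragment keeps the external form free of the local-only flag, while toString appends it for debugging; the main method also checks that toXContent output does not depend on the flag.

```java
// Hypothetical, cut-down request with two ordinary fields and the local-only flag.
final class RestoreRequestToStringSketch {
    private final String snapshot;
    private final boolean includeGlobalState;
    private boolean skipOperatorOnly = false;

    RestoreRequestToStringSketch(String snapshot, boolean includeGlobalState) {
        this.snapshot = snapshot;
        this.includeGlobalState = includeGlobalState;
    }

    void skipOperatorOnly(boolean skip) {
        this.skipOperatorOnly = skip;
    }

    // Shared body: only the fields that belong in the external form.
    private void toXContentFragment(StringBuilder sb) {
        sb.append("\"snapshot\":\"").append(snapshot).append("\",")
          .append("\"include_global_state\":").append(includeGlobalState);
    }

    // External representation: deliberately leaves out skipOperatorOnly.
    String toXContent() {
        StringBuilder sb = new StringBuilder("{");
        toXContentFragment(sb);
        return sb.append('}').toString();
    }

    // Debugging representation: the same fragment plus the local-only flag.
    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder("{");
        toXContentFragment(sb);
        sb.append(",\"skip_operator_only\":").append(skipOperatorOnly);
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        RestoreRequestToStringSketch a = new RestoreRequestToStringSketch("snap-1", true);
        RestoreRequestToStringSketch b = new RestoreRequestToStringSketch("snap-1", true);
        b.skipOperatorOnly(true);
        System.out.println(a.toXContent().equals(b.toXContent())); // true
        System.out.println(b);
    }
}
```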

if (request.includeGlobalState()) {
if (metadata.persistentSettings() != null) {
Settings settings = metadata.persistentSettings();
clusterSettings.validateUpdate(settings);
Contributor

Hopefully this never causes an issue, but I think this line should move below the modification of the settings, so that it validates the updated settings.

Member Author

Good catch, thanks! As you said, it may not cause a real issue, but it is better to be safe.

XContentParser parser = XContentType.JSON.xContent().createParser(
NamedXContentRegistry.EMPTY, null, BytesReference.bytes(builder).streamInput());
Map<String, Object> map = parser.mapOrdered();
assertFalse(map.containsKey("skip_operator_only"));
Contributor

I wonder if it would be nicer to verify that toXContent gives the same result regardless of the skipOperatorOnly flag, i.e., invoke it on two instances with different skipOperatorOnly values and verify that the maps are identical.

Member Author

Added

…apshots/restore/RestoreSnapshotRequest.java

Co-authored-by: Tim Vernum <[email protected]>
@ywangd
Member Author

ywangd commented Feb 4, 2021

@elasticmachine update branch

@ywangd
Member Author

ywangd commented Feb 4, 2021

@droberts195 Could you please confirm that the ML settings that have been labelled as "operator-only" correctly reflect what the ML team has in mind?

Yes, the 6 that are changed in this PR are good - thanks!

Thanks @droberts195! But somehow I counted 9 of them:

  1. xpack.ml.node_concurrent_job_allocations
  2. xpack.ml.max_machine_memory_percent
  3. xpack.ml.use_auto_machine_memory_percent
  4. xpack.ml.max_lazy_ml_nodes
  5. xpack.ml.process_connect_timeout
  6. xpack.ml.nightly_maintenance_requests_per_second
  7. xpack.ml.max_ml_node_size
  8. xpack.ml.enable_config_migration
  9. xpack.ml.persist_results_max_retries

Could you please advise? Thanks!

@droberts195

But somehow I counted 9 of them

Yes, sorry, you are right that there are 9. I'm happy with all of those.

@ywangd ywangd merged commit 8e49ba9 into elastic:master Feb 4, 2021
ywangd added a commit to ywangd/elasticsearch that referenced this pull request Feb 4, 2021
)

Add a new OperatorDynamic enum to differentiate between operator-only and
regular dynamic cluster settings. The Setting constructor validates that Dynamic
and OperatorDynamic cannot be both specified. Operator-only settings behave
as follows:

* When the feature is enabled, operator-only settings cannot be updated with
  the PUT cluster settings API unless the user is an operator.
* The restore behaviour for operator-only settings will be identical for either
  operator or non-operator user. That is, when operator privileges are enabled,
  operator-only settings will not be restored. Otherwise (if the feature is
  disabled), the behaviour is the same as of today.
ywangd added a commit that referenced this pull request Feb 5, 2021
…68563)
ywangd added a commit that referenced this pull request Mar 10, 2021
Add documentation for operator privileges. The docs cover features delivered by phase 1 (#65256) and 2 (#66684).

Co-authored-by: Tim Vernum <[email protected]>
Co-authored-by: lcawl <[email protected]>
Co-authored-by: Adam Locke <[email protected]>
ywangd added a commit to ywangd/elasticsearch that referenced this pull request Mar 10, 2021
ywangd added a commit to ywangd/elasticsearch that referenced this pull request Mar 10, 2021
ywangd added a commit that referenced this pull request Mar 10, 2021
ywangd added a commit that referenced this pull request Mar 10, 2021

Labels

>enhancement, :Security/Authorization (Roles, Privileges, DLS/FLS, RBAC/ABAC), Team:Security (Meta label for security team), v7.12.0, v8.0.0-alpha1

6 participants