Skip to content

Conversation

@ywangd
Copy link
Member

@ywangd ywangd commented Sep 27, 2021

During index authorization, the IndicesAccessControl is built for all
original indices. The list of original indices can be large for a large
size cluster and requests with liberal wildcards. Therefore computing
IndicesAccessControl for every one of them can be expensive.

The observation is that many shard/index level request only really just
target a single concrete index and hence just need IndicesAccessControl
for that single index. One twist is that the single index may be
requested with its alias or authorized as part of a data stream. In this
case, the relevant aliases and data stream also need to be considered.
Still this list of concrete index and its aliases and data stream can be
much smaller than the original indices.

This PR adds an effective allowlist for requests that we know how to
extract the single concrete target index and only compute
IndicesAccessControl for the index and relevant aliases, data stream.

During index authorization, the IndicesAccessControl is built for all
original indices. The list of original indices can be large for a large
size cluster and requests with liberal wildcards. Therefore computing
IndicesAccessControl for every one of them can be expensive.

The observation is that many shard/index level request only really just
target a single concrete index and hence just need IndicesAccessControl
for that single index. One twist is that the single index may be
requested with its alias or authorized as part of a data stream. In this
case, the relevant aliases and data stream also need to be considered.
Still this list of concrete index and its aliases and data stream can be
much smaller than the original indices.

This PR adds an effective allowlist for requests that we know how to
extract the single concrete target index and only compute
IndicesAccessControl for the index and relevant aliases, data stream.
@ywangd ywangd added >enhancement :Security/Authorization Roles, Privileges, DLS/FLS, RBAC/ABAC v8.0.0 v7.16.0 labels Sep 27, 2021
@elasticmachine elasticmachine added the Team:Security Meta label for security team label Sep 27, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-security (Team:Security)

Comment on lines 660 to 670
// An allowlist of requests that have only a single concrete target index regardless of the original indices
private String getSingleTargetIndex(TransportRequest request) {
if (request instanceof ShardSearchRequest) {
return ((ShardSearchRequest) request).shardId().getIndexName();
} else if (request instanceof FieldCapabilitiesIndexRequest) {
return ((FieldCapabilitiesIndexRequest) request).index();
} else if (request instanceof ShardFetchSearchRequest) {
return ((ShardFetchSearchRequest) request).getShardSearchRequest().shardId().getIndexName();
}
return null;
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are more Transport Requests that target a single index/shard. These three are chosen to match what are currently being used in benchmarking. They also serve the purpose of demostrating the general idea of this change. I'll work on a more complete list once we agree on the approach.

martijnvg added a commit to martijnvg/elasticsearch that referenced this pull request Sep 28, 2021
martijnvg added a commit that referenced this pull request Sep 28, 2021
Resolve aliases from IndexAbstraction.DataStream and IndexAbstraction.Index

Helps with this specifically: https://github.com/elastic/elasticsearch/pull/78327/files#r717141768

Relates to #78327
martijnvg added a commit to martijnvg/elasticsearch that referenced this pull request Sep 28, 2021
Backport elastic#78372 to 7.x

Resolve aliases from IndexAbstraction.DataStream and IndexAbstraction.Index

Helps with this specifically: https://github.com/elastic/elasticsearch/pull/78327/files#r717141768

Relates to elastic#78327
elasticsearchmachine pushed a commit that referenced this pull request Sep 28, 2021
Backport #78372 to 7.x

Resolve aliases from IndexAbstraction.DataStream and IndexAbstraction.Index

Helps with this specifically: https://github.com/elastic/elasticsearch/pull/78327/files#r717141768

Relates to #78327
Copy link
Contributor

@albertzaharovits albertzaharovits left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took a very close look at this.
My hope is that the same improvement is going to be covered in core, by #78508 . I think that is almost there, and it also avoids the serialization/deserialization costs you were mentioning in the last weekly meeting.

@ywangd
Copy link
Member Author

ywangd commented Oct 6, 2021

I took a very close look at this. My hope is that the same improvement is going to be covered in core, by #78508 . I think that is almost there, and it also avoids the serialization/deserialization costs you were mentioning in the last weekly meeting.

Thanks Albert. Filtering the list of names in the core (#78508) is a superior solution in terms of performance. It covers this PR and more:

  1. It does what this PR is trying to do in a more natural way
  2. It reduces the looping cost (due to small set of requested indices) in IndicesAndAliasesResolver#resolveIndicesAndAliases
  3. More importantly, it cut downs the de/serialisation cost for sending requests cross nodes.

I am happy to close this PR once #78508 is merged. We should also discuss whether the same change should be actively promoted to other parts in core where applicable. I think it should be. But discussion would be great to understand the impact better and also help decide on whether core should always be aware of the source names, i.e. whether the concrete index name is from an alias or even multiple aliases.

PS: This PR and other hacks locally almost screamed for the change in #78508. But I just didn't notice it. A lesson learned on how to tackle an issue in a more broader context (than just security).

} else if (request instanceof ShardFetchSearchRequest) {
return ((ShardFetchSearchRequest) request).getShardSearchRequest().shardId().getIndexName();
}
return null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we implement something in core that unifies these? It seems like

public interface ShardIndicesRequest extends IndicesRequest {
    public ShardId shardId();
}

would be helpful.

// The code uses a simple heuristic to choose between the two: if the total number of aliases is less
// than the number of requested names, it uses Method 1. Otherwise, it uses Method 2.
int totalAliases = 0;
totalAliases += targetIndexAbstraction.getIndices().get(0).getAliases().size();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This bothers me.
Can we add getIndexMetadata to IndexAbstraction.Index so we don't have to rely on the implementation detail of pulling a index out of the singleton list?

Or add getNumberOfAliases to IndexAbstraction? (I assume we don't call getAliases().size() because it's not very efficient).

Or we could make IndexAbstraction.getAliases() return Collection<String> and make IndexAbstraction.Index.getAliases cheaper to call.

Actually, since we call getAliases below (in method 1) we should just call it here and keep a local copy of the result.

}
if (totalAliases < indices.size()) {
// Method 1
effectiveIndices = new HashSet<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
effectiveIndices = new HashSet<>();
effectiveIndices = new HashSet<>(totalAliases + 2);
// aliases + targetIndexName + parentDataStream

Comment on lines +678 to +687
effectiveIndices = indices.stream().filter(name -> {
if (name.equals(targetIndexName)) {
return true;
}
final IndexAbstraction indexAbstraction = aliasAndIndexLookup.get(name);
if (indexAbstraction.getType() != IndexAbstraction.Type.CONCRETE_INDEX) {
return indexAbstraction.getIndices().stream().anyMatch(im -> im.getIndex().getName().equals(targetIndexName));
}
return false;
}).collect(Collectors.toUnmodifiableSet());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Collectors.toUnmodifiableSet() is very inefficient. If you're trying to squeeze the milliseconds here, you're better off constructing the set by hand like in method 1.

Suggested change
effectiveIndices = indices.stream().filter(name -> {
if (name.equals(targetIndexName)) {
return true;
}
final IndexAbstraction indexAbstraction = aliasAndIndexLookup.get(name);
if (indexAbstraction.getType() != IndexAbstraction.Type.CONCRETE_INDEX) {
return indexAbstraction.getIndices().stream().anyMatch(im -> im.getIndex().getName().equals(targetIndexName));
}
return false;
}).collect(Collectors.toUnmodifiableSet());
List<String> indexNames = new ArrayList<>(indices.size());
indices.stream().filter(name -> {
if (name.equals(targetIndexName)) {
return true;
}
final IndexAbstraction indexAbstraction = aliasAndIndexLookup.get(name);
if (indexAbstraction.getType() != IndexAbstraction.Type.CONCRETE_INDEX) {
return indexAbstraction.getIndices().stream().anyMatch(im -> im.getIndex().getName().equals(targetIndexName));
}
return false;
}).forEach(effectiveIndices::add);
effectiveIndices = Set.copyOf(indexNames);

I've suggested using a list to collect the names here, because I think it's going to be more efficient (but I haven't benchmarked it).
Set.copyOf constructs a new HashSet even if the argument is already a HashSet, so you're better off just using an ArrayList for raw efficiency while collecting and then paying the uniqueness cost once during copyOf.

}

final IndicesAccessControl accessControl =
role.authorize(action, Set.copyOf(effectiveIndices), aliasAndIndexLookup, fieldPermissionsCache);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're going to introduce a copyOf here, then we should change the caller to use Set.of rather than Sets.newHashSet and reduce the amount of copying that happens.

@ywangd
Copy link
Member Author

ywangd commented Oct 20, 2021

Closing since changes in the core (#78508) is a better pattern.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>enhancement :Security/Authorization Roles, Privileges, DLS/FLS, RBAC/ABAC Team:Security Meta label for security team v7.16.0 v8.0.0-beta1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants