-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Build indicesAccessControl for selective indices #78327
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
During index authorization, the IndicesAccessControl is built for all original indices. The list of original indices can be large for a large size cluster and requests with liberal wildcards. Therefore computing IndicesAccessControl for every one of them can be expensive. The observation is that many shard/index level request only really just target a single concrete index and hence just need IndicesAccessControl for that single index. One twist is that the single index may be requested with its alias or authorized as part of a data stream. In this case, the relevant aliases and data stream also need to be considered. Still this list of concrete index and its aliases and data stream can be much smaller than the original indices. This PR adds an effective allowlist for requests that we know how to extract the single concrete target index and only compute IndicesAccessControl for the index and relevant aliases, data stream.
|
Pinging @elastic/es-security (Team:Security) |
x-pack/plugin/security/src/main/java/org/elasticsearch/xpack/security/authz/RBACEngine.java
Outdated
Show resolved
Hide resolved
| // An allowlist of requests that have only a single concrete target index regardless of the original indices | ||
| private String getSingleTargetIndex(TransportRequest request) { | ||
| if (request instanceof ShardSearchRequest) { | ||
| return ((ShardSearchRequest) request).shardId().getIndexName(); | ||
| } else if (request instanceof FieldCapabilitiesIndexRequest) { | ||
| return ((FieldCapabilitiesIndexRequest) request).index(); | ||
| } else if (request instanceof ShardFetchSearchRequest) { | ||
| return ((ShardFetchSearchRequest) request).getShardSearchRequest().shardId().getIndexName(); | ||
| } | ||
| return null; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are more Transport Requests that target a single index/shard. These three are chosen to match what are currently being used in benchmarking. They also serve the purpose of demostrating the general idea of this change. I'll work on a more complete list once we agree on the approach.
IndexAbstraction.Index Relates to elastic#78327
Resolve aliases from IndexAbstraction.DataStream and IndexAbstraction.Index Helps with this specifically: https://github.com/elastic/elasticsearch/pull/78327/files#r717141768 Relates to #78327
Backport elastic#78372 to 7.x Resolve aliases from IndexAbstraction.DataStream and IndexAbstraction.Index Helps with this specifically: https://github.com/elastic/elasticsearch/pull/78327/files#r717141768 Relates to elastic#78327
Backport #78372 to 7.x Resolve aliases from IndexAbstraction.DataStream and IndexAbstraction.Index Helps with this specifically: https://github.com/elastic/elasticsearch/pull/78327/files#r717141768 Relates to #78327
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took a very close look at this.
My hope is that the same improvement is going to be covered in core, by #78508 . I think that is almost there, and it also avoids the serialization/deserialization costs you were mentioning in the last weekly meeting.
Thanks Albert. Filtering the list of names in the core (#78508) is a superior solution in terms of performance. It covers this PR and more:
I am happy to close this PR once #78508 is merged. We should also discuss whether the same change should be actively promoted to other parts in core where applicable. I think it should be. But discussion would be great to understand the impact better and also help decide on whether core should always be aware of the source names, i.e. whether the concrete index name is from an alias or even multiple aliases. PS: This PR and other hacks locally almost screamed for the change in #78508. But I just didn't notice it. A lesson learned on how to tackle an issue in a more broader context (than just security). |
| } else if (request instanceof ShardFetchSearchRequest) { | ||
| return ((ShardFetchSearchRequest) request).getShardSearchRequest().shardId().getIndexName(); | ||
| } | ||
| return null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we implement something in core that unifies these? It seems like
public interface ShardIndicesRequest extends IndicesRequest {
public ShardId shardId();
}
would be helpful.
| // The code uses a simple heuristic to choose between the two: if the total number of aliases is less | ||
| // than the number of requested names, it uses Method 1. Otherwise, it uses Method 2. | ||
| int totalAliases = 0; | ||
| totalAliases += targetIndexAbstraction.getIndices().get(0).getAliases().size(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This bothers me.
Can we add getIndexMetadata to IndexAbstraction.Index so we don't have to rely on the implementation detail of pulling a index out of the singleton list?
Or add getNumberOfAliases to IndexAbstraction? (I assume we don't call getAliases().size() because it's not very efficient).
Or we could make IndexAbstraction.getAliases() return Collection<String> and make IndexAbstraction.Index.getAliases cheaper to call.
Actually, since we call getAliases below (in method 1) we should just call it here and keep a local copy of the result.
| } | ||
| if (totalAliases < indices.size()) { | ||
| // Method 1 | ||
| effectiveIndices = new HashSet<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| effectiveIndices = new HashSet<>(); | |
| effectiveIndices = new HashSet<>(totalAliases + 2); | |
| // aliases + targetIndexName + parentDataStream |
| effectiveIndices = indices.stream().filter(name -> { | ||
| if (name.equals(targetIndexName)) { | ||
| return true; | ||
| } | ||
| final IndexAbstraction indexAbstraction = aliasAndIndexLookup.get(name); | ||
| if (indexAbstraction.getType() != IndexAbstraction.Type.CONCRETE_INDEX) { | ||
| return indexAbstraction.getIndices().stream().anyMatch(im -> im.getIndex().getName().equals(targetIndexName)); | ||
| } | ||
| return false; | ||
| }).collect(Collectors.toUnmodifiableSet()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Collectors.toUnmodifiableSet() is very inefficient. If you're trying to squeeze the milliseconds here, you're better off constructing the set by hand like in method 1.
| effectiveIndices = indices.stream().filter(name -> { | |
| if (name.equals(targetIndexName)) { | |
| return true; | |
| } | |
| final IndexAbstraction indexAbstraction = aliasAndIndexLookup.get(name); | |
| if (indexAbstraction.getType() != IndexAbstraction.Type.CONCRETE_INDEX) { | |
| return indexAbstraction.getIndices().stream().anyMatch(im -> im.getIndex().getName().equals(targetIndexName)); | |
| } | |
| return false; | |
| }).collect(Collectors.toUnmodifiableSet()); | |
| List<String> indexNames = new ArrayList<>(indices.size()); | |
| indices.stream().filter(name -> { | |
| if (name.equals(targetIndexName)) { | |
| return true; | |
| } | |
| final IndexAbstraction indexAbstraction = aliasAndIndexLookup.get(name); | |
| if (indexAbstraction.getType() != IndexAbstraction.Type.CONCRETE_INDEX) { | |
| return indexAbstraction.getIndices().stream().anyMatch(im -> im.getIndex().getName().equals(targetIndexName)); | |
| } | |
| return false; | |
| }).forEach(effectiveIndices::add); | |
| effectiveIndices = Set.copyOf(indexNames); |
I've suggested using a list to collect the names here, because I think it's going to be more efficient (but I haven't benchmarked it).
Set.copyOf constructs a new HashSet even if the argument is already a HashSet, so you're better off just using an ArrayList for raw efficiency while collecting and then paying the uniqueness cost once during copyOf.
| } | ||
|
|
||
| final IndicesAccessControl accessControl = | ||
| role.authorize(action, Set.copyOf(effectiveIndices), aliasAndIndexLookup, fieldPermissionsCache); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're going to introduce a copyOf here, then we should change the caller to use Set.of rather than Sets.newHashSet and reduce the amount of copying that happens.
|
Closing since changes in the core (#78508) is a better pattern. |
During index authorization, the IndicesAccessControl is built for all
original indices. The list of original indices can be large for a large
size cluster and requests with liberal wildcards. Therefore computing
IndicesAccessControl for every one of them can be expensive.
The observation is that many shard/index level request only really just
target a single concrete index and hence just need IndicesAccessControl
for that single index. One twist is that the single index may be
requested with its alias or authorized as part of a data stream. In this
case, the relevant aliases and data stream also need to be considered.
Still this list of concrete index and its aliases and data stream can be
much smaller than the original indices.
This PR adds an effective allowlist for requests that we know how to
extract the single concrete target index and only compute
IndicesAccessControl for the index and relevant aliases, data stream.