ListenableFuture for authorize index action #78358
Loading authorizedIndices can become more expensive as the number of index names grows (generally in proportion to the cluster size). Computing indicesAccessControl can also become more expensive as the number of requested indices grows, e.g. a wildcard search targeting all indices in the cluster. Worse, the computations are repeated for every request even when they are identical. This means that high concurrency of user requests, e.g. searches, can potentially spend a lot of time in authorization, which in turn blocks transport workers and leads to cluster instability. This PR changes both calculations to use ListenableFuture so that the expensive work is only done once for requests (REST or Transport) that are (1) identical in terms of index authorization and (2) trying to go through the authorization process concurrently, e.g. Field Caps requests sent by Kibana. When N such requests are trying to perform authorization, only the first one will actually do the work and the other requests will simply register a listener for the result. Once the first request has completed its work, the result is shared among the waiting requests.

The first request is also responsible for clearing out the listenableFuture right after its completion. The advantage is that we do not need extra cache management. The tradeoff is that the effective TTL is really short (essentially the elapsed time of the associated method), which is less effective for maximizing caching effects. Overall, the current approach is preferable for its simplicity and its benefits under highly concurrent load.
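The described pattern can be sketched outside of Elasticsearch roughly as follows. This is a minimal illustration, not the PR's code: CompletableFuture stands in for the ListenableFuture used in the PR, and the class and method names are made up.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Sketch of the "first caller computes, later callers listen" pattern.
class AuthorizationDedup<K, V> {
    private final Map<K, CompletableFuture<V>> inFlight = new ConcurrentHashMap<>();

    V authorize(K key, Function<K, V> expensiveComputation) {
        CompletableFuture<V> future = new CompletableFuture<>();
        CompletableFuture<V> existing = inFlight.putIfAbsent(key, future);
        if (existing != null) {
            // An identical request is already computing; just wait for its result.
            return existing.join();
        }
        try {
            V result = expensiveComputation.apply(key);
            future.complete(result);
            return result;
        } catch (RuntimeException e) {
            future.completeExceptionally(e); // propagate failures to waiting requests too
            throw e;
        } finally {
            // The first request clears the entry right after completion,
            // so there is no long-lived cache to manage.
            inFlight.remove(key, future);
        }
    }
}
```

A second request with the same key that arrives while the first is still computing joins the existing future instead of recomputing; once the first completes, the entry is removed, so the effective TTL is just the duration of the computation.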
Pinging @elastic/es-security (Team:Security)
x-pack/plugin/security/src/main/java/org/elasticsearch/xpack/security/authz/RBACEngine.java
@elasticmachine update branch
final IndicesAccessControlCacheKey cacheKey = new IndicesAccessControlCacheKey(metadata.version(),
    action, requestedIndices,
    indicesPermissionTuple.v1(),
    indicesPermissionTuple.v2());
It looks to me like a lot of code to compute a cache key; it's hard to follow and counter-productive to the overall performance goal.
I think we should work with the fact that the Role, resolvedIndices.getLocal, action and metadata.getIndicesLookup() are immutable (or effectively immutable, maybe assert or test it).
In this case, the cache key class's equals method would use the == test for the corresponding members, analogous to System#identityHashCode.
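The identity-based key the reviewer suggests might look roughly like the sketch below. All field names are illustrative, and the Object/Map types are placeholders for the real Role, local indices, and indices lookup objects.

```java
import java.util.Map;

// Sketch of an identity-based cache key: if the wrapped objects are
// effectively immutable, equals() can use reference equality (==) and
// hashCode() can use System.identityHashCode.
final class IdentityCacheKey {
    private final Object role;
    private final Object localIndices;
    private final String action;
    private final Map<String, ?> indicesLookup;

    IdentityCacheKey(Object role, Object localIndices, String action, Map<String, ?> indicesLookup) {
        this.role = role;
        this.localIndices = localIndices;
        this.action = action;
        this.indicesLookup = indicesLookup;
    }

    @Override
    public boolean equals(Object o) {
        if (o instanceof IdentityCacheKey == false) {
            return false;
        }
        IdentityCacheKey other = (IdentityCacheKey) o;
        // Reference equality: two keys match only if they wrap the very same objects.
        return role == other.role
            && localIndices == other.localIndices
            && action == other.action
            && indicesLookup == other.indicesLookup;
    }

    @Override
    public int hashCode() {
        return System.identityHashCode(role) * 31 + System.identityHashCode(indicesLookup);
    }
}
```

As the follow-up reply notes, this only works when the members really are shared instances; freshly constructed but equal objects (such as per-request API key roles) would miss the cache.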
I don't quite follow. Maybe I am missing something, but how is it possible to use == to compare action or resolvedIndices.getLocal? For example, even if two action strings have the same value, they are different objects belonging to two different HTTP worker threads.
Comparing identity of metadata.getIndicesLookup() may be viable, but it does not seem to be more convenient or more reliable than comparing a simple version number?
As for Role, the issue is that the roles for API keys are constructed on the fly for each request. So even the same API key will get two equivalent Role objects with different identities. To get around this, the cacheKey here uses only the IndicesPermission component of a Role, and the comparison for IndicesPermission is by object identity.
x-pack/plugin/security/src/main/java/org/elasticsearch/xpack/security/authz/RBACEngine.java
albertzaharovits left a comment
I think this looks good and promising, and we can maybe later investigate making the cache entries more long lived.
From an implementation point of view, I think we can do better to make the cache key computation simpler, and maybe more isolated from the core logic.
From a performance point of view, I think we could set the concurrencyLevel parameter on the map to the number of transport worker threads, since we are positive every such thread will now do a computeIfAbsent on this map (it's more coupling than I usually like, but I think we have to find a heuristic for it - I don't have good suggestion here - but neither great expectations).
I'll take another look after you manage to simplify the cache key computation a bit.
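Albert's concurrencyLevel suggestion could be sketched with a plain ConcurrentHashMap as below. The worker-count heuristic is hypothetical; Elasticsearch would derive the real number from the node's settings.

```java
import java.util.concurrent.ConcurrentHashMap;

// Sketch: size the map's concurrency level to the number of transport worker
// threads, since every such thread may call computeIfAbsent concurrently.
class SizedAuthorizationCache {
    static <K, V> ConcurrentHashMap<K, V> build() {
        // Hypothetical heuristic for the transport worker thread count.
        int transportWorkerThreads = Runtime.getRuntime().availableProcessors() * 2;
        // initialCapacity 16, default load factor, concurrencyLevel = worker count
        return new ConcurrentHashMap<>(16, 0.75f, transportWorkerThreads);
    }
}
```

On modern JVMs the concurrencyLevel argument is only a sizing hint, but it still documents the expected contention.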
With #78327 replaced by the changes in core (#78508), together with #78321, we are in a very good position for Index/Shard level requests; I feel fairly confident saying that we don't need any more optimisation for them in 7.16. Therefore, I'd like to rework this PR so it targets REST level requests more specifically, or more technically, requests that are
This is a good point. I'll incorporate it in the rework.
Other improvements:
1. A single cacheKey for a single index authorization process
2. Resolving indices is now cached together with loading authorized indices
3. Because of the above, it now avoids using resolved indices as a member of the cache key and instead uses the original index expression, which is often much smaller.
I have reworked this PR based on the recent development and feedback. It is now ready for another round. A few key points:
((IndicesRequest.Replaceable) requestInfo.getRequest()).indices(
    value.resolvedIndices.isNoIndicesPlaceholder() ? NO_INDICES_OR_ALIASES_ARRAY : value.resolvedIndices.toArray()
);
This is the one place where we have a coupling with the code logic in IndicesAndAliasesResolver.resolveIndicesAndAliases. It's not ideal but I don't see a way around it.
I understand that this is necessary because invoking indicesAsyncSupplier has the side effect of replacing the original index expression with the resolved one.
In this case, to be completely honest, I think this is an indication we need to find a different approach to the caching issue.
Maybe we first need to separate the index expression evaluation from the expression replacement. Then maybe we can independently cache the index expression evaluation. I don't want to venture and suggest alternatives. Bottom line: I do not think this PR, as it stands right now, strikes the right trade-off between maintainability and performance.
Concretely, I don't think this works for AliasesRequests, but even if you plug this hole this is extremely brittle.
> I don't think this works for AliasesRequests
You are 100% correct. I meant to exclude it in isRequestCacheableForIndexAuthorization but totally forgot it in the refactoring process.
static class IndexAuthorizationCacheKey {
    private final String action;
    private final String[] requestedIndices;
    private final IndicesOptions indicesOptions;
    private final long metadataVersion;
    // A pair of indicesPermission and limitedByIndicesPermission. The latter is nullable
    private final Tuple<IndicesPermission, IndicesPermission> indicesPermissions;
This is the new CacheKey class. The improvements are:
- There is only one CacheKey, since we now cache the whole index authorization process instead of individual components.
- It uses requestedIndices, which is often much smaller than resolvedIndices.
@albertzaharovits This set of variables seems to be the minimal information required for the cacheKey (please also see my previous reply). Please let me know if you think otherwise.
if (((IndicesRequest) requestInfo.getRequest()).allowsRemoteIndices()) {
    // check action name
    indexAuthorizationResult = authorizeIndexActionName(
        requestInfo.getAction(), authorizationInfo, IndicesAccessControl.ALLOW_NO_INDICES);
I do not think this is needed.
The original code, without the caching, invoked the authorizeIndexActionName before the resolvedIndices are resolved, which I see you've maintained, but then still added another such check here.
If the request allows remote indices, the checking is more complex (which is yet another issue with the CCS security model).
If there are remote indices then we can't rely on whether the action is authorized locally, because it's allowable to perform a cross cluster search even if you don't have access to search any indices in the local cluster.
I'm not sure whether the check is right (it seems to only perform the action name check if there are no indices, but I would think that we want to perform it if there are no remote indices), but it's different for a reason.
I think the logic behind doing it if there are no indices, is that this is the only place where we can trigger an exception (because index checking won't happen).
But if there are any indices, we'll also check whether each of those indices is allowed, and fail if they're not (locally for local indices, or on the remote cluster for remote indices).
For non-remoteable actions, the upfront action name check is a performance optimisation - no need to check indices for an action you can't perform. For remoteable actions, we have a backstop for when there are no indices to check.
> I do not think this is needed.
> The original code, without the caching, invoked the authorizeIndexActionName before the resolvedIndices are resolved, which I see you've maintained, but then still added another such check here.
It is needed, because the caching applies both when the request allowsRemoteIndices and when it does not. The latter path invokes authorizeIndexActionName, but the former does not. This code covers both of them and moves the allowsRemoteIndices check inside this method.
> If the request allows remote indices, the checking is more complex ...
This piece of code is supposed to be a pure refactoring (other than the caching part). If you look past the caching stuff, the logic is exactly the same as the existing code (unless I didn't do a good job in refactoring).
> If you look past the caching stuff, the logic is exactly the same as the existing code
My comment was in response to Albert's concerns, and was intended to explain why the code (existing and refactored) needs to have special cases for these situations.
indicesRequest.indices(),
indicesRequest.indicesOptions(),
metadata.version(),
getRoleIndicesPermissions(ensureRBAC(authorizationInfo).getRole())
Do we really need the permissions to know if we can cache the request's authorization result?
Can't we use the role or limited role instances with an equality check, since those instances are cached?
LimitedRole isn't cached.
We cache the limited-by, but not the constructed role.
Perhaps we should.
albertzaharovits left a comment
I acknowledge the amount of effort here, but I worry about the added complexity in deciding when to cache and what to do with a cached value.
The complexity will make it even more nightmarish to maintain, and I don't think this is a great performance boost overall, because we now introduce atomics and forking on thread pools, which for easy-to-authorize requests actually amounts to a performance penalty.
tvernum left a comment
My inclination is similar to @albertzaharovits's here. This introduces a fair bit more complexity into code that is already far too complex to safely deal with.
What sort of performance gain do we expect here? It's hard to evaluate the tradeoff without some real data.
this.indexAuthorizationCache =
    CacheBuilder.<IndexAuthorizationCacheKey, ListenableFuture<IndexAuthorizationCacheValue>>builder()
        .setMaximumWeight(cacheSize)
        .setExpireAfterWrite(TimeValue.timeValueMinutes(5)) // a fallback just in case
This comment doesn't really mean anything in context.
It's based on the undocumented (at this point) condition that things are automatically cleared from the cache when the listener completes, but that isn't clear in the comment.
Good point. I have improved the comment which should make more sense now.
Map<String, IndexAbstraction> aliasOrIndexLookup,
ActionListener<IndexAuthorizationResult> listener
) {
    assert false : "this method should no longer be used";
Why does this method exist if it's not used? Just delete it.
This is the existing method from the AuthorizationEngine interface. It cannot be deleted because the interface mandates it and there is no good default the interface can provide. This is a way to retire this method without breaking custom authz engines.
Don't worry about breaking custom engines, just delete what needs to be deleted. It's OK to change the interface for these sorts of plugins.
private Tuple<IndicesPermission, IndicesPermission> getRoleIndicesPermissions(Role role) {
    final IndicesPermission indicesPermission;
    final IndicesPermission limitedByIndicesPermission;
    if (role instanceof LimitedRole) {
I don't think RBAC engine should have code that specifically handles LimitedRole. We need to find a way to encapsulate this properly.
private boolean isRequestCacheableForIndexAuthorization(TransportRequest request) {
    return request instanceof IndicesRequest.Replaceable
        && (false == request instanceof PutMappingRequest || ((PutMappingRequest) request).getConcreteIndex() == null);
Can we expand this into a series of ifs? It's almost impossible to follow nested && and || conditions like this.
You are right. I amended it as you suggested.
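For illustration, the expanded early-return form might look roughly like the sketch below. The interfaces here are minimal stand-ins for the Elasticsearch types (TransportRequest, IndicesRequest.Replaceable, PutMappingRequest), not the real classes.

```java
// Hypothetical stand-ins for the Elasticsearch request types, so the logic is runnable.
interface TransportRequest {}
interface ReplaceableIndicesRequest extends TransportRequest {}

class PutMappingRequest implements ReplaceableIndicesRequest {
    private final String concreteIndex;
    PutMappingRequest(String concreteIndex) { this.concreteIndex = concreteIndex; }
    String getConcreteIndex() { return concreteIndex; }
}

class CacheabilityCheck {
    // The nested boolean condition rewritten as a series of ifs, per the review feedback.
    static boolean isRequestCacheableForIndexAuthorization(TransportRequest request) {
        if (request instanceof ReplaceableIndicesRequest == false) {
            return false; // only requests whose indices can be replaced are cacheable
        }
        if (request instanceof PutMappingRequest) {
            // A PutMappingRequest carrying a concrete index was resolved
            // internally and must bypass the cached path.
            return ((PutMappingRequest) request).getConcreteIndex() == null;
        }
        return true;
    }
}
```

Each condition now gets its own branch and comment, which is the readability gain the reviewer was after.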
Thanks for taking the time to look into this PR. I recognise it is a complex and likely controversial change. Realistically, I don't think it is something that we can accommodate for 7.16 (if at all). To me, the most glaring issue with this PR is that we don't have associated benchmarks to either support or disprove it. This, plus the level of complexity, makes it difficult to justify, and I acknowledge that. Nevertheless, I think it is still useful as a strawman proposal, and hopefully better ideas will emerge through discussions. So please let me spend some time replying to your major comments in this single post for (perhaps) easier digestion. The change has its roots in my dormant PoC for caching authorizedIndices and was then reinvigorated after a discussion with Armin about potential
This PR is hence an implementation of the above idea. Now onto the comments ...
Comments from Albert
First, I admit there is no benchmarking to prove any performance gain. But I think a performance penalty is unlikely: the caching mechanism is only triggered in a very specific situation, where (1) multiple requests arrive at the same time; (2) with identical authZ conditions; (3) and authZ is expensive enough that one request starts authZ before another can finish it. If the request is easy to authorize, it's likely that the caching won't even trigger. Again, without actual benchmarking, all these are just my best guess.
This is the fundamental issue: we are trying to cache a non-pure function and that's why the code logic is leaking out. This is why the previous version had two separate caches, one for
It's a tough one to crack: side effects and caching just don't play well together, and multiple caches are unlikely to be a good idea.
Comments from Tim
I think I answered this question in the previous section, and I acknowledge that there is no data to support it. Overall, I think the idea still has its value. But two things need to be addressed:
I am interested in spending some time on the above items in the 8.x timeframe. Also, maybe along the way we can discover better ideas (but at least item 1 will not be a waste).
But the cache management code (like the AtomicBoolean to check whether an entry exists in the cache) does still trigger. For a hypothetical node where the caching is never useful (because it literally never has concurrent requests over the same indices) this code will be slower than the old node because the cache exists & needs to be managed, but adds no value.
This is where I think tradeoff questions come into play. If we can find a way to cache
Closing this as we agreed that it is not an overall good tradeoff.