Conversation

@ywangd
Member

@ywangd ywangd commented Sep 28, 2021

Loading authorizedIndices can become more expensive as the number of
index names grows (generally in proportion to the cluster size). Computing
indicesAccessControl can also become more expensive as the number of
requested indices grows, e.g. a wildcard search targeting all indices in
the cluster. Worse, the computations are repeated for every request even
when they are identical. This means high concurrency of user requests,
e.g. searches, can potentially spend a lot of time in authorization,
which in turn blocks transport workers and leads to cluster instability.

This PR changes both calculations to use ListenableFuture so that the
expensive work is done only once for requests (REST or Transport) that
are (1) identical in terms of index authorization and (2) going through
the authorization process concurrently, e.g. Field Caps requests
sent by Kibana. When N such requests try to perform authorization, only
the first one actually does the work; the other requests simply register
a listener for the result. Once the first request has completed its work,
the result is shared with the waiting requests.

The first request is also responsible for clearing out the
listenableFuture right after its completion. The advantage is that we do
not need extra cache management. The tradeoff is that the effective TTL
is really short (essentially the elapsed time of the associated method)
and is less effective for maximizing caching effects. Overall, the
current approach is preferable for its simplicity and its benefits under
highly concurrent load.
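As a minimal sketch of this deduplication pattern (plain Java, with CompletableFuture and a ConcurrentHashMap standing in for the ListenableFuture-based code in this PR; the key type and the computation are hypothetical placeholders):

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Sketch of "the first caller computes, the others wait" described above.
class AuthzDeduplicator<K, V> {
    private final ConcurrentMap<K, CompletableFuture<V>> inFlight = new ConcurrentHashMap<>();

    CompletableFuture<V> authorize(K key, Function<K, V> computeAuthorization) {
        CompletableFuture<V> future = new CompletableFuture<>();
        CompletableFuture<V> existing = inFlight.putIfAbsent(key, future);
        if (existing != null) {
            // an identical request is already doing the work; just wait for its result
            return existing;
        }
        try {
            // this request is the first one, so it does the expensive work once
            future.complete(computeAuthorization.apply(key));
        } catch (Exception e) {
            future.completeExceptionally(e);
        } finally {
            // the first request clears the entry right after completion, so the
            // effective TTL is only the elapsed time of the computation itself
            inFlight.remove(key, future);
        }
        return future;
    }
}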

@ywangd ywangd added >enhancement :Security/Authorization Roles, Privileges, DLS/FLS, RBAC/ABAC v8.0.0 v7.16.0 labels Sep 28, 2021
@elasticmachine elasticmachine added the Team:Security Meta label for security team label Sep 28, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-security (Team:Security)

@ywangd
Member Author

ywangd commented Oct 1, 2021

@elasticmachine update branch

final IndicesAccessControlCacheKey cacheKey = new IndicesAccessControlCacheKey(metadata.version(),
    action, requestedIndices,
    indicesPermissionTuple.v1(),
    indicesPermissionTuple.v2());
Contributor

It looks to me like a lot of code to compute a cache key; it's hard to follow and counter-productive to the performance goal of all this.

I think we should work with the fact that the Role, resolvedIndices.getLocal, action and metadata.getIndicesLookup() are immutable (or effectively immutable, maybe assert or test it).
In this case, the cache key class's equals method would use the == test for the corresponding members; analogous to System#identityHashCode.
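A minimal sketch of such an identity-based key (hypothetical class, assuming the members are effectively immutable and reused across identical requests):

final class IdentityAuthzCacheKey {
    private final Object role;
    private final Object localIndices;
    private final String action;
    private final Object indicesLookup;

    IdentityAuthzCacheKey(Object role, Object localIndices, String action, Object indicesLookup) {
        this.role = role;
        this.localIndices = localIndices;
        this.action = action;
        this.indicesLookup = indicesLookup;
    }

    @Override
    public boolean equals(Object o) {
        if (o instanceof IdentityAuthzCacheKey == false) {
            return false;
        }
        IdentityAuthzCacheKey other = (IdentityAuthzCacheKey) o;
        // reference equality only, relying on the members being (effectively) immutable
        return role == other.role
            && localIndices == other.localIndices
            && action == other.action
            && indicesLookup == other.indicesLookup;
    }

    @Override
    public int hashCode() {
        int result = System.identityHashCode(role);
        result = 31 * result + System.identityHashCode(localIndices);
        result = 31 * result + System.identityHashCode(action);
        result = 31 * result + System.identityHashCode(indicesLookup);
        return result;
    }
}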

Member Author

I don't quite follow. Maybe I am missing something, but how is it possible to use == to compare action or resolvedIndices.getLocal? For example, even if two action strings have the same value, they are different objects belonging to two different HTTP worker threads.

Comparing the identity of metadata.getIndicesLookup() may be viable, but it does not seem to be more convenient or more reliable than comparing a simple version number?

As for Role, the issue is that the roles for API keys are constructed on the fly for each request. So even the same API key will get two equivalent Role objects but with different identities. To get around it, the cacheKey here uses only the IndicesPermission component of a Role and the comparison for IndicesPermission is by object identity.

Contributor

@albertzaharovits albertzaharovits left a comment

I think this looks good and promising, and we can maybe later investigate making the cache entries more long lived.

From an implementation point of view, I think we can do better to make the cache key computation simpler, and maybe more isolated from the core logic.

From a performance point of view, I think we could set the concurrencyLevel parameter on the map to the number of transport worker threads, since we are positive every such thread will now do a computeIfAbsent on this map (it's more coupling than I usually like, but I think we have to find a heuristic for it - I don't have a good suggestion here, but neither great expectations).
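In code, the suggestion amounts to something like the following sketch (the key/value types and the thread count are placeholders, not actual Elasticsearch settings):

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Size the map's concurrency level to the expected number of transport worker
// threads, since each of them may call computeIfAbsent on it concurrently.
int transportWorkerThreads = Runtime.getRuntime().availableProcessors() * 2; // assumption only
ConcurrentHashMap<String, CompletableFuture<Object>> inFlight =
    new ConcurrentHashMap<>(16, 0.75f, transportWorkerThreads);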

I'll take another look after you manage to simplify the cache key computation a bit.

@ywangd
Member Author

ywangd commented Oct 6, 2021

I think this looks good and promising, and we can maybe later investigate making the cache entries more long lived.

From an implementation point of view, I think we can do better to make the cache key computation simpler, and maybe more isolated from the core logic.

With #78327 replaced by the changes in core (#78508), together with #78321, we are in a very good position for index/shard level requests, and I feel fairly confident saying that we don't need any more optimisation for them in 7.16. Therefore, I'd like to rework this PR so it targets REST level requests more specifically, or more technically, requests that are Replaceable. It was indeed raised mainly for REST level requests, but it also had index/shard level requests in mind, which is no longer necessary. I need some time to think it through first (maybe I will ping you for ideas). Thanks!

From a performance point of view, I think we could set the concurrencyLevel parameter on the map to the number of transport worker threads, since we are positive every such thread will now do a computeIfAbsent on this map (it's more coupling than I usually like, but I think we have to find a heuristic for it - I don't have good suggestion here - but neither great expectations).

This is a good point. I'll incorporate it in the rework.

ywangd added 8 commits October 7, 2021 14:21
Other improvements:
1. A single cacheKey for a single index authorization process
2. Resolving indices is now cached together with loading authorized indices
3. Because of the above, it now avoids using resolved indices as a member of
   the cache key and instead uses the original index expression, which
   should often be much smaller.
@ywangd
Member Author

ywangd commented Oct 11, 2021

I have reworked this PR based on the recent development and feedback. It is now ready for another round. A few key points:

  • It now focuses on requests that are Replaceable. This is a better complement given Skip loading authorized indices if requests do not need them #78321 and Filter original indices in shard level request #78508
  • It now caches the entire indexAuthorization instead of trying to cache each component (authorizedIndices, resolvedIndices, indicesAccessControl) separately. I think this is overall better for efficiency and simplicity (only a single cacheKey is needed)
  • The listenableFuture now branches to the generic thread pool instead of executing directly (which was wrong)
  • It now uses an actual Cache instead of a ConcurrentHashMap for better cache control. The cache can also be disabled now.

Comment on lines +409 to +411
((IndicesRequest.Replaceable) requestInfo.getRequest()).indices(
value.resolvedIndices.isNoIndicesPlaceholder() ? NO_INDICES_OR_ALIASES_ARRAY : value.resolvedIndices.toArray()
);
Member Author

This is the one place where we have a coupling with the code logic in IndicesAndAliasesResolver.resolveIndicesAndAliases. It's not ideal but I don't see a way around it.

Contributor

I understand that this is necessary because invoking indicesAsyncSupplier has the side effect of replacing the original index expression with the resolved one.

In this case, to be completely honest, I think this is an indication we need to find a different approach to the caching issue.

Maybe we first need to separate the index expression evaluation from the expression replace. Then maybe we can independently cache the index expression evaluation. I don't want to venture and suggest alternatives. Bottom line is that I do not think this PR, as it stands right now, is a good trade-off between maintainability and performance.

Concretely, I don't think this works for AliasesRequests, but even if you plug this hole, this is extremely brittle.

Member Author

I don't think this works for AliasesRequests

You are 100% correct. I meant to exclude it in isRequestCacheableForIndexAuthorization but totally forgot it in the refactoring process.

Comment on lines +898 to +904
static class IndexAuthorizationCacheKey {
private final String action;
private final String[] requestedIndices;
private final IndicesOptions indicesOptions;
private final long metadataVersion;
// A pair of indicesPermission and limitedByIndicesPermission. The latter is nullable
private final Tuple<IndicesPermission, IndicesPermission> indicesPermissions;
Member Author

This is the new CacheKey class. The improvements are:

  • There is only one CacheKey since we now cache the whole index authorization process instead of individual components.
  • It uses requestedIndices which is often much smaller than resolvedIndices.

@albertzaharovits This set of variables seems to be the minimal information required for the cacheKey (please also see my previous reply). Please let me know if you think otherwise.
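To make the comparison semantics concrete, the equality is meant to be roughly the following (a sketch of the idea, not the exact code in the diff): value equality for the simple members, and reference equality for the IndicesPermission objects since API-key roles are rebuilt per request:

@Override
public boolean equals(Object o) {
    if (o instanceof IndexAuthorizationCacheKey == false) {
        return false;
    }
    IndexAuthorizationCacheKey other = (IndexAuthorizationCacheKey) o;
    return metadataVersion == other.metadataVersion
        && action.equals(other.action)
        && Arrays.equals(requestedIndices, other.requestedIndices)   // java.util.Arrays
        && indicesOptions.equals(other.indicesOptions)
        // identity comparison: equivalent permissions built for different requests do not match
        && indicesPermissions.v1() == other.indicesPermissions.v1()
        && indicesPermissions.v2() == other.indicesPermissions.v2();
}

@Override
public int hashCode() {
    int result = Long.hashCode(metadataVersion);
    result = 31 * result + action.hashCode();
    result = 31 * result + Arrays.hashCode(requestedIndices);
    result = 31 * result + indicesOptions.hashCode();
    result = 31 * result + System.identityHashCode(indicesPermissions.v1());
    result = 31 * result + System.identityHashCode(indicesPermissions.v2());
    return result;
}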

@ywangd ywangd changed the title from "ListenableFuture for authorizedIndices and IndicesAccessControl" to "ListenableFuture for authorize index action" Oct 11, 2021
if (((IndicesRequest) requestInfo.getRequest()).allowsRemoteIndices()) {
    // check action name
    indexAuthorizationResult = authorizeIndexActionName(
        requestInfo.getAction(), authorizationInfo, IndicesAccessControl.ALLOW_NO_INDICES);
Contributor

I do not think this is needed.
The original code, without the caching, invoked the authorizeIndexActionName before the resolvedIndices are resolved, which I see you've maintained, but then still added another such check here.

Contributor

If the request allows remote indices, the checking is more complex (which is yet another issue with the CCS security model).
If there are remote indices then we can't rely on whether the action is authorized locally, because it's allowable to perform a cross cluster search even if you don't have access to search any indices in the local cluster.
I'm not sure whether the check is right (it seems to only perform the action name check if there are no indices, but I would think that we want to perform it if there are no remote indices), but it's different for a reason.

I think the logic behind doing it if there are no indices is that this is the only place where we can trigger an exception (because index checking won't happen).
But if there are any indices, we'll also check whether each of those indices is allowed, and fail if they're not (locally for local indices, or on the remote cluster for remote indices).

For non-remoteable actions, the upfront action name check is a performance optimisation - no need to check indices for an action you can't perform. For remoteable actions, we have a backstop for when there are no indices to check.

Member Author

@ywangd ywangd Oct 13, 2021

I do not think this is needed.
The original code, without the caching, invoked the authorizeIndexActionName before the resolvedIndices are resolved, which I see you've maintained, but then still added another such check here.

It is needed, because the caching covers both the case when the request allowsRemoteIndices and when it does not. The latter invokes authorizeIndexActionName, but the former does not. This code covers both of them and moves the allowsRemoteIndices check inside this method.

If the request allows remote indices, the checking is more complex ...

This piece of code is supposed to be a pure refactoring (other than the caching part). If you look past the caching stuff, the logic is exactly the same as the existing code (unless I didn't do a good job in refactoring).

Contributor

If you look past the caching stuff, the logic is exactly the same as the existing code

My comment was in response to Albert's concerns, and was intended to explain why the code (existing and refactored) needs to have special cases for these situations.

indicesRequest.indices(),
indicesRequest.indicesOptions(),
metadata.version(),
getRoleIndicesPermissions(ensureRBAC(authorizationInfo).getRole())
Contributor

Do we really need the permissions to know if we can cache the request's authorization result?
Can't we use the role or limited role instances with an equality check, since those instances are cached?

Contributor

LimitedRole isn't cached.
We cache the limited-by, but not the constructed role.

Perhaps we should.

Contributor

@albertzaharovits albertzaharovits left a comment

I acknowledge the amount of effort here, but I worry about the added complexity in deciding when to cache and what to do with a cached value.

The complexity will make it even more nightmarish to maintain, and I don't think this is a great performance boost overall, because we now introduce atomics and forking on thread pools which, for easy-to-authorize requests, actually amounts to a performance penalty.

Contributor

@tvernum tvernum left a comment

My inclination is similar to @albertzaharovits's here. This introduces a fair bit more complexity into code that is already far too complex to safely deal with.
What sort of performance gain do we expect here? It's hard to evaluate the tradeoff without some real data.

this.indexAuthorizationCache =
    CacheBuilder.<IndexAuthorizationCacheKey, ListenableFuture<IndexAuthorizationCacheValue>>builder()
        .setMaximumWeight(cacheSize)
        .setExpireAfterWrite(TimeValue.timeValueMinutes(5)) // a fallback just in case
Contributor

This comment doesn't really mean anything in context.
It's based on the undocumented (at this point) condition that things are automatically cleared from the cache when the listener completes, but that isn't clear in the comment.

Member Author

Good point. I have improved the comment which should make more sense now.

Map<String, IndexAbstraction> aliasOrIndexLookup,
ActionListener<IndexAuthorizationResult> listener
) {
assert false : "this method should no longer be used";
Contributor

Why does this method exist if it's not used? Just delete it.

Member Author

This is the existing method from the AuthorizationEngine interface. It cannot be deleted because the interface mandates it and there is no good default the interface can provide. This is a way to retire this method without breaking custom authz engines.

Contributor

Don't worry about breaking custom engines, just delete what needs to be deleted. It's OK to change the interface for these sorts of plugins.

private Tuple<IndicesPermission, IndicesPermission> getRoleIndicesPermissions(Role role) {
final IndicesPermission indicesPermission;
final IndicesPermission limitedByIndicesPermission;
if (role instanceof LimitedRole) {
Contributor

I don't think RBAC engine should have code that specifically handles LimitedRole. We need to find a way to encapsulate this properly.

*/
private boolean isRequestCacheableForIndexAuthorization(TransportRequest request) {
    return request instanceof IndicesRequest.Replaceable
        && (false == request instanceof PutMappingRequest || ((PutMappingRequest) request).getConcreteIndex() == null);
Contributor

Can we expand this into a series of ifs? It's almost impossible to follow nested && and || conditions like this.

Member Author

You are right. I amended it as you suggested.
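For reference, the expanded form looks roughly like this (same logic as the quoted condition above, just flattened into early returns):

private boolean isRequestCacheableForIndexAuthorization(TransportRequest request) {
    // only requests whose indices can be replaced are candidates for caching
    if (request instanceof IndicesRequest.Replaceable == false) {
        return false;
    }
    // exclude a PutMappingRequest that already carries a concrete index
    if (request instanceof PutMappingRequest && ((PutMappingRequest) request).getConcreteIndex() != null) {
        return false;
    }
    return true;
}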

@ywangd
Member Author

ywangd commented Oct 13, 2021

Thanks for taking the time to look into this PR. I recognise it is a complex and likely controversial change. Realistically, I don't think it is something that we can accommodate for 7.16 (if at all).

To me, the most glaring issue with this PR is that we don't have associated benchmarks to either support or disprove it. This plus the level of complexity makes it difficult to justify, and I acknowledge that. Nevertheless, I think it is still useful as a strawman proposal and hopefully better ideas will emerge through discussion. So please let me spend some time replying to your major comments in this single post for (perhaps) easier digestion.

The change has its roots in my dormant PoC for caching authorizedIndices and was then reinvigorated after a discussion with Armin about potential transport_worker congestion due to high load at the REST level. As the word "potential" indicates, we don't really have any proof (SDH or benchmark) of the issue. Even if it is truly a problem, it is still unknown whether authZ is indeed a big contributor to the total response time. Anyway, the goal is to maximize the throughput of transport_worker threads. So the ideal scenario and improvements that I had in mind are something like the following:

  • Multiple REST requests need to be handled simultaneously and each is assigned a transport_worker thread
  • These requests incur expensive but identical authZ computation. Therefore the authZ result can be shared among these threads.
  • If we let only a single transport_worker thread do the actual computation and free the other transport_worker threads, we improve both the throughput and stability of the cluster. This reasoning can be backed by my previous benchmark which shows a saturated transport_worker thread pool is far from actually saturating the CPU (utilisation is only about 15%). So switching to the generic thread pool while waiting for the authZ result serves as a "just-in-time" concurrency boost for certain requests because the generic thread pool can burst up to 100% CPU usage. We tested a similar strategy for Transport level congestion and it helped a great deal.

This PR is hence an implementation for the above idea. Now onto the comments ...


Comments from Albert

I don't think this overall is a great performance boost, because we now introduce atomics and forking on thread pools which, for easy-to-authorize requests it actually amounts to a performance penalty.

First, I admit there is no benchmarking to prove any performance gain. But I think a performance penalty is unlikely; the caching mechanism is only triggered in a very specific situation: (1) multiple requests coming in at the same time; (2) with identical authZ conditions; (3) and authZ being expensive enough that one request starts authZ before another can finish it. If the request is easy to authorize, it's likely that the caching won't even trigger. Again, without actual benchmarking, all of this is just my best guess.

I understand that this is necessary because invoking indicesAsyncSupplier has the side effect of replacing the original index expression with the resolved one.
Maybe we first need to separate the index expression evaluation from the expression replace. Then maybe we can independently cache the index expression evaluation.

This is the fundamental issue: we are trying to cache a non-pure function and that's why the code logic is leaking out. This is why the previous version had two separate caches, one for loadAuthorizedIndices and the other for buildIndicesAccessControl, which are both pure functions. The previous version avoided this side-effect problem, but it was also less effective because:

  • resolveIndicesAndAliases is a larger part of the total authZ. For 10K indices and a request for a single *, the ratio is about 3:7:12 for loadAuthorizedIndices, buildIndicesAccessControl and resolveIndicesAndAliases.
  • Multiple caches for different authZ components also mean we may have to fork threads multiple times (multiple ListenableFutures) for a single request, which is unlikely to be efficient. Btw, the previous version did not fork to another thread pool and effectively used the SAME (direct) executor, which is wrong, because the same transport_worker thread that does the authZ computation would also do all the work of the threads that registered a listener. This could block that single thread even longer, which means potential cluster instability.

It's a tough one to crack: Side effect and caching just don't play well and multiple caches is unlikely a good idea.
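To illustrate the forking point in plain Java (CompletableFuture standing in for ListenableFuture, and a cached executor standing in for the generic thread pool): a listener registered without an executor runs on whichever thread completes the future, i.e. the transport_worker doing the computation, while a listener registered with an executor is handed off to that pool.

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ListenerForkingSketch {
    public static void main(String[] args) {
        ExecutorService generic = Executors.newCachedThreadPool(); // stand-in for the generic pool

        CompletableFuture<String> authzResult = new CompletableFuture<>();

        // "SAME" (direct) style: runs on the thread that calls complete(),
        // i.e. the transport_worker thread that did the computation
        authzResult.thenAccept(r -> System.out.println("direct listener on " + Thread.currentThread().getName()));

        // forked style: the waiting request's continuation is handed off to the
        // other pool, so the computing thread is not blocked by other requests' work
        authzResult.thenAcceptAsync(
            r -> System.out.println("forked listener on " + Thread.currentThread().getName()),
            generic
        );

        // the first (computing) request completes the future on its own thread
        authzResult.complete("authz-result");

        generic.shutdown();
    }
}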


Comments from Tim

What sort of performance gain do we expect here? It's hard to evaluate the tradeoff without some real data.

I think I answered this question in the previous section and I acknowledge that there is no data to support it.

Overall, I think the idea still has its value. But two things need to be addressed:

  1. Actual benchmark to either validate or invalidate it
  2. A less compromised and safer implementation

I am interested in spending some time on the above items in the 8.x timeframe. Maybe along the way we will also discover better ideas (but at least item 1 will not be a waste).

@tvernum
Contributor

tvernum commented Oct 14, 2021

we now introduce atomics and forking on thread pools which, for easy-to-authorize requests it actually amounts to a performance penalty.

If the request is easy-to-authorize, it's likely that the caching won't even trigger.

But the cache management code (like the AtomicBoolean to check whether an entry exists in the cache) does still trigger. For a hypothetical node where the caching is never useful (because it literally never has concurrent requests over the same indices), this code will be slower than the old code because the cache exists & needs to be managed, but adds no value.
Is that 1% slower? Or 10%? 0.1%?
I suspect it's not going to be much, but given we could drop double-digit % from some code just by avoiding collection resizing (which was not an obvious problem from reading the code), it's a question that we would want to have an answer for - what % slowdown in "normal" request patterns is acceptable to optimize the "slow and duplicated" ones?

the ratio is about 3:7:12 for loadAuthorizedIndices, buildIndicesAccessControl and resolveIndicesAndAliases

This is where I think tradeoff questions come into play. If we can find a way to cache resolveIndicesAndAliases in a way that is less complex than the proposal we have here, then that might be a win. It could be a 50% improvement, for less code complexity.
Maybe there is no such option - but I think the conclusion from the attempt you've made to add caching here is that it's hard to do in a way that we're comfortable with, and we need to explore other options.

@ywangd
Member Author

ywangd commented Oct 20, 2021

Closing this as we agreed that it is not an overall good tradeoff.
