-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Move TransportGetLicenseAction to SAME Threadpool #80993
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move TransportGetLicenseAction to SAME Threadpool #80993
Conversation
This is motivated by a number of recent SDHs that had these transport actions queue up on the manangement pool. These were not the reason for the blockage on the managment queue, but they are often sent at a high rate by Beats in the same scenarios that see a high rate of stats requests from Beats. Moving them off of the management pool at least makes sure that we don't get Beats retrying them over and over on slowness and generally saves some resources by avoiding ctx switches and having these requests live for longer than necessary. There's no point in running this on the management pool. It should have already been fast enough for SAME with the exception of reading the public key from disk maybe. Made it so the public key is just a constant and doesn't have to be read+deserialized over and over and also cached the verified property for a `License` instance so it should never have to be computed in practice anyway.
|
Pinging @elastic/es-security (Team:Security) |
ywangd
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
The change of loading the public key into a static variable means that the node will fail to launch if anything goes wrong in that process. This might be a behaviour change in some cases. But I don't think it's an issue since it should not happen at all in normal conditions. When the public key does get corrupted, it is probably better to fail eariler and louder. It also won't "hot-reload" the public key anymore. But I don't think we intended to support that anyway.
| throw new IllegalStateException(ex); | ||
| PUBLIC_KEY = CryptUtils.readPublicKey(out.toByteArray()); | ||
| } catch (IOException e) { | ||
| throw new AssertionError("key file is part of the source and must deserialize correctly", e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: The error message is rather cryptic from end-user's perspective and is not really actionable. Should we change it to something along the line of: "The public key file for license verification seems to be corrupted. Please ensure the integrity of the installation files."?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The thing is, this isn't even a outright file in the packaged release, the user can't run into this unless they compiled their own version (checksums and just general consistency checking on the jar/zip make it impossible to selectively corrupt this) => I'd just stick with this message that explains what's going on here for "us" when we read the code, the user won't be seeing this ever.
|
Thanks Yang! |
This is motivated by a number of recent SDHs that had these transport actions queue up on the manangement pool. These were not the reason for the blockage on the managment queue, but they are often sent at a high rate by Beats in the same scenarios that see a high rate of stats requests from Beats. Moving them off of the management pool at least makes sure that we don't get Beats retrying them over and over on slowness and generally saves some resources by avoiding ctx switches and having these requests live for longer than necessary. There's no point in running this on the management pool. It should have already been fast enough for SAME with the exception of reading the public key from disk maybe. Made it so the public key is just a constant and doesn't have to be read+deserialized over and over and also cached the verified property for a `License` instance so it should never have to be computed in practice anyway.
💚 Backport successful
|
This is motivated by a number of recent SDHs that had these transport actions queue up on the manangement pool. These were not the reason for the blockage on the managment queue, but they are often sent at a high rate by Beats in the same scenarios that see a high rate of stats requests from Beats. Moving them off of the management pool at least makes sure that we don't get Beats retrying them over and over on slowness and generally saves some resources by avoiding ctx switches and having these requests live for longer than necessary. There's no point in running this on the management pool. It should have already been fast enough for SAME with the exception of reading the public key from disk maybe. Made it so the public key is just a constant and doesn't have to be read+deserialized over and over and also cached the verified property for a `License` instance so it should never have to be computed in practice anyway.
This is motivated by a number of recent SDHs that had these transport actions
queue up on the manangement pool. These were not the reason for the blockage on
the managment queue, but they are often sent at a high rate by Beats in the same
scenarios that see a high rate of stats requests from Beats.
Moving them off of the management pool at least makes sure that we don't get Beats
retrying them over and over on slowness and generally saves some resources by
avoiding ctx switches and having these requests live for longer than necessary.
There's no point in running this on the management pool. It should have
already been fast enough for SAME with the exception of reading the public key
from disk maybe. Made it so the public key is just a constant and doesn't have
to be read+deserialized over and over and also cached the verified property for
a
Licenseinstance so it should never have to be computed in practice anyway.relates #77466 to some degree, since large CS => slow stats tasks on
MANAGEMENT=> these tasks queue up