[ML] inference performance optimizations and refactor #57674
Conversation
Pinging @elastic/ml-core (:ml)
Some numbers from JMH. These were gathered by running inference against a larger tree ensemble model. Each measurement is a single inference call against data with various fields missing and various expected results.

Before optimizations:
After optimizations:
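For context, a minimal JMH harness for this kind of measurement could look like the sketch below. This is illustrative only: the `Model` interface and the stand-in lambda are hypothetical placeholders, not classes from this PR; the real runs used a large trained tree ensemble.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
public class InferenceBenchmark {

    // Hypothetical stand-in for a loaded inference model.
    interface Model {
        double infer(Map<String, Object> fields);
    }

    private Model model;
    private Map<String, Object> doc;

    @Setup
    public void setup() {
        // Trivial stand-in; a real run would load the trained ensemble here.
        model = fields -> fields.containsKey("a") ? 1.0 : 0.0;
        doc = new HashMap<>();
        doc.put("a", 0.25);
        // Some fields deliberately left missing, as in the PR's measurements.
    }

    @Benchmark
    public double singleInference() {
        // One measurement == one inference call against the document.
        return model.infer(doc);
    }
}
```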
run elasticsearch-ci/bwc |
droberts195 left a comment
LGTM
This is a major refactor of the underlying inference logic. The main refactor is that we now separate the model configuration from the inference interfaces. This has the following benefits:

- We can store extra things with the model that are not necessary for inference (e.g. tree node split information gain).
- We can optimize inference separately from model serialization and storage.
- The user is oblivious to the optimizations (other than seeing the benefits).

A major part of this commit is removing all inference-related methods from the trained model configurations (ensemble, tree, etc.) and moving them to a new class. This new class satisfies a new interface that is ONLY for inference (see the first sketch below).

The optimizations applied currently are:

- Feature maps are flattened once (see the second sketch below).
- Feature extraction only happens once, at the highest level (improves inference + feature importance throughput).
- Only storing what we need for inference + feature importance on heap.
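A sketch of the shape of this separation is below. All names here are illustrative, not the actual classes introduced by this PR: the point is only that the stored configuration carries extras (like information gain) that the inference-only representation deliberately drops.

```java
public class ConfigVsInference {

    // Inference-only contract: implementations hold just the state needed
    // to produce predictions, nothing used solely for storage or display.
    interface InferenceModel {
        double infer(double[] features);
    }

    // The stored configuration can carry extras such as split information
    // gain, which the inference path never reads.
    static class TreeConfig {
        final int splitFeature;
        final double threshold;
        final double informationGain; // kept with the stored model only
        final double leftValue;
        final double rightValue;

        TreeConfig(int splitFeature, double threshold, double informationGain,
                   double leftValue, double rightValue) {
            this.splitFeature = splitFeature;
            this.threshold = threshold;
            this.informationGain = informationGain;
            this.leftValue = leftValue;
            this.rightValue = rightValue;
        }

        // Build the lean, heap-friendly representation once at load time;
        // informationGain is deliberately not captured.
        InferenceModel toInferenceModel() {
            final int f = splitFeature;
            final double t = threshold, l = leftValue, r = rightValue;
            return features -> features[f] < t ? l : r;
        }
    }

    public static void main(String[] args) {
        TreeConfig config = new TreeConfig(0, 0.5, 0.12, -1.0, 1.0);
        InferenceModel model = config.toInferenceModel();
        System.out.println(model.infer(new double[] {0.7})); // prints 1.0
    }
}
```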
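The feature-map flattening can be pictured as in the minimal sketch below (again with hypothetical names): field names are resolved to array slots once per request, so every tree in the ensemble indexes by position instead of repeating map lookups.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FeatureFlattening {

    // Sketch: resolve the document's field map into a flat double[] once,
    // up front, instead of once per tree.
    static double[] flatten(Map<String, Object> fields, List<String> featureNames) {
        double[] features = new double[featureNames.size()];
        for (int i = 0; i < features.length; i++) {
            Object value = fields.get(featureNames.get(i));
            // Missing fields become NaN so trees can apply their own
            // missing-value handling downstream.
            features[i] = value instanceof Number ? ((Number) value).doubleValue() : Double.NaN;
        }
        return features;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("a", 1.5); // "b" is deliberately missing
        double[] flat = flatten(doc, List.of("a", "b"));
        System.out.println(flat[0] + " " + flat[1]); // prints 1.5 NaN
    }
}
```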