[ML] Better memory estimation for NLP models #568
Conversation
@qherreros, since I ported your code for memory estimation, would you mind looking at the relevant part in …?
LGTM! Just a minor comment.
Force-pushed from 5647287 to f618672
@davidkyle could you please look at the PR, since you have the best overview of the …?
Looks good. Please add an assertion to the model config creation tests that these new settings are present and have sensible values.
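A minimal sketch of the kind of assertion being requested, assuming the memory estimates surface as numeric fields on the generated model configuration; the field names below are hypothetical and may not match the actual config structure:

```python
# Hypothetical sketch: field names and config layout are illustrative only;
# the real model config creation code may expose these estimates differently.
def test_model_config_contains_memory_estimates(model_config: dict) -> None:
    # The new settings should be present and have sensible (positive) values.
    assert model_config["per_deployment_memory_bytes"] > 0
    assert model_config["per_allocation_memory_bytes"] > 0
```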
Thank you for the review, @davidkyle. I addressed your comments; it would be great if you could take another look.
LGTM
@pquentin do you know why the Read the Docs build has started failing? I notice the build is using Python 3.12.0, which could have something to do with it.
Yes, we were using the latest Python version supported by Read the Docs, and that recently became Python 3.12. The build was failing to install numpy because we pin it to an older version that does not support Python 3.12. #627 fixes this by requesting Python 3.10 explicitly.
Thanks for fixing the docs build, @pquentin!
This PR adds the ability to estimate the per-deployment and per-allocation memory usage of NLP transformer models. It uses torch.profiler to record the peak memory usage during inference. This information is then used in Elasticsearch to provision models with sufficient memory (elastic/elasticsearch#98874).
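For context, a minimal sketch of profiling memory during a single inference call with torch.profiler; it only illustrates the general profiling pattern, not this PR's exact accounting, and the model name and input are placeholders:

```python
# Illustrative sketch: profile CPU memory allocations of one inference call.
# The model id below is a placeholder, not necessarily one used by this PR.
import torch
from torch.profiler import profile, ProfilerActivity
from transformers import AutoModel, AutoTokenizer

model_id = "sentence-transformers/all-MiniLM-L6-v2"  # placeholder model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)
model.eval()

inputs = tokenizer("a short example sentence", return_tensors="pt")

# profile_memory=True makes the profiler record allocations per operator.
with profile(activities=[ProfilerActivity.CPU], profile_memory=True) as prof:
    with torch.no_grad():
        model(**inputs)

# Show the operators that allocated the most CPU memory during inference,
# which gives a rough view of the extra memory needed per inference call.
print(prof.key_averages().table(sort_by="self_cpu_memory_usage", row_limit=10))
```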