[ML] Better memory estimation for NLP models #568
Conversation
@qherreros, since I ported your code for memory estimation, would you mind looking at the relevant part in …?
LGTM! Just a minor comment.
Force-pushed from 5647287 to f618672
@davidkyle could you please look at the PR, since you have the best overview of the …?
Looks good. Please add an assertion to the model config creation tests that these new settings are present and have sensible values.
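A minimal sketch of the kind of assertion being requested, assuming the memory estimates surface as numeric fields on the generated model configuration; the field names below are hypothetical and may not match the actual config structure:

```python
# Hypothetical sketch: field names and config layout are illustrative only;
# the real model config creation code may expose these estimates differently.
def test_model_config_contains_memory_estimates(model_config: dict) -> None:
    # The new settings should be present and have sensible (positive) values.
    assert model_config["per_deployment_memory_bytes"] > 0
    assert model_config["per_allocation_memory_bytes"] > 0
```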
Thank you for the review, @davidkyle. I addressed your comments; it would be great if you could take another look.
LGTM
@pquentin do you know why the Read the Docs build has started failing? I notice the build is using Python 3.12.0, which could have something to do with it.
Yes, we were using the latest Python version supported by Read the Docs, and that recently became Python 3.12. The build was failing to install numpy because we pin it to an older version that does not support Python 3.12. #627 fixes this by requesting Python 3.10 explicitly.
Thanks for fixing the docs build, @pquentin!
This PR adds the ability to estimate the per-deployment and per-allocation memory usage of NLP transformer models. It uses torch.profiler to record the peak memory usage during inference. This information is then used in Elasticsearch to provision models with sufficient memory (elastic/elasticsearch#98874).
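For context, a minimal sketch of profiling memory during a single inference call with torch.profiler; it only illustrates the general profiling pattern, not this PR's exact accounting, and the model name and input are placeholders:

```python
# Illustrative sketch: profile CPU memory allocations of one inference call.
# The model id below is a placeholder, not necessarily one used by this PR.
import torch
from torch.profiler import profile, ProfilerActivity
from transformers import AutoModel, AutoTokenizer

model_id = "sentence-transformers/all-MiniLM-L6-v2"  # placeholder model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)
model.eval()

inputs = tokenizer("a short example sentence", return_tensors="pt")

# profile_memory=True makes the profiler record allocations per operator.
with profile(activities=[ProfilerActivity.CPU], profile_memory=True) as prof:
    with torch.no_grad():
        model(**inputs)

# Show the operators that allocated the most CPU memory during inference,
# which gives a rough view of the extra memory needed per inference call.
print(prof.key_averages().table(sort_by="self_cpu_memory_usage", row_limit=10))
```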