Adds PyTorch model management #394

Merged
joshdevins merged 9 commits into elastic:pytorch from pytorch-model-upload
Sep 29, 2021

Conversation

Member

@joshdevins joshdevins commented Sep 27, 2021

This is a rough (and working) draft of porting PyTorch code into eland. This is the general shape of what I'm thinking of. It splits concerns into two: (1) generic PyTorch model management in Elasticsearch, and (2) interoperability with HuggingFace transformers and the model hub.

Help and comments are welcome. There's surely a lot of formatting and general Python-ification that still needs to happen.

@joshdevins joshdevins added the enhancement and topic:NLP labels Sep 27, 2021
@joshdevins joshdevins self-assigned this Sep 27, 2021
@joshdevins
Member Author

PS. I'm happy to just move this work into a branch in eland so we can collaborate on this more easily. I suspect it will take some work to get it ready.

@joshdevins joshdevins marked this pull request as ready for review September 27, 2021 15:39
Member

@benwtrent benwtrent left a comment

This looks great to me.

But, I am no Pythonista. Would be good to get a review from @sethmlarson

@sethmlarson
Contributor

@joshdevins I like your idea of developing this on a branch. I think we can create a branch right off of main and then change the target branch for this PR without much fuss. Reviewing this PR now.

Contributor

@sethmlarson sethmlarson left a comment

This is very exciting! I left a whole bunch of review comments. A high-level comment: you can run nox -rs format (or black eland/, which will do a lot too) to automatically format the code so it complies with the linter. You'll also get type-checking.

@@ -0,0 +1,80 @@
#!venv/bin/python
Contributor

I'm not sure what the intent of this file is. If it's intended to be "installed" on the user's system after running python -m pip install eland, then we'll need to structure and configure this much differently. Once I know the intent, I can review this more carefully.

Member

It's an example of how to download a model from Hugging Face and then deploy it. It's a useful script that the user can run.

I don't think there is a plan to have it installed on the user's system.

Member Author

@joshdevins joshdevins Sep 28, 2021

Sorry this was a copy-paste from the old script so it's incorrect.

I discussed this a couple months ago with @sethmlarson but to repeat for a wider audience, we should be providing scripts for the most common use-cases so people don't need to write a notebook or their own script. The original intention is indeed that this is installed (on the system/in a venv) with pip install eland[pytorch]. Happy to restructure as needed. Also open to any other ideas for how this script should run though.
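
For reference, an installable script plus an optional extra is usually declared in setup.py along the following lines. This is only a sketch under assumptions: the dependency names, the bin/ path, and the PACKAGING_KWARGS name are hypothetical and are not taken from this PR.

```python
# Hypothetical packaging sketch; not eland's actual setup.py.
# The keys are standard setuptools arguments; the values are assumptions.
PACKAGING_KWARGS = {
    # Optional dependencies, installed only with: pip install "eland[pytorch]"
    "extras_require": {
        "pytorch": ["torch", "transformers"],  # assumed dependency names
    },
    # Files copied onto the user's PATH (system or venv) at install time.
    "scripts": ["bin/eland_import_hub_model"],  # assumed script location
}

# In setup.py these would be passed through, e.g.:
#   from setuptools import setup
#   setup(name="eland", ..., **PACKAGING_KWARGS)
```

With a layout like this, users who only want the core library are not forced to install the heavier PyTorch dependencies.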


return True

def upload(self, model_path: str, config_path: str, vocab_path: str, chunk_size: int = DEFAULT_CHUNK_SIZE) -> bool:
Contributor

Is there any cleanup we need to do if one part of the upload process passes and the next fails?

Member Author

Member

@joshdevins @sethmlarson, calling DELETE _ml/trained_models/<model_id> will clean up the bad state. The config will have to be pushed again.
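
A minimal sketch of what that cleanup could look like on the client side, assuming the client exposes transport.perform_request(method, url) as used elsewhere in this PR. The upload_with_cleanup helper and the steps argument are hypothetical names for illustration, not part of the PR.

```python
from typing import Any, Callable, Sequence


def upload_with_cleanup(
    client: Any,
    model_id: str,
    steps: Sequence[Callable[[], None]],
) -> None:
    """Run upload steps in order; delete the trained model if any step fails.

    `client` is assumed to expose `transport.perform_request(method, url)`.
    `steps` is a hypothetical list of callables, e.g. put config, put
    vocabulary, put model parts.
    """
    try:
        for step in steps:
            step()
    except Exception:
        # Remove the partially uploaded model so a retry starts from a clean state.
        client.transport.perform_request(
            "DELETE", f"/_ml/trained_models/{model_id}"
        )
        # Re-raise so the caller still sees the original failure.
        raise
```

Because the delete also removes the config, a retry has to rerun every step from the beginning rather than resume from the failed one.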

else:
ignorables = ()

return self._client.transport.perform_request(
Contributor

Is there any way to distinguish between the "model deployment is already stopped" and "model isn't found" cases? If so, it might be good to use that here: ignore the "already stopped" case unconditionally and always re-raise the 404 "model not found" case.

Member Author

@benwtrent ^^ ?

Member

Looking at the code, _stop model deployment doesn't throw when the model is missing or when the deployment is already stopped.

@joshdevins
Member Author

joshdevins commented Sep 28, 2021

OK, let me go through the comments here and I will get this into a cleaner state. @sethmlarson, can you create a new, empty branch (pytorch or whatever)? I can then change the target of this PR.

@joshdevins joshdevins changed the title from "Initial commit of PyTorch support in eland" to "Adds PyTorch model management" Sep 28, 2021
@sethmlarson sethmlarson changed the base branch from main to pytorch September 28, 2021 15:07
@sethmlarson
Contributor

@joshdevins The branch has been created and the PR base has been updated.

@joshdevins joshdevins dismissed sethmlarson’s stale review September 29, 2021 07:47

Will continue to iterate in more PRs once this work is on the new branch

@joshdevins joshdevins merged commit 2e7a393 into elastic:pytorch Sep 29, 2021
@joshdevins joshdevins deleted the pytorch-model-upload branch September 29, 2021 07:56
joshdevins added a commit that referenced this pull request Sep 29, 2021
Adding PyTorch model support through two main mechanisms:
(1) Generic PyTorch model management in Elasticsearch
(2) Interoperability with HuggingFace transformers and model hub

This change aims to provide the foundations of this support and all of the basic functionality that one would need to get up and running. Note that this functionality is available in Elasticsearch 8.x builds only.