Adds PyTorch model management #394
Conversation
PS. I'm happy to just move this work into a branch.
This looks great to me.
But I am no Pythonista, so it would be good to get a review from @sethmlarson.
@joshdevins I like your idea of developing this on a branch; I think we can create a branch for it.
This is very exciting! I left a whole bunch of review comments. A high-level review comment: you can run `nox -rs format` (or `black eland/`, which will do a lot too) to automatically format code so it's compliant with the linter. You'll get type-checking as well.
bin/upload_hub_model.py
Outdated
@@ -0,0 +1,80 @@
#!venv/bin/python
I'm not sure what the intent of this file is. If it's intended to be "installed" on the user's system after running `python -m pip install eland`, then we'll need to structure and configure this much differently. Once I know what the intent is, I can review this more carefully.
It's an example of how to download a model from Hugging Face and then deploy it. It's a useful script that the user can run.
I don't think there is a plan to have it installed on the user's system.
Sorry, this was a copy-paste from the old script, so it's incorrect.
I discussed this a couple of months ago with @sethmlarson, but to repeat for a wider audience: we should be providing scripts for the most common use cases so people don't need to write a notebook or their own script. The original intention is indeed that this is installed (on the system/in a venv) with `pip install eland[pytorch]`. Happy to restructure as needed. Also open to any other ideas for how this script should run, though.
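For context, here is a rough sketch of the workflow such a script would wrap; this is not the script from this PR. The `PyTorchModel` class and the `upload()` signature are taken from the diff below, while the Elasticsearch endpoint, constructor arguments, and file paths are placeholders.

```python
# Hedged sketch only: how a user might push a locally saved TorchScript model
# to Elasticsearch using the class added in this PR.
from elasticsearch import Elasticsearch

from eland.ml.pytorch.pytorch_model import PyTorchModel  # module added in this PR

es = Elasticsearch("http://localhost:9200")  # placeholder endpoint

# Constructor arguments are assumed; only upload() below matches the diff.
pt_model = PyTorchModel(es, "my-hugging-face-model")

# The model/config/vocab file paths are placeholders for files produced from a
# Hugging Face model (e.g. a torch.jit-traced model plus its vocabulary).
pt_model.upload(
    model_path="traced_model.pt",
    config_path="config.json",
    vocab_path="vocab.json",
)
```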
eland/ml/pytorch/pytorch_model.py
Outdated
        return True

    def upload(self, model_path: str, config_path: str, vocab_path: str, chunk_size: int = DEFAULT_CHUNK_SIZE) -> bool:
Is there any cleanup we need to do if one part of the upload process passes and the next fails?
@dimitris-athanasiou ^^ ?
@joshdevins @sethmlarson, calling `DELETE _ml/trained_models/<model_id>` will clean up the bad state. The config will have to be pushed again.
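To make that concrete, here is a minimal sketch of the suggested cleanup, assuming the elasticsearch-py client (whose `ml.delete_trained_model()` issues `DELETE _ml/trained_models/<model_id>`); the model ID, file paths, and constructor arguments are illustrative, not code from this PR.

```python
# Hedged sketch of the cleanup suggested above.
from elasticsearch import Elasticsearch

from eland.ml.pytorch.pytorch_model import PyTorchModel  # module added in this PR

es = Elasticsearch("http://localhost:9200")  # placeholder endpoint
model_id = "my-hugging-face-model"           # placeholder model ID
pt_model = PyTorchModel(es, model_id)        # constructor arguments are assumed

try:
    # upload() signature as shown in the diff above.
    pt_model.upload("traced_model.pt", "config.json", "vocab.json")
except Exception:
    # If one step fails part-way, remove the partially created trained model
    # (DELETE _ml/trained_models/<model_id>) so the config can be pushed again.
    es.ml.delete_trained_model(model_id=model_id)
    raise
```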
        else:
            ignorables = ()

        return self._client.transport.perform_request(
Is there any way to distinguish between the "model deployment is already stopped" and "model isn't found" cases? If so, it might be good to use that here: ignore the "already stopped" case unconditionally and always re-raise the 404 model-not-found case.
@benwtrent ^^ ?
Looking at the code, `_stop` model deployment doesn't throw when the model is missing or when the deployment is already stopped.
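For illustration only, the distinction being suggested could look roughly like the sketch below, assuming elasticsearch-py raises `NotFoundError` on a 404 and that the response body lets the two cases be told apart. The endpoint path and the error-message check are assumptions rather than code from this PR, and per the comment above, `_stop` may not need this at all.

```python
# Hedged sketch: ignore "deployment already stopped", re-raise "model not found".
from elasticsearch import Elasticsearch
from elasticsearch.exceptions import NotFoundError


def stop_deployment(client: Elasticsearch, model_id: str) -> None:
    try:
        client.transport.perform_request(
            "POST", f"/_ml/trained_models/{model_id}/deployment/_stop"
        )
    except NotFoundError as err:
        # Hypothetical check on the error text: only swallow the error when the
        # deployment is reported as already stopped, not when the model is unknown.
        if "already stopped" not in str(err):
            raise
```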
OK, let me go through the comments here and I will get this into a cleaner state. @sethmlarson, can you create a new, empty branch?
@joshdevins The branch has been created and I've updated the PR base.
Will continue to iterate in more PRs once this work is on the new branch.
Adding PyTorch model support through two main mechanisms: (1) generic PyTorch model management in Elasticsearch; (2) interoperability with HuggingFace transformers and the model hub. This change aims to provide the foundations of this support and all of the basic functionality that one would need to get up and running. Note that this functionality is available in Elasticsearch 8.x builds only.
This is a rough (and working) draft of porting PyTorch code into eland, and the general shape of what I'm thinking of. It splits concerns into two: (1) generic PyTorch model management in Elasticsearch, and (2) interoperability with HuggingFace transformers and the model hub.
Help and comments are welcome. There's surely a lot of formatting and general Python-ification that still needs to happen, so help is welcome there too.