[hf inference] Implement HuggingFaceImage2TextRemoteInference #1018
Conversation
Implementation of the HuggingFaceAutomaticSpeechRecognition model parser using the inference endpoint to run inference. The Python API takes in bytes as well as a path; binary input is skipped for now. Very similar to #1018

## Testplan

<img width="1000" alt="Screenshot 2024-01-24 at 10 37 05 PM" src="https://github.com/lastmile-ai/aiconfig/assets/141073967/808956ce-e3be-4528-9f34-c8d31d704ddb">

1. Temporarily add the model parser to the Gradio cookbook model parser registry:

```
asr = HuggingFaceAutomaticSpeechRecognitionRemoteInference()
AIConfigRuntime.register_model_parser(asr, asr.id())
```

2. Run AIConfig Edit on the Gradio example:

```
python3 -m 'aiconfig.scripts.aiconfig_cli' edit --aiconfig-path=cookbooks/Gradio/huggingface.aiconfig.json --parsers-module-path=cookbooks/Gradio/hf_model_parsers.py --server-mode=debug_servers
```
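The "skip binary for now" behavior described above could be sketched roughly like this (a minimal sketch; `validate_asr_input` is a hypothetical helper name, not the parser's actual method):

```python
from pathlib import Path
from typing import Union

def validate_asr_input(audio: Union[str, Path, bytes]) -> Union[str, Path]:
    """Guard mirroring the PR's scope: the hub client accepts bytes as
    well as a path, but binary audio input is skipped for now."""
    if isinstance(audio, bytes):
        # Hypothetical error; the real parser may handle this differently.
        raise NotImplementedError("binary audio input is not supported yet")
    return audio
```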
Review comment on:

```python
# Translation api doesn't support stream
```
nit: can remove
Review comment on:

```python
# HuggingFace image_to_text outputs should only ever be string
# format so shouldn't get here, but just being safe
return json.dumps(output_data, indent=2)
```
If it's not a string, it might make more sense to raise an exception here instead.
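That reviewer suggestion might look like this (a sketch, not the parser's actual code; `construct_output_text` is a hypothetical name):

```python
import json

def construct_output_text(output_data):
    # HuggingFace image_to_text outputs should only ever be string format,
    # so a non-string here indicates an unexpected response shape.
    if isinstance(output_data, str):
        return output_data
    # Per the review suggestion: fail loudly instead of silently serializing.
    # json.dumps is kept only to make the error message debuggable.
    raise ValueError(
        f"Expected string image_to_text output, got {type(output_data).__name__}: "
        f"{json.dumps(output_data, indent=2)}"
    )
```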
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Surprising that the API only takes in one image input.
# [hf inference] Implement HuggingFaceImage2TextRemoteInference

Implement `HuggingFaceImage2TextRemoteInference` for running HF image-to-text models via the inference API for Gradio. The API takes in an image of various supported types: `Union[str, Path, bytes, BinaryIO]`. For now, just implementing support for path and URI, since the rest isn't needed for Gradio and my Python skills aren't great.

<img width="1323" alt="Screenshot 2024-01-24 at 7 31 23 PM" src="https://github.com/lastmile-ai/aiconfig/assets/5060851/3581191f-3295-4dc0-b455-d2e613179639">

## Testing

Build/install the local Hugging Face package with these changes:

```
(hf) ryanholinshead@Ryans-MacBook-Pro aiconfig % cd extensions/HuggingFace
pip3 install build && cd python && python -m build && pip3 install dist/*.whl
pip3 install -e .
```

Register these parsers in `/Users/ryanholinshead/Projects/aiconfig/cookbooks/Gradio/hf_model_parsers.py`, then run the aiconfig editor with the local parsers and Gradio config:

```
aiconfig_path=/Users/ryanholinshead/Projects/aiconfig/cookbooks/Gradio/huggingface.aiconfig.json
parsers_path=/Users/ryanholinshead/Projects/aiconfig/cookbooks/Gradio/hf_model_parsers.py
aiconfig edit --aiconfig-path=$aiconfig_path --server-mode=debug_servers --parsers-module-path=$parsers_path
```

- Ensure the model works in the config; ensure changing settings works and is persisted to the aiconfig.
- Test that setting a fake model ('test') propagates the expected error.
- Test that setting an invalid api_token in the InferenceOptions in run server-side propagates the expected error.
- Set to my own key and ensure execution works.
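Since the client accepts `Union[str, Path, bytes, BinaryIO]` but the parser only implements path/URI support for now, the input handling could be sketched like this (a minimal sketch; `resolve_image_input` is a hypothetical helper, not the PR's actual code):

```python
from pathlib import Path
from typing import Union

def resolve_image_input(value: Union[str, Path]) -> Union[str, Path]:
    """Accept a local path or URI for the image-to-text client.

    bytes/BinaryIO are deliberately unsupported for now, mirroring the
    PR's path/URI-only scope.
    """
    if isinstance(value, Path):
        return value
    if isinstance(value, str):
        # Treat http(s) strings as URIs; anything else as a local path.
        if value.startswith(("http://", "https://")):
            return value
        return Path(value)
    raise TypeError(f"Unsupported image input type: {type(value).__name__}")
```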
Changes from review:
|
# HuggingFaceImage2TextRemoteInferencePromptSchema

From https://github.com/huggingface/huggingface_hub/blob/main/src/huggingface_hub/inference/_client.py#L731. Only 'model' is supported in settings. For input, multiple image types are supported (but only one image), so using the `image/*` mimetype to support all subtypes.

<img width="1323" alt="Screenshot 2024-01-24 at 7 31 23 PM" src="https://github.com/lastmile-ai/aiconfig/assets/5060851/73891022-1270-45f5-a888-69afca3651cc">

---

Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/lastmile-ai/aiconfig/pull/1019).

* __->__ #1019
* #1018
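Based on the description above, the schema's shape might look roughly like this (an illustrative sketch; the field names are assumptions, not the repo's exact PromptSchema structure):

```python
# Sketch of a prompt schema fragment for image-to-text remote inference.
# Field names here are illustrative; the real PromptSchema may differ.
HF_IMAGE2TEXT_PROMPT_SCHEMA = {
    "input": {
        "type": "object",
        "required": ["attachments"],
        "properties": {
            "attachments": {
                "type": "array",
                "items": {
                    "type": "attachment",
                    # "image/*" covers all image subtypes, since the API
                    # accepts any single image (png, jpeg, etc.).
                    "mime_types": ["image/*"],
                },
                "max_items": 1,  # the API takes only one image input
            },
        },
    },
    "model_settings": {
        # Only 'model' is supported in settings per the client docstring.
        "model": {"type": "string"},
    },
}
```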
[hf inference] ASR remote inference model parser impl