[hf inference] Implement HuggingFaceImage2TextRemoteInference #1018
Conversation
Implementation of the HuggingFaceAutomaticSpeechRecognition model parser using the inference endpoint to run inference. The Python API takes in bytes as well as a path; binary input is skipped for now. Very similar to #1018

## Testplan

<img width="1000" alt="Screenshot 2024-01-24 at 10 37 05 PM" src="https://github.com/lastmile-ai/aiconfig/assets/141073967/808956ce-e3be-4528-9f34-c8d31d704ddb">

1. Temporarily add the model parser to the Gradio cookbook model parser registry:

```
asr = HuggingFaceAutomaticSpeechRecognitionRemoteInference()
AIConfigRuntime.register_model_parser(asr, asr.id())
```

2. Run AIConfig Edit on the Gradio example:

```
python3 -m 'aiconfig.scripts.aiconfig_cli' edit --aiconfig-path=cookbooks/Gradio/huggingface.aiconfig.json --parsers-module-path=cookbooks/Gradio/hf_model_parsers.py --server-mode=debug_servers
```
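The "skip binary for now" behavior described above could be sketched roughly like this (a minimal sketch; `validate_asr_input` is a hypothetical helper name, not the parser's actual method):

```python
from pathlib import Path
from typing import Union

def validate_asr_input(audio: Union[str, Path, bytes]) -> Union[str, Path]:
    """Guard mirroring the PR's scope: the hub client accepts bytes as
    well as a path, but binary audio input is skipped for now."""
    if isinstance(audio, bytes):
        # Hypothetical error; the real parser may handle this differently.
        raise NotImplementedError("binary audio input is not supported yet")
    return audio
```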
Review comment on:

```python
# Translation api doesn't support stream
```
nit: can remove
Review comment on:

```python
# HuggingFace image_to_text outputs should only ever be string
# format so shouldn't get here, but just being safe
return json.dumps(output_data, indent=2)
```
If it's not a string, it might make more sense to raise an exception here instead.
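That reviewer suggestion might look like this (a sketch, not the parser's actual code; `construct_output_text` is a hypothetical name):

```python
import json

def construct_output_text(output_data):
    # HuggingFace image_to_text outputs should only ever be string format,
    # so a non-string here indicates an unexpected response shape.
    if isinstance(output_data, str):
        return output_data
    # Per the review suggestion: fail loudly instead of silently serializing.
    # json.dumps is kept only to make the error message debuggable.
    raise ValueError(
        f"Expected string image_to_text output, got {type(output_data).__name__}: "
        f"{json.dumps(output_data, indent=2)}"
    )
```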
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Surprising that the API only takes in one image input.
# [hf inference] Implement HuggingFaceImage2TextRemoteInference

Implement `HuggingFaceImage2TextRemoteInference` for running HF image-to-text models via the inference API for Gradio. The API takes in an image of various supported types: `Union[str, Path, bytes, BinaryIO]`. For now, just implementing support for path and URI, since the rest isn't needed for Gradio and my Python skills aren't great.

<img width="1323" alt="Screenshot 2024-01-24 at 7 31 23 PM" src="https://github.com/lastmile-ai/aiconfig/assets/5060851/3581191f-3295-4dc0-b455-d2e613179639">

## Testing

Build/install the local Hugging Face package with these changes:

```
(hf) ryanholinshead@Ryans-MacBook-Pro aiconfig % cd extensions/HuggingFace
pip3 install build && cd python && python -m build && pip3 install dist/*.whl
pip3 install -e .
```

Register these parsers in `/Users/ryanholinshead/Projects/aiconfig/cookbooks/Gradio/hf_model_parsers.py`, then run the aiconfig editor with the local parsers and Gradio config:

```
aiconfig_path=/Users/ryanholinshead/Projects/aiconfig/cookbooks/Gradio/huggingface.aiconfig.json
parsers_path=/Users/ryanholinshead/Projects/aiconfig/cookbooks/Gradio/hf_model_parsers.py
aiconfig edit --aiconfig-path=$aiconfig_path --server-mode=debug_servers --parsers-module-path=$parsers_path
```

- Ensure the model works in the config; ensure changing settings works and is persisted to the aiconfig.
- Test that setting a fake model ('test') propagates the expected error.
- Test that setting an invalid api_token in the InferenceOptions in run server-side propagates the expected error.
- Set to my own key and ensure execution works.
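Since the client accepts `Union[str, Path, bytes, BinaryIO]` but the parser only implements path/URI support for now, the input handling could be sketched like this (a minimal sketch; `resolve_image_input` is a hypothetical helper, not the PR's actual code):

```python
from pathlib import Path
from typing import Union

def resolve_image_input(value: Union[str, Path]) -> Union[str, Path]:
    """Accept a local path or URI for the image-to-text client.

    bytes/BinaryIO are deliberately unsupported for now, mirroring the
    PR's path/URI-only scope.
    """
    if isinstance(value, Path):
        return value
    if isinstance(value, str):
        # Treat http(s) strings as URIs; anything else as a local path.
        if value.startswith(("http://", "https://")):
            return value
        return Path(value)
    raise TypeError(f"Unsupported image input type: {type(value).__name__}")
```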
Changes from review:
|
# HuggingFaceImage2TextRemoteInferencePromptSchema

From https://github.com/huggingface/huggingface_hub/blob/main/src/huggingface_hub/inference/_client.py#L731. Only 'model' is supported in settings. For input, multiple image types are supported (but only one image), so using the `image/*` mimetype to support all subtypes.

<img width="1323" alt="Screenshot 2024-01-24 at 7 31 23 PM" src="https://github.com/lastmile-ai/aiconfig/assets/5060851/73891022-1270-45f5-a888-69afca3651cc">

---

Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/lastmile-ai/aiconfig/pull/1019).

* __->__ #1019
* #1018
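Based on the description above, the schema's shape might look roughly like this (an illustrative sketch; the field names are assumptions, not the repo's exact PromptSchema structure):

```python
# Sketch of a prompt schema fragment for image-to-text remote inference.
# Field names here are illustrative; the real PromptSchema may differ.
HF_IMAGE2TEXT_PROMPT_SCHEMA = {
    "input": {
        "type": "object",
        "required": ["attachments"],
        "properties": {
            "attachments": {
                "type": "array",
                "items": {
                    "type": "attachment",
                    # "image/*" covers all image subtypes, since the API
                    # accepts any single image (png, jpeg, etc.).
                    "mime_types": ["image/*"],
                },
                "max_items": 1,  # the API takes only one image input
            },
        },
    },
    "model_settings": {
        # Only 'model' is supported in settings per the client docstring.
        "model": {"type": "string"},
    },
}
```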
[hf inference] ASR remote inference model parser impl