[hf inference] ASR remote inference model parser impl #1020
Force-pushed: a91c116 → 4d6e47c → 7574133 → 3a8d37c
Some minor changes, mostly from copy/paste.
Four resolved (outdated) review threads on .../src/aiconfig_extension_hugging_face/remote_inference_client/automatic_speech_recognition.py
```python
if len(inputs) > 1:
    raise ValueError(
        f"Multiple audio inputs are not supported for the HF Automatic Speech Recognition Inference api. Please specify a single audio input attachment."
    )
```
Ah, OK, that's what I was thinking. Instead of doing this, we should just make the `validate_and_retrieve` function return a single value, not an array.
Updated. Refactored `validate_and_retrieve` so it returns a single value, not an array.
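A minimal sketch of the refactor described in this thread: the validation helper returns exactly one attachment, so callers no longer need their own `len(inputs) > 1` check. The `Attachment` dataclass and the `validate_and_retrieve_audio` name and signature are hypothetical stand-ins, not the actual aiconfig API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Attachment:
    """Hypothetical stand-in for an aiconfig attachment (data plus mime type)."""
    data: str
    mime_type: Optional[str] = None


def validate_and_retrieve_audio(attachments: Optional[List[Attachment]]) -> Attachment:
    """Return the single audio attachment instead of a list (hypothetical helper).

    Raises ValueError when there is no attachment or more than one.
    """
    if not attachments:
        raise ValueError("No audio attachment found. Please specify a single audio input attachment.")
    if len(attachments) > 1:
        raise ValueError(
            "Multiple audio inputs are not supported for the HF Automatic Speech "
            "Recognition Inference API. Please specify a single audio input attachment."
        )
    return attachments[0]
```

Returning a single value keeps the "exactly one input" rule in one place rather than repeating it in every parser that calls the helper.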
```python
# HuggingFace Automatic Speech Recognition outputs should only ever be string
# format so shouldn't get here, but just being safe
return json.dumps(output_data, indent=2)
```
Based on your comment on the `image_2_text` one, maybe this should raise a `ValueError` instead?
Yep, updated. I had previously copy-pasted what you had, for consistency.
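A minimal sketch of the stricter behavior agreed on here: string outputs are returned unchanged, and anything else raises a `ValueError` instead of being silently serialized with `json.dumps`. The helper name `asr_output_to_text` is hypothetical.

```python
from typing import Any


def asr_output_to_text(output_data: Any) -> str:
    """Convert an ASR prompt output to a string (hypothetical helper).

    HF Automatic Speech Recognition outputs are expected to be plain strings,
    so any other type is treated as an error rather than JSON-dumped.
    """
    if isinstance(output_data, str):
        return output_data
    raise ValueError(
        f"Invalid output data type {type(output_data)} for HF Automatic Speech "
        "Recognition; expected a string."
    )
```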
Force-pushed: 3a8d37c → a0f6f12
Testplan: same as in the original PR description. The output was the same, so I'm omitting the screenshot.
Force-pushed: a0f6f12 → a6dfbdb
```python
    f"Attachment has no mime type. Specify the audio mimetype in the aiconfig"
)

if not attachment.mime_type.startswith("audio/"):
```
Should be `"audio"` without the trailing slash, since we default to just `"audio"` and this check would reject that. Alternatively, we could default to `"audio/*"`, but I'm not sure that will work the same.
Ah, I see what you mean. Technically this doesn't break anything yet, but it will be needed.
Updated, thanks for catching!
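A minimal sketch of a mime-type check that accepts both the bare default `"audio"` and full types such as `"audio/wav"` or `"audio/*"`. It is a slight variant of the `startswith("audio")` suggestion in this thread: it compares only the top-level type, so a value like `"audiobook/x"` is rejected. The helper name `is_audio_mime_type` is hypothetical.

```python
from typing import Optional


def is_audio_mime_type(mime_type: Optional[str]) -> bool:
    """Return True for "audio", "audio/<subtype>", and "audio/*" (hypothetical helper).

    Only the top-level type (the part before the first "/") is compared,
    case-insensitively, since MIME types are case-insensitive.
    """
    if not mime_type:
        return False
    top_level = mime_type.split("/", 1)[0]
    return top_level.lower() == "audio"
```

Usage: `if not is_audio_mime_type(attachment.mime_type): raise ValueError(...)`.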
Accepting to unblock, but please update the mimetype validation before landing.
Implementation of the HuggingFaceAutomaticSpeechRecognition model parser, using the inference endpoint to run inference. The Python API takes in bytes as well as a path; binary input is skipped for now. Very similar to #1018.

## Testplan

<img width="1000" alt="Screenshot 2024-01-24 at 10 37 05 PM" src="https://github.com/lastmile-ai/aiconfig/assets/141073967/808956ce-e3be-4528-9f34-c8d31d704ddb">

1. Temporarily add the model parser to the Gradio Cookbook model parser registry:
```
asr = HuggingFaceAutomaticSpeechRecognitionRemoteInference()
AIConfigRuntime.register_model_parser(asr, asr.id())
```
2. Run AIConfig Edit on the Gradio example:
```
python3 -m 'aiconfig.scripts.aiconfig_cli' edit --aiconfig-path=cookbooks/Gradio/huggingface.aiconfig.json --parsers-module-path=cookbooks/Gradio/hf_model_parsers.py --server-mode=debug_servers
```
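A minimal sketch of the remote inference step, assuming a client shaped like `huggingface_hub.InferenceClient`, whose `automatic_speech_recognition` method accepts raw bytes or a path/URL. Depending on the `huggingface_hub` version, the return value may be a plain string or an object with a `.text` field, so the sketch handles both. `run_remote_asr` and the `ASRClient` protocol are hypothetical names; the client is passed in so it can be replaced with a stub.

```python
from typing import Any, Optional, Protocol, Union

# Raw audio bytes, or a file path / URL string.
AudioInput = Union[bytes, str]


class ASRClient(Protocol):
    """Anything exposing automatic_speech_recognition(audio, model=...),
    e.g. huggingface_hub.InferenceClient (assumed; check your installed version)."""

    def automatic_speech_recognition(self, audio: AudioInput, *, model: Optional[str] = None) -> Any: ...


def run_remote_asr(client: ASRClient, audio: AudioInput, model: Optional[str] = None) -> str:
    """Send one audio input to the remote ASR endpoint and return the transcript."""
    result = client.automatic_speech_recognition(audio, model=model)
    # Some client versions return a plain str; others return an output
    # object with a `.text` attribute. Accept either shape.
    if isinstance(result, str):
        return result
    text = getattr(result, "text", None)
    if isinstance(text, str):
        return text
    raise ValueError(f"Unexpected ASR result type: {type(result)}")
```

With a real client this would be called as `run_remote_asr(InferenceClient(), "sample.flac")`, which requires network access.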
Force-pushed: a6dfbdb → 02e43fd
update the mimetype validation to check