feat: Add OpenAI frontend multi-LoRA model listing #8052
base: jacky-openai-lora-vllm
Conversation
* Support listing models with LoRAs
* Enable LoRA inference to fail if the LoRA is unknown to the backend, given the LoRA info is successfully retrieved
* Add LoRA model listing and retrieving tests
* Enable unknown-LoRA inference tests
* Distinguish between an empty LoRA list and a LoRA list that cannot be determined on the frontend
* Add tests for the separator with a LoRA-off model
* Add tests for models whose LoRAs cannot be determined
7158843 to 7a8893c
@@ -230,6 +230,32 @@ pip install -r requirements-test.txt
pytest -v tests/
```

### LoRA Adapters

If the command line argument `--lora-separator=<separator_string>` is provided
Quick question - for easy UX purposes - why not provide a default lora separator and just load them if detected (ex: user prepared the multi_lora.json)? Seems tedious to make the user specify a separator to get lora support, but not sure what other solutions do.
I think the main benefit is saving existing users from having to add `--lora-separator` to their start command after upgrading their frontend version, provided they already have working models with LoRA adapters over KServe.

I have two concerns with a default separator:

- We don't have control over what model/LoRA names are used. For example, a user trying out the LoRA separator may name their model `experiment_model_with_lora_adapters` or their LoRA adapter `gemmadoll_2b_dolly_lora_tune` (the adapter HF repo name used in testing, in underscore/lowercase). If the default separator is `_lora_`, it will force the user to either rename their model/LoRA or set `--lora-separator` to a different string (see the sketch below).
- If the LoRA name list is not detected (i.e. the model is from cloud storage), users should still be able to run inference using well-formed model name and LoRA pairs, so we cannot turn off the default LoRA separator when `multi_lora.json` is not detected. This complicates things for users not using LoRA: any model name containing the default LoRA separator will require the user to either set the `--lora-separator` flag or change their model name.

Given that the OpenAI endpoint API does not officially support LoRA, I think we should always let the user opt in to a feature that is not 100% compatible with every well-formed request under the official standard (i.e. by explicitly setting `--lora-separator=<str>`), instead of having them opt out if they later find out it does not work for them.

Maybe there is a separator that will never conflict with any model/LoRA name that I am missing?

cc @nnshah1 @krishung5 @richardhuo-nv if you have any thoughts?
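To make the naming-collision concern concrete, here is a minimal hypothetical sketch; the `_lora_` default value and the splitting logic are illustrative assumptions, not the frontend's actual parsing code:

```python
# Hypothetical sketch: splitting an incoming model id on an assumed default
# separator such as "_lora_" can mis-parse names the user already has.
DEFAULT_SEPARATOR = "_lora_"  # assumed default, for illustration only

def split_model_id(model_id: str, separator: str):
    """Split '<model_name><separator><lora_name>' into its two parts."""
    if separator and separator in model_id:
        model_name, _, lora_name = model_id.partition(separator)
        return model_name, lora_name
    return model_id, None

# A model that merely contains "_lora_" in its own name gets split into a
# bogus (model, LoRA) pair unless the user renames it or overrides the
# separator:
print(split_model_id("experiment_model_with_lora_adapters", DEFAULT_SEPARATOR))
# -> ('experiment_model_with', 'adapters')
```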
Thanks for sharing more details - let's keep it opt-in for now, and if we find it can be improved or made the default, we can do so later.
Filed ticket DLIS-8193
Co-authored-by: Misha Chornyi <[email protected]>
Co-authored-by: Olga Andreeva <[email protected]>
… jacky-openai-lora-vllm-model-list
When listing or retrieving model(s), the model id will include the LoRA name in
the same `<model_name><separator_string><lora_name>` format for each LoRA
adapter listed on the `multi_lora.json`. Note: The LoRA name inclusion is
limited to locally stored models; inference requests are not limited, though.
I think it would be nice to show an example output of calling /v1/models
with a lora for users to get a quick understanding of how it would look and be used.
Sure, enhanced the example:
docs: Enhance example showing different outputs with different LoRAs
docs: Collapse OpenAI LoRA example
> When listing or retrieving model(s), the model id will include the LoRA name in the same `<model_name><separator_string><lora_name>` format for each LoRA adapter listed on the `multi_lora.json`. Note: The LoRA name inclusion is limited to locally stored models; inference requests are not limited, though.
Can you add collapsed example output of curl .../v1/models
showing the lora models in the output right below this block?
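As a rough illustration of what such a listing could look like, here is a minimal sketch; the host/port, base model name, and adapter names below are assumptions, not taken from the PR:

```python
# Hedged sketch of listing models when the frontend was started with
# --lora-separator=_lora_. The URL, model name, and adapter names are
# assumptions for illustration only.
import requests

response = requests.get("http://localhost:9000/v1/models", timeout=10)
response.raise_for_status()
for model in response.json()["data"]:
    print(model["id"])

# With an assumed base model "llama-3-8b" and adapters "doll" and "sheep"
# listed in multi_lora.json, the printed ids would look like:
#   llama-3-8b
#   llama-3-8b_lora_doll
#   llama-3-8b_lora_sheep
```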
What does the PR do?
Add support to the OpenAI frontend for listing/retrieving vLLM models that have LoRA(s) specified at the backend.
Checklist
<commit_type>: <Title>
Commit Type: Check the conventional commit type box here and add the label to the GitHub PR.
Related PRs:
Merging into #8038
Where should the reviewer start?
Start with the new README section about the feature, then the related PR, and finally the changes in this PR and its tests.
Test plan:
The L0_openai_vllm test_lora.py is enhanced with new tests that list and retrieve models via the OpenAI endpoint and assert that the listed and retrieved models are correct given the LoRA configuration.
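For orientation, here is a minimal sketch of the kind of assertions such tests make, using the `openai` Python client; the base URL, separator value, and model/adapter names are assumptions, and this is not the actual test_lora.py code:

```python
# Hypothetical pytest-style sketch of LoRA-aware model listing/retrieval checks.
import openai

BASE_URL = "http://localhost:9000/v1"  # assumed frontend address
LORA_SEPARATOR = "_lora_"              # assumed --lora-separator value
BASE_MODEL = "llama-3-8b"              # assumed base model name
LORA_NAMES = ["doll", "sheep"]         # assumed adapters from multi_lora.json

client = openai.OpenAI(base_url=BASE_URL, api_key="unused")


def test_list_models_includes_lora_ids():
    # Every "<model_name><separator><lora_name>" id should appear alongside
    # the base model id in the listing.
    listed_ids = {model.id for model in client.models.list().data}
    assert BASE_MODEL in listed_ids
    for lora in LORA_NAMES:
        assert f"{BASE_MODEL}{LORA_SEPARATOR}{lora}" in listed_ids


def test_retrieve_lora_model():
    # Retrieving a LoRA-suffixed id should echo the same id back.
    lora_id = f"{BASE_MODEL}{LORA_SEPARATOR}{LORA_NAMES[0]}"
    assert client.models.retrieve(lora_id).id == lora_id
```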
Caveats:
N/A
Background
N/A
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
N/A