
feat: Add OpenAI frontend multi-LoRA model listing #8052

Open
wants to merge 13 commits into jacky-openai-lora-vllm from jacky-openai-lora-vllm-model-list

Conversation

kthui
Contributor

@kthui kthui commented Mar 4, 2025

What does the PR do?

Add support to the OpenAI frontend for listing/retrieving vLLM models that have LoRA(s) specified on the backend.

Checklist

  • PR title reflects the change and is of format <commit_type>: <Title>
  • Changes are described in the pull request.
  • Related issues are referenced.
  • Populated github labels field
  • Added test plan and verified test passes.
  • Verified that the PR passes existing CI.
  • Verified copyright is correct on all changed files.
  • Added succinct git squash message before merging ref.
  • All template sections are filled out.
  • Optional: Additional screenshots for behavior/output changes with before/after.

Commit Type:

Check the conventional commit type box here and add the label to the github PR.

  • build
  • ci
  • docs
  • feat
  • fix
  • perf
  • refactor
  • revert
  • style
  • test

Related PRs:

Merging into #8038

Where should the reviewer start?

Start with the new README section about the feature, then the related PR, and finally the changes and tests in this PR.

Test plan:

The L0_openai_vllm test_lora.py is enhanced with new tests that list and retrieve models via the OpenAI endpoint and assert that the listed and retrieved models are correct given the LoRA configuration.
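
For a rough idea of the shape of these tests, here is a minimal sketch (illustrative only, not the actual test code; the `client` fixture, the model name, the `_lora_` separator, and the adapter names "doll" and "sheep" are assumptions):

```python
# Illustrative sketch only, not the actual L0_openai_vllm test code. Assumes an
# OpenAI-compatible frontend reachable through a `client` fixture, started with
# --lora-separator=_lora_, and a multi_lora.json registering adapters "doll"
# and "sheep" for the model "llama-3.1-8b-instruct".
def test_lora_models_are_listed(client):
    model_ids = {m["id"] for m in client.get("/v1/models").json()["data"]}
    # Each configured adapter should appear as <model_name><separator><lora_name>.
    assert "llama-3.1-8b-instruct_lora_doll" in model_ids
    assert "llama-3.1-8b-instruct_lora_sheep" in model_ids


def test_lora_model_is_retrievable(client):
    response = client.get("/v1/models/llama-3.1-8b-instruct_lora_doll")
    assert response.status_code == 200
    assert response.json()["id"] == "llama-3.1-8b-instruct_lora_doll"
```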

  • CI Pipeline ID: 25018672

Caveats:

N/A

Background

N/A

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

N/A

@kthui kthui added the PR: feat A new feature label Mar 4, 2025
@kthui kthui self-assigned this Mar 4, 2025
ziqif-nv and others added 4 commits March 4, 2025 14:58
* Support listing models with LoRAs

* Enable LoRA infer to fail if LoRA is unknown to the backend given the info is successfully retrieved

* Add LoRA model listing and retrieving tests

* Enable unknown LoRA infer tests

* Distinguish between empty LoRA and cannot determine LoRA on the frontend

* Add tests for separator with LoRA off model

* Add tests for models whose LoRAs cannot be determined
@kthui kthui force-pushed the jacky-openai-lora-vllm-model-list branch from 7158843 to 7a8893c Compare March 5, 2025 19:34
@kthui kthui marked this pull request as ready for review March 5, 2025 22:51
@@ -230,6 +230,32 @@ pip install -r requirements-test.txt
pytest -v tests/
```

### LoRA Adapters

If the command line argument `--lora-separator=<separator_string>` is provided
Contributor

Quick question - for easy UX purposes - why not provide a default LoRA separator and just load the adapters if detected (e.g., the user prepared the multi_lora.json)? It seems tedious to make the user specify a separator to get LoRA support, but I'm not sure what other solutions do.

Contributor Author

@kthui kthui Mar 6, 2025

I think the main benefit is saving existing users who already have working models with LoRA adapters over KServe from having to add --lora-separator to their start command after upgrading their frontend version.

I have two concerns with a default separator:

  1. We don't have control over what model/LoRA names are used. For example, a user trying out the LoRA separator may name their model experiment_model_with_lora_adapters or their LoRA adapter gemmadoll_2b_dolly_lora_tune (the adapter HF repo name used in testing, in underscore/lowercase). If the default separator is _lora_, then it forces the user to either rename their model/LoRA or set --lora-separator to a different string (see the sketch after this list).
  2. If the LoRA name list is not detected (i.e. the model is from cloud storage), users should still be able to run inference using well-formed model name and LoRA pairs, so we cannot turn off the default LoRA separator when the multi_lora.json is not detected. This complicates things for users not using LoRA: any model name containing the default LoRA separator would require the user to either set the --lora-separator flag or change their model name.
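
To make point 1 concrete, here is a rough sketch (hypothetical names only, not code from this PR) of how a default _lora_ separator would misparse a LoRA-free model name:

```python
# Hypothetical illustration of the naming conflict with a default "_lora_"
# separator. The model name below uses no LoRA at all, yet it contains the
# separator as a substring.
model_id = "experiment_model_with_lora_adapters"
separator = "_lora_"

if separator in model_id:
    model_name, lora_name = model_id.split(separator, maxsplit=1)
    # model_name == "experiment_model_with", lora_name == "adapters"
    # so a plain model is misparsed as a <model_name><separator><lora_name> pair.
```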

Given that the OpenAI endpoint API does not officially support LoRA, I think we should always let the user opt in to a feature that is not 100% compatible with every well-formed request under the official standard (i.e. by explicitly setting --lora-separator=<str>), instead of having them opt out if they later find it does not work for them.

Maybe there is a separator I am missing that would never conflict with any model/LoRA name?

cc @nnshah1 @krishung5 @richardhuo-nv if you have any thoughts?

Contributor

Thanks for sharing more details - let's keep it opt-in for now, and if we find it can be improved or made the default, we can do so later.

Contributor Author

Filed ticket DLIS-8193

nv-kmcgill53 and others added 2 commits March 5, 2025 20:33
Co-authored-by: Misha Chornyi <[email protected]>
Co-authored-by: Olga Andreeva <[email protected]>
Comment on lines +250 to +253
When listing or retrieving model(s), the model id will include the LoRA name in
the same `<model_name><separator_string><lora_name>` format for each LoRA
adapter listed in the `multi_lora.json`. Note: LoRA name inclusion is limited
to locally stored models; inference requests are not limited, though.
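
For illustration, listing models might then look roughly like the sketch below (the frontend address, the _lora_ separator, the model name, and the adapter names "doll" and "sheep" are assumptions, not output from this PR):

```python
# Rough sketch of listing models when LoRA names are included in the model ids.
# The address, separator, model name, and adapter names are assumed values.
import requests

models = requests.get("http://localhost:9000/v1/models").json()
print([m["id"] for m in models["data"]])
# Possible output:
#   ['llama-3.1-8b-instruct',
#    'llama-3.1-8b-instruct_lora_doll',
#    'llama-3.1-8b-instruct_lora_sheep']
```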
Contributor

I think it would be nice to show example output of calling /v1/models with a LoRA, so users can get a quick understanding of how it would look and be used.

Contributor Author

@kthui kthui Mar 7, 2025

Contributor

@rmccorm4 rmccorm4 Mar 7, 2025

When listing or retrieving model(s), the model id will include the LoRA name in the same <model_name><separator_string><lora_name> format for each LoRA adapter listed in the multi_lora.json. Note: LoRA name inclusion is limited to locally stored models; inference requests are not limited, though.

Can you add a collapsed example output of curl .../v1/models showing the LoRA models in the output right below this block?

@kthui kthui requested a review from rmccorm4 March 7, 2025 01:21
Labels
PR: feat A new feature
7 participants