feat: Add OpenAI frontend multi-LoRA model listing #8052
base: jacky-openai-lora-vllm
Conversation
* Support listing models with LoRAs
* Enable LoRA inference to fail if the LoRA is unknown to the backend, given the LoRA info is successfully retrieved
* Add LoRA model listing and retrieving tests
* Enable unknown-LoRA inference tests
* Distinguish between an empty LoRA list and a LoRA list that cannot be determined on the frontend
* Add tests for the separator with a LoRA-off model
* Add tests for models whose LoRAs cannot be determined
7158843 to 7a8893c
@@ -230,6 +230,32 @@ pip install -r requirements-test.txt
pytest -v tests/
```

### LoRA Adapters

If the command line argument `--lora-separator=<separator_string>` is provided
Quick question - for easy UX purposes - why not provide a default lora separator and just load them if detected (ex: user prepared the multi_lora.json)? Seems tedious to make the user specify a separator to get lora support, but not sure what other solutions do.
I think the main benefit is saving existing users from having to add `--lora-separator` to their start command after upgrading their frontend version, provided they already have working models with LoRA adapters over KServe.

I have two concerns with a default separator:

- We don't have control over what model/LoRA names are used. For example, a user trying out the LoRA separator may name their model `experiment_model_with_lora_adapters` or their LoRA adapter `gemmadoll_2b_dolly_lora_tune` (the adapter HF repo name used in testing, in underscore/lowercase). If the default separator is `_lora_`, it will force the user to either rename their model/LoRA or set `--lora-separator` to a different string (see the sketch below).
- If the LoRA name list is not detected (i.e. the model is from cloud storage), users should still be able to run inference using well-formed model name and LoRA pairs, so we cannot turn off the default LoRA separator when `multi_lora.json` is not detected. This complicates things for users not using LoRA: any model name containing the default LoRA separator will require the user to either set the `--lora-separator` flag or change their model name.

Given that the OpenAI endpoint API does not officially support LoRA, I think we should always let the user opt in to a feature that is not 100% compatible with every well-formed request under the official standard (i.e. by explicitly setting `--lora-separator=<str>`), instead of having them opt out if they later find out it does not work for them.

Maybe there is a separator that will never conflict with any model/LoRA name that I am missing?

cc @nnshah1 @krishung5 @richardhuo-nv if you have any thoughts?
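To make the naming-collision concern concrete, here is a minimal hypothetical sketch; the `_lora_` default value and the splitting logic are illustrative assumptions, not the frontend's actual parsing code:

```python
# Hypothetical sketch: splitting an incoming model id on an assumed default
# separator such as "_lora_" can mis-parse names the user already has.
DEFAULT_SEPARATOR = "_lora_"  # assumed default, for illustration only

def split_model_id(model_id: str, separator: str):
    """Split '<model_name><separator><lora_name>' into its two parts."""
    if separator and separator in model_id:
        model_name, _, lora_name = model_id.partition(separator)
        return model_name, lora_name
    return model_id, None

# A model that merely contains "_lora_" in its own name gets split into a
# bogus (model, LoRA) pair unless the user renames it or overrides the
# separator:
print(split_model_id("experiment_model_with_lora_adapters", DEFAULT_SEPARATOR))
# -> ('experiment_model_with', 'adapters')
```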
Thanks for sharing more details - let's keep it opt-in for now, and if we find it can be improved or made the default, we can do so later.
Filed ticket DLIS-8193
Co-authored-by: Misha Chornyi <[email protected]>
Co-authored-by: Olga Andreeva <[email protected]>
… jacky-openai-lora-vllm-model-list
When listing or retrieving model(s), the model id will include the LoRA name in
the same `<model_name><separator_string><lora_name>` format for each LoRA
adapter listed on the `multi_lora.json`. Note: The LoRA name inclusion is
limited to locally stored models; inference requests are not limited, though.
I think it would be nice to show an example output of calling /v1/models
with a lora for users to get a quick understanding of how it would look and be used.
Sure, enhanced the example:
docs: Enhance example showing different outputs with different LoRAs
docs: Collapse OpenAI LoRA example
> When listing or retrieving model(s), the model id will include the LoRA name in the same `<model_name><separator_string><lora_name>` format for each LoRA adapter listed on the `multi_lora.json`. Note: The LoRA name inclusion is limited to locally stored models; inference requests are not limited, though.
Can you add collapsed example output of curl .../v1/models
showing the lora models in the output right below this block?
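As a rough illustration of what such a listing could look like, here is a minimal sketch; the host/port, base model name, and adapter names below are assumptions, not taken from the PR:

```python
# Hedged sketch of listing models when the frontend was started with
# --lora-separator=_lora_. The URL, model name, and adapter names are
# assumptions for illustration only.
import requests

response = requests.get("http://localhost:9000/v1/models", timeout=10)
response.raise_for_status()
for model in response.json()["data"]:
    print(model["id"])

# With an assumed base model "llama-3-8b" and adapters "doll" and "sheep"
# listed in multi_lora.json, the printed ids would look like:
#   llama-3-8b
#   llama-3-8b_lora_doll
#   llama-3-8b_lora_sheep
```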
What does the PR do?
Add support to the OpenAI frontend for listing/retrieving vLLM models that have LoRA(s) specified at the backend.
Checklist
<commit_type>: <Title>
Commit Type: Check the conventional commit type box here and add the label to the GitHub PR.
Related PRs:
Merging into #8038
Where should the reviewer start?
Start with the new README section about the feature, then the related PR, and finally the changes in this PR and its tests.
Test plan:
The L0_openai_vllm test_lora.py is enhanced with new tests that list and retrieve models via the OpenAI endpoint and assert that the listed and retrieved models are correct given the LoRA configuration.
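For orientation, here is a minimal sketch of the kind of assertions such tests make, using the `openai` Python client; the base URL, separator value, and model/adapter names are assumptions, and this is not the actual test_lora.py code:

```python
# Hypothetical pytest-style sketch of LoRA-aware model listing/retrieval checks.
import openai

BASE_URL = "http://localhost:9000/v1"  # assumed frontend address
LORA_SEPARATOR = "_lora_"              # assumed --lora-separator value
BASE_MODEL = "llama-3-8b"              # assumed base model name
LORA_NAMES = ["doll", "sheep"]         # assumed adapters from multi_lora.json

client = openai.OpenAI(base_url=BASE_URL, api_key="unused")


def test_list_models_includes_lora_ids():
    # Every "<model_name><separator><lora_name>" id should appear alongside
    # the base model id in the listing.
    listed_ids = {model.id for model in client.models.list().data}
    assert BASE_MODEL in listed_ids
    for lora in LORA_NAMES:
        assert f"{BASE_MODEL}{LORA_SEPARATOR}{lora}" in listed_ids


def test_retrieve_lora_model():
    # Retrieving a LoRA-suffixed id should echo the same id back.
    lora_id = f"{BASE_MODEL}{LORA_SEPARATOR}{LORA_NAMES[0]}"
    assert client.models.retrieve(lora_id).id == lora_id
```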
Caveats:
N/A
Background
N/A
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
N/A