
feat: Add multi-LoRA support to OpenAI frontend #8038

Open
kthui wants to merge 7 commits into main from jacky-openai-lora-vllm

Conversation

@kthui kthui (Contributor) commented Feb 26, 2025

What does the PR do?

Add support for selecting a LoRA to the OpenAI frontend. The user can optionally provide the --model-and-lora-name-separator=<lora_separator> command line argument when starting the OpenAI frontend. If the argument is provided and non-empty, the OpenAI model name may be of the form <triton_model_name><lora_separator><lora_name>. If the <lora_separator> is found in the OpenAI model name, the LoRA parameter is set to <lora_name> during inference.
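
For illustration, a minimal client-side sketch of how the naming scheme would be used (the separator value, base model name, port, and prompt below are placeholders, not values mandated by this PR; the LoRA name "doll" is taken from the test):

# Frontend started with, e.g., --model-and-lora-name-separator=_lora_
from openai import OpenAI

client = OpenAI(base_url="http://localhost:9000/v1", api_key="EMPTY")

# "<triton_model_name><lora_separator><lora_name>": runs the Triton model
# "llama-3.1-8b-instruct" with the LoRA named "doll" applied during inference.
lora_response = client.chat.completions.create(
    model="llama-3.1-8b-instruct_lora_doll",
    messages=[{"role": "user", "content": "What is machine learning?"}],
)
print(lora_response.choices[0].message.content)

# Without the separator in the model name, the base model is used.
base_response = client.chat.completions.create(
    model="llama-3.1-8b-instruct",
    messages=[{"role": "user", "content": "What is machine learning?"}],
)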

Checklist

  • PR title reflects the change and is of format <commit_type>: <Title>
  • Changes are described in the pull request.
  • Related issues are referenced.
  • Populated github labels field
  • Added test plan and verified test passes.
  • Verified that the PR passes existing CI.
  • Verified copyright is correct on all changed files.
  • Added succinct git squash message before merging ref.
  • All template sections are filled out.
  • Optional: Additional screenshots for behavior/output changes with before/after.

Commit Type:

Check the conventional commit type box here and add the label to the GitHub PR.

  • build
  • ci
  • docs
  • feat
  • fix
  • perf
  • refactor
  • revert
  • style
  • test

Related PRs:

N/A

Where should the reviewer start?

Start with the test case, and then move on to the implementation.

Test plan:

A new test is added under L0_openai. It tests selecting a LoRA both with and without the --model-and-lora-name-separator argument set.

  • CI Pipeline ID: 24690349

Caveats:

The implementation currently only supports vLLM; TRT-LLM support will be added in a follow-up PR.

Background

N/A

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

N/A

* Use true model name for chat completion responses
@kthui kthui added the PR: feat A new feature label Feb 26, 2025
@kthui kthui self-assigned this Feb 26, 2025
@kthui kthui force-pushed the jacky-openai-lora-vllm branch from 4e8f68f to 6462593 Compare February 28, 2025 01:41
* Refactor LoRA selection test

* Add LoRA test to L0_openai

* Refactor L0_openai test.sh

* Set CUDA_VISIBLE_DEVICES

* Do not fix output for different LoRAs

* Update prompt
@kthui kthui force-pushed the jacky-openai-lora-vllm branch from 6462593 to a1717f2 Compare February 28, 2025 19:26
@kthui kthui marked this pull request as ready for review February 28, 2025 20:34
Contributor

Why is this file copy/pasted? Can it use the existing class OpenAIServer?

Contributor Author

It was intended to keep the new test separate from the existing tests, but it is possible to reuse the OpenAIServer class from the existing test: Reuse OpenAIServer from tests

Contributor

Yeah let's try to re-use it if possible.

Does it need to be in a separate lora_tests folder? Can't you just have tests/test_lora.py?

@kthui kthui (Contributor Author) Mar 4, 2025

This is interesting! The original goal of separating the tests was to avoid the LoRA tests starting the module-scoped OpenAI server from conftest.py, which is incompatible with the LoRA tests. It turns out that conftest.py will be ignored by pytest if the test_<something>.py does not import pytest, which is the case for the LoRA tests, so it is possible to keep test_lora.py in the same tests directory.

Move LoRA tests to openai/tests

@rmccorm4 rmccorm4 (Contributor) Mar 4, 2025

It turns out that conftest.py will be ignored by pytest if the test_<something>.py does not import pytest, which is the case for the LoRA tests, so it is possible to keep test_lora.py in the same tests directory.

My understanding is that a scope="module" fixture should be called once per test module test_{file}.py that uses it. So given tests/test_existing.py and tests/test_lora.py, if both tests used that module-scoped fixture defined in conftest.py, it would be called twice, once per module.

Here's a simple example:

rmccormick@ced35d0-lcedt:~/triton/distributed/jacky_pytest$ cat conftest.py 
import pytest

@pytest.fixture(scope="module")
def scope_module():
    print("module scope fixture called!")

rmccormick@ced35d0-lcedt:~/triton/distributed/jacky_pytest$ cat test_mod1.py 
def test_foo(scope_module):
    print("Test foo in module1!")

rmccormick@ced35d0-lcedt:~/triton/distributed/jacky_pytest$ cat test_mod2.py 
def test_bar(scope_module):
    print("Test bar in module2!")

and

rmccormick@ced35d0-lcedt:~/triton/distributed/jacky_pytest$ pytest -s
========================================================================== test session starts ===========================================================================
platform linux -- Python 3.10.12, pytest-8.1.1, pluggy-1.4.0
rootdir: /home/rmccormick/triton/distributed/jacky_pytest
plugins: timeout-2.3.1, asyncio-0.23.8, anyio-3.7.1, cov-4.1.0, mock-3.14.0
asyncio: mode=strict
collected 2 items                                                                                                                                                        

test_mod1.py module scope fixture called!
Test foo in module1!
.
test_mod2.py module scope fixture called!
Test bar in module2!
.

=========================================================================== 2 passed in 0.01s ============================================================================

The original goal of separating the tests was to avoid the LoRA tests starting the module-scoped OpenAI server from conftest.py, which is incompatible with the LoRA tests.

Following the details above, simply having the LoRA tests in their own module, and not using the fixture anywhere in that module, should achieve the desired goal, as you've done here. It does not have to do with whether import pytest is used in this context, as far as I'm aware.

Does this all match your expectations?

Contributor Author

Thanks for pointing out an example! I think a server variable likely found its way into my initial testing when trying out pytest, which caused the impression that the default server is always started for all test classes in the tests directory. This matches my expectation, given that each individual test case has control over what is started before the test.

}
)
)
with open(f"models/{self._model_name}/1/multi_lora.json", "w") as f:
Contributor

Can we attempt to read the multi_lora.json, if found, in the OpenAI frontend to support listing the available LoRAs under the {model}{separator}{lora} scheme via /v1/models?

Contributor Author

Yes, we can add support for listing models with LoRA names via /v1/models.

Can we do it in a separate PR, since this is added scope beyond supporting LoRA inference?

Contributor

Separate PR works for me
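
For reference, a rough sketch of what that follow-up listing could look like, assuming multi_lora.json maps LoRA names to adapter paths as written by the test above (the helper name and file layout here are illustrative, not the frontend's actual API):

import json
from pathlib import Path

def expand_model_names(model_name: str, model_dir: str, separator: str) -> list:
    # Hypothetical helper: return the base model name plus one
    # "<model><separator><lora>" entry per LoRA found in multi_lora.json,
    # so /v1/models could list every selectable name.
    names = [model_name]
    lora_file = Path(model_dir) / "1" / "multi_lora.json"
    if separator and lora_file.is_file():
        with open(lora_file) as f:
            loras = json.load(f)  # assumed layout: {"<lora_name>": "<adapter_path>", ...}
        names += [f"{model_name}{separator}{lora}" for lora in loras]
    return names

# expand_model_names("llama-3.1-8b-instruct", "models/llama-3.1-8b-instruct", "_lora_")
# -> ["llama-3.1-8b-instruct", "llama-3.1-8b-instruct_lora_doll", ...]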

@kthui kthui requested a review from rmccorm4 March 3, 2025 20:42
self._test_chat_completion(client, "doll")
self._test_chat_completion(client, "sheep")
# Check selecting unknown LoRA results in LoRA not found
# TODO: Server hangs when shutting down if LoRA not found.
Contributor

Will this be included in follow-up PRs, or is it within the scope of this one?

Contributor Author

It is unlikely to be in the scope of either PR, because this is either an issue in the vLLM backend or in how the OpenAI frontend / Python binding interact with Triton core. However, the follow-up PR may terminate the inference early if the LoRA name can be determined and the specified LoRA is not among the loaded LoRAs, allowing the test to avoid the problematic part and pass.
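
A rough sketch of what such an early check might look like (hypothetical names; how the frontend would collect the set of loaded LoRAs is out of scope here):

from fastapi import HTTPException

def check_lora_loaded(lora_name: str, loaded_loras: set) -> None:
    # Hypothetical pre-inference guard: reject an unknown LoRA name before the
    # request is sent to the backend, so the failure path never reaches the
    # part that currently hangs on shutdown.
    if lora_name not in loaded_loras:
        raise HTTPException(
            status_code=404,
            detail=f"LoRA '{lora_name}' is not among the loaded LoRAs",
        )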
