
feat: Add multi-LoRA support to OpenAI frontend #8038

Open
kthui wants to merge 7 commits into main from jacky-openai-lora-vllm

Conversation

@kthui kthui (Contributor) commented Feb 26, 2025

What does the PR do?

Add support for selecting a LoRA to the OpenAI frontend. The user can optionally provide the --model-and-lora-name-separator=<lora_separator> command line argument when starting the OpenAI frontend. If the argument is provided and non-empty, the OpenAI model name may be of the form <triton_model_name><lora_separator><lora_name>. If the <lora_separator> is found in the OpenAI model name, the LoRA parameter is set to <lora_name> during inference.
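
For illustration, a minimal client-side sketch of how the naming scheme would be used (the separator value, base model name, port, and prompt below are placeholders, not values mandated by this PR; the LoRA name "doll" is taken from the test):

# Frontend started with, e.g., --model-and-lora-name-separator=_lora_
from openai import OpenAI

client = OpenAI(base_url="http://localhost:9000/v1", api_key="EMPTY")

# "<triton_model_name><lora_separator><lora_name>": runs the Triton model
# "llama-3.1-8b-instruct" with the LoRA named "doll" applied during inference.
lora_response = client.chat.completions.create(
    model="llama-3.1-8b-instruct_lora_doll",
    messages=[{"role": "user", "content": "What is machine learning?"}],
)
print(lora_response.choices[0].message.content)

# Without the separator in the model name, the base model is used.
base_response = client.chat.completions.create(
    model="llama-3.1-8b-instruct",
    messages=[{"role": "user", "content": "What is machine learning?"}],
)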

Checklist

  • PR title reflects the change and is of format <commit_type>: <Title>
  • Changes are described in the pull request.
  • Related issues are referenced.
  • Populated github labels field
  • Added test plan and verified test passes.
  • Verified that the PR passes existing CI.
  • Verified copyright is correct on all changed files.
  • Added succinct git squash message before merging ref.
  • All template sections are filled out.
  • Optional: Additional screenshots for behavior/output changes with before/after.

Commit Type:

Check the conventional commit type box here and add the label to the GitHub PR.

  • build
  • ci
  • docs
  • feat
  • fix
  • perf
  • refactor
  • revert
  • style
  • test

Related PRs:

N/A

Where should the reviewer start?

Start with the test case, and then move on to the implementation.

Test plan:

A new test is added under L0_openai. It tests selecting a LoRA both with and without the --model-and-lora-name-separator argument set.

  • CI Pipeline ID: 24690349

Caveats:

The implementation currently only supports vLLM; TRT-LLM support will be added in a follow-up PR.

Background

N/A

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

N/A

* Use true model name for chat completion responses
@kthui kthui added the PR: feat A new feature label Feb 26, 2025
@kthui kthui self-assigned this Feb 26, 2025
@kthui kthui force-pushed the jacky-openai-lora-vllm branch from 4e8f68f to 6462593 Compare February 28, 2025 01:41
* Refactor LoRA selection test

* Add LoRA test to L0_openai

* Refactor L0_openai test.sh

* Set CUDA_VISIBLE_DEVICES

* Do not fix output for different LoRAs

* Update prompt
@kthui kthui force-pushed the jacky-openai-lora-vllm branch from 6462593 to a1717f2 Compare February 28, 2025 19:26
@kthui kthui marked this pull request as ready for review February 28, 2025 20:34
Contributor

Why is this file copy/pasted? Can it use the existing class OpenAIServer?

Contributor Author

It was intended to keep the new test separate from the existing tests, but it is possible to reuse the OpenAIServer class from the existing test: Reuse OpenAIServer from tests

Contributor

Yeah let's try to re-use it if possible.

Does it need to be in a separate lora_tests folder? Can't you just have tests/test_lora.py?

@kthui kthui (Contributor Author) Mar 4, 2025

This is interesting! The original goal of separating the tests was to avoid the LoRA tests starting the module-scoped OpenAI server from conftest.py, which is incompatible with the LoRA tests. It turns out that conftest.py will be ignored by pytest if the test_<something>.py does not import pytest, which is the case for the LoRA tests, so it is possible to keep test_lora.py in the same tests directory.

Move LoRA tests to openai/tests

@rmccorm4 rmccorm4 (Contributor) Mar 4, 2025

It turns out that conftest.py will be ignored by pytest if the test_<something>.py does not import pytest, which is the case for the LoRA tests, so it is possible to keep test_lora.py in the same tests directory.

My understanding is that a scope="module" fixture should be called once per test module test_{file}.py that uses it. So given tests/test_existing.py and tests/test_lora.py, if both tests used that module-scoped fixture defined in conftest.py, it would be called twice, once per module.

Here's a simple example:

rmccormick@ced35d0-lcedt:~/triton/distributed/jacky_pytest$ cat conftest.py 
import pytest

@pytest.fixture(scope="module")
def scope_module():
    print("module scope fixture called!")

rmccormick@ced35d0-lcedt:~/triton/distributed/jacky_pytest$ cat test_mod1.py 
def test_foo(scope_module):
    print("Test foo in module1!")

rmccormick@ced35d0-lcedt:~/triton/distributed/jacky_pytest$ cat test_mod2.py 
def test_bar(scope_module):
    print("Test bar in module2!")

and

rmccormick@ced35d0-lcedt:~/triton/distributed/jacky_pytest$ pytest -s
========================================================================== test session starts ===========================================================================
platform linux -- Python 3.10.12, pytest-8.1.1, pluggy-1.4.0
rootdir: /home/rmccormick/triton/distributed/jacky_pytest
plugins: timeout-2.3.1, asyncio-0.23.8, anyio-3.7.1, cov-4.1.0, mock-3.14.0
asyncio: mode=strict
collected 2 items                                                                                                                                                        

test_mod1.py module scope fixture called!
Test foo in module1!
.
test_mod2.py module scope fixture called!
Test bar in module2!
.

=========================================================================== 2 passed in 0.01s ============================================================================

The original goal of separating the tests was to avoid the LoRA tests starting the module-scoped OpenAI server from conftest.py, which is incompatible with the LoRA tests.

Following the details above, simply having the LoRA tests in their own module, and not using the fixture anywhere in that module, should achieve the desired goal, as you've done here. It does not have to do with whether import pytest is used in this context, as far as I'm aware.

Does this all match your expectations?

Contributor Author

Thanks for pointing out an example! I think a server variable likely found its way into my initial testing when trying out pytest, which caused the impression that the default server is always started for all test classes in the tests directory. This matches my expectation, given that each individual test case has control over what is started before the test.

}
)
)
with open(f"models/{self._model_name}/1/multi_lora.json", "w") as f:
Contributor

Can we attempt to read the multi_lora.json, if found, in the OpenAI frontend to support listing the available LoRAs under the {model}{separator}{lora} scheme via /v1/models?

Contributor Author

Yes, we can add support for listing models with LoRA names via /v1/models.

Can we do it in a separate PR, since this is added scope beyond supporting LoRA inference?

Contributor

Separate PR works for me
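
For reference, a rough sketch of what that follow-up listing could look like, assuming multi_lora.json maps LoRA names to adapter paths as written by the test above (the helper name and file layout here are illustrative, not the frontend's actual API):

import json
from pathlib import Path

def expand_model_names(model_name: str, model_dir: str, separator: str) -> list:
    # Hypothetical helper: return the base model name plus one
    # "<model><separator><lora>" entry per LoRA found in multi_lora.json,
    # so /v1/models could list every selectable name.
    names = [model_name]
    lora_file = Path(model_dir) / "1" / "multi_lora.json"
    if separator and lora_file.is_file():
        with open(lora_file) as f:
            loras = json.load(f)  # assumed layout: {"<lora_name>": "<adapter_path>", ...}
        names += [f"{model_name}{separator}{lora}" for lora in loras]
    return names

# expand_model_names("llama-3.1-8b-instruct", "models/llama-3.1-8b-instruct", "_lora_")
# -> ["llama-3.1-8b-instruct", "llama-3.1-8b-instruct_lora_doll", ...]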

@kthui kthui requested a review from rmccorm4 March 3, 2025 20:42
self._test_chat_completion(client, "doll")
self._test_chat_completion(client, "sheep")
# Check selecting unknown LoRA results in LoRA not found
# TODO: Server hangs when shutting down if LoRA not found.
Contributor

Will this be included in follow-up PRs, or is it within the scope of this one?

Contributor Author

It is unlikely to be in the scope of either PR, because this is either an issue in the vLLM backend or in how the OpenAI frontend / Python binding interact with Triton core. However, the follow-up PR may terminate the inference early if the LoRA name can be determined and the specified LoRA is not among the loaded LoRAs, allowing the test to avoid the problematic part and pass.
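
A rough sketch of what such an early check might look like (hypothetical names; how the frontend would collect the set of loaded LoRAs is out of scope here):

from fastapi import HTTPException

def check_lora_loaded(lora_name: str, loaded_loras: set) -> None:
    # Hypothetical pre-inference guard: reject an unknown LoRA name before the
    # request is sent to the backend, so the failure path never reaches the
    # part that currently hangs on shutdown.
    if lora_name not in loaded_loras:
        raise HTTPException(
            status_code=404,
            detail=f"LoRA '{lora_name}' is not among the loaded LoRAs",
        )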
