feat: Add multi-LoRA support to OpenAI frontend #8038
base: main
Conversation
* Use true model name for chat completion responses
Force-pushed from 4e8f68f to 6462593

* Refactor LoRA selection test
* Add LoRA test to L0_openai
* Refactor L0_openai test.sh
* Set CUDA_VISIBLE_DEVICES
* Do not fix output for different LoRAs
* Update prompt
Force-pushed from 6462593 to a1717f2
Why is this file copy/pasted? Can it use the existing one?
server/python/openai/tests/utils.py (line 64 in 205f13c):
class OpenAIServer:
It was intended to keep the new test separate from the existing tests, but it is possible to reuse the OpenAIServer class from the existing test: Reuse OpenAIServer from tests
Yeah, let's try to re-use it if possible.
Does it need to be in a separate lora_tests folder? Can't you just have tests/test_lora.py?
This is interesting! The original goal of separating the tests was to avoid the LoRA tests starting the module-scoped OpenAI server from conftest.py, which is incompatible with the LoRA tests. It turns out that conftest.py will be ignored by pytest if the test_<something>.py does not import pytest, which is the case for the LoRA tests, so it is possible to keep test_lora.py in the same tests directory.
It turns out that conftest.py will be ignored by pytest if the test_<something>.py does not import pytest, which is the case for the LoRA tests, so it is possible to keep test_lora.py in the same tests directory.
My understanding is that a scope="module" fixture should be called once per test module test_{file}.py that uses it. So given tests/test_existing.py and tests/test_lora.py, if both tests used that module-scoped fixture defined in conftest.py, it would be called twice, once per module.
Here's a simple example:
rmccormick@ced35d0-lcedt:~/triton/distributed/jacky_pytest$ cat conftest.py
import pytest
@pytest.fixture(scope="module")
def scope_module():
    print("module scope fixture called!")
rmccormick@ced35d0-lcedt:~/triton/distributed/jacky_pytest$ cat test_mod1.py
def test_foo(scope_module):
    print("Test foo in module1!")
rmccormick@ced35d0-lcedt:~/triton/distributed/jacky_pytest$ cat test_mod2.py
def test_bar(scope_module):
    print("Test bar in module2!")
and
rmccormick@ced35d0-lcedt:~/triton/distributed/jacky_pytest$ pytest -s
========================================================================== test session starts ===========================================================================
platform linux -- Python 3.10.12, pytest-8.1.1, pluggy-1.4.0
rootdir: /home/rmccormick/triton/distributed/jacky_pytest
plugins: timeout-2.3.1, asyncio-0.23.8, anyio-3.7.1, cov-4.1.0, mock-3.14.0
asyncio: mode=strict
collected 2 items
test_mod1.py module scope fixture called!
Test foo in module1!
.
test_mod2.py module scope fixture called!
Test bar in module2!
.
=========================================================================== 2 passed in 0.01s ============================================================================
The original goal of separating the tests was to avoid the LoRA tests starting the module-scoped OpenAI server from conftest.py, which is incompatible with the LoRA tests.
Following the details above, simply having the LoRA tests in their own module, and not using the fixture anywhere in that module, should achieve the desired goal, as you've done here. It does not have to do with whether import pytest is used in this context, as far as I'm aware.
Does this all match your expectations?
Thanks for pointing out an example! I think a server variable likely found its way into my initial testing when trying out pytest, which caused the impression that the default server is always started for all test classes in the tests directory. This matches my expectation, given that each individual test case has control over what is started before the test.
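As an illustration of that point, a test module that never requests the shared fixture and instead manages its own frontend could look roughly like the sketch below. This is only a sketch: the OpenAIServer constructor arguments, its context-manager support, and the CLI flag values are assumptions based on the existing tests/utils.py helper, not the PR's actual test.

# tests/test_lora.py (illustrative sketch, not the PR's actual test)
# Because this module never requests the module-scoped fixture defined in
# conftest.py, that fixture is simply never instantiated for these tests.
# OpenAIServer's constructor signature and context-manager support are
# assumptions based on the existing tests/utils.py helper.
from .utils import OpenAIServer


def test_lora_selection():
    cli_args = [
        "--model-repository", "models",  # hypothetical flag and value
        "--model-and-lora-name-separator", "_lora_",  # hypothetical separator value
    ]
    with OpenAIServer(cli_args) as server:
        # ...issue /v1/chat/completions requests against `server` for each LoRA
        # and assert on the responses...
        pass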
}
)
)
with open(f"models/{self._model_name}/1/multi_lora.json", "w") as f:
Can we attempt to read the multi_lora.json, if found, in the OpenAI frontend to support listing the available LoRAs under the {model}{separator}{lora} scheme via /v1/models?
Yes, we can add support for listing models with their LoRA names via /v1/models.
Can we do it in a separate PR, since this is added scope beyond supporting LoRA inference?
Separate PR works for me
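For future reference, one possible shape for that follow-up is sketched below: expand the /v1/models listing from the multi_lora.json file if one is present. The assumed file layout (a mapping of LoRA name to weights path), the function name, and its parameters are illustrative assumptions, not code from this PR.

import json
from pathlib import Path


def list_model_names(model_name: str, model_dir: str, lora_separator: str) -> list[str]:
    """Return the base model name plus one entry per LoRA in multi_lora.json.

    Assumes multi_lora.json maps LoRA names to weight paths, e.g.
    {"doll": "/weights/doll", "sheep": "/weights/sheep"}.
    """
    names = [model_name]
    lora_file = Path(model_dir) / "1" / "multi_lora.json"
    if lora_separator and lora_file.is_file():
        with open(lora_file) as f:
            loras = json.load(f)
        names.extend(f"{model_name}{lora_separator}{lora}" for lora in loras)
    return names

The /v1/models handler could then return one entry per name, letting clients discover the {model}{separator}{lora} identifiers they can pass as the model field in completion requests.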
self._test_chat_completion(client, "doll")
self._test_chat_completion(client, "sheep")
# Check selecting unknown LoRA results in LoRA not found
# TODO: Server hangs when shutting down if LoRA not found.
Will this be included in follow-up PRs, or is it within the scope of this one?
It is unlikely to be in the scope of either PR, because this is either an issue in the vLLM backend or in how the OpenAI frontend / Python binding interact with Triton core. However, the follow-up PR may terminate the inference early if the LoRA name can be determined and the specified LoRA is not part of the loaded LoRAs, allowing the test to avoid the problematic part and pass.
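For context, a rough sketch of such an early-termination check is shown below, assuming the frontend already knows the requested LoRA name and the set of loaded LoRAs. The helper name and the FastAPI-style error are illustrative assumptions, not the planned implementation.

from fastapi import HTTPException


def check_lora_is_loaded(lora_name: str | None, loaded_loras: set[str]) -> None:
    """Reject the request before inference when an unknown LoRA is requested.

    Failing here keeps the request out of the backend entirely, which would let
    the test exercise the "LoRA not found" path without triggering the shutdown
    hang mentioned in the TODO above.
    """
    if lora_name is not None and lora_name not in loaded_loras:
        raise HTTPException(status_code=404, detail=f"LoRA '{lora_name}' not found")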
What does the PR do?
Add support for selecting a LoRA to the OpenAI frontend. The user can optionally provide the --model-and-lora-name-separator=<lora_separator> command line argument when starting the OpenAI frontend. If the argument is provided and non-empty, the OpenAI model name may be <triton_model_name><lora_separator><lora_name>. If the <lora_separator> is found in the OpenAI model name, the LoRA parameters will be set to the <lora_name> during inference. (A minimal sketch of this name-splitting behavior is included after the checklist below.)

Checklist
<commit_type>: <Title>
Commit Type: Check the conventional commit type box here and add the label to the github PR.
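A minimal sketch of the name-splitting behavior referenced in the description above; the function name, the separator value, and the example model names are illustrative assumptions rather than the PR's actual code.

def resolve_model_and_lora(
    openai_model_name: str, lora_separator: str | None
) -> tuple[str, str | None]:
    """Map an OpenAI model name to (triton_model_name, lora_name).

    With a hypothetical separator of "_lora_", a request for "llama_lora_doll"
    targets Triton model "llama" with LoRA "doll". If the separator is unset or
    absent from the name, the name passes through unchanged and no LoRA is used.
    """
    if lora_separator and lora_separator in openai_model_name:
        triton_model_name, _, lora_name = openai_model_name.partition(lora_separator)
        return triton_model_name, lora_name
    return openai_model_name, None

The extracted lora_name would then be attached to the inference request as the LoRA parameter, while triton_model_name selects the Triton model.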
Related PRs:
N/A
Where should the reviewer start?
Start with the test case, and then move on to the implementation.
Test plan:
A new test under L0_openai is added. It tests selecting a LoRA with and without the --model-and-lora-name-separator argument set.

Caveats:
The implementation currently only supports vLLM; TRT-LLM support will be added in a follow-up PR.
Background
N/A
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
N/A