Async HTTP Python Client not working properly #6887

Closed
mutkach opened this issue Feb 15, 2024 · 2 comments · Fixed by #6975
Comments


mutkach commented Feb 15, 2024

Description

Requesting a model config with the async HTTP client fails with UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa6 in position 5: invalid start byte.

When inspecting the HTTP response directly with tritonclient.http.aio (without the wrapper), I noticed that the async client does not decompress the HTTP response by itself, so the issue seems fixable by setting auto_decompress=True on the aiohttp.ClientSession. I believe that in my case the compression is imposed elsewhere (by nginx?).

If that was intended (since auto_decompress=True is the default setting), then some additional logic is required to process the compressed response; calling brotli.decompress() fixed it in my case, roughly as sketched below.
In any case I'll be happy to provide the necessary fixes.
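
For reference, here is roughly the workaround I ended up with (a minimal sketch, not the exact code: it assumes a brotli-compressed body from the proxy and queries Triton's HTTP config endpoint directly with aiohttp):

import json

import aiohttp
import brotli  # pip install Brotli


async def fetch_model_config(host: str, model_name: str) -> dict:
    # Triton's HTTP/REST endpoint for the model configuration.
    url = f"http://{host}/v2/models/{model_name}/config"
    # auto_decompress=False reproduces what I see through the async client:
    # the body arrives still compressed.
    async with aiohttp.ClientSession(auto_decompress=False) as session:
        async with session.get(url) as resp:
            body = await resp.read()
            if resp.headers.get("Content-Encoding") == "br":
                body = brotli.decompress(body)
            return json.loads(body)

# e.g. asyncio.run(fetch_model_config("HOST:80", "intent_classifier"))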

Triton Information

  • Triton server version 2.38.0
  • Triton container 23.09
  • tritonclient==2.33.0, 2.42.0
  • nvidia-pytriton==0.2.5, 0.3.0 and 0.5.1 when built from source

To Reproduce

import asyncio
import numpy
from pytriton.client import AsyncioModelClient, ModelClient
from pytriton.client.utils import get_model_config

HOST = "HOST:80"
model_name = "intent_classifier"  # config request fails via the async path below
message = "Simple and correct query for testing"

async def run_classification(inferer_server_endpoint: str, clf_model_name: str, message: str, **_):
    async with AsyncioModelClient(inferer_server_endpoint, clf_model_name) as client:
        inference_client = client.create_client_from_url(inferer_server_endpoint)
        config = await inference_client.get_model_config(clf_model_name)
        print(config)

def sync_classification(inferer_server_endpoint: str, clf_model_name: str, message: str, **_):
    with ModelClient(inferer_server_endpoint, clf_model_name) as client:
        inference_client = client.create_client_from_url(inferer_server_endpoint)
        config = inference_client.get_model_config(clf_model_name)
        print(config)

if __name__ == "__main__":
    sync_classification(HOST, model_name, message)  # Works
    # asyncio.run(run_classification(HOST, model_name, message))  # Does not

The model config is attached:
classifier_config.txt

Expected behavior
The model configuration should be returned as valid JSON that can be parsed into a Python dict.

kthui (Contributor) commented Feb 15, 2024

Thanks for reporting the issue. I have filed a ticket for us to investigate further.

> In any case I'll be happy to provide the necessary fixes.

Any contribution is welcome!

kthui (Contributor) commented Mar 9, 2024

Hi @mutkach, I took a deeper look into the Python AsyncIO client, and it seems like we already have decompression built in. When calling the async infer(), it will (sketched below):

  1. read the Content-Encoding header from the response headers,
  2. pass the Content-Encoding header to the InferResult class that reads the response body,
  3. auto-decompress the response body in InferResult based on the Content-Encoding header set by the server.
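
Roughly, the flow looks like this (a simplified, hypothetical sketch, not the actual tritonclient source; the class and function names are illustrative):

import gzip
import json

import brotli  # used when Content-Encoding is "br"


class InferResultSketch:
    """Illustrative stand-in for the InferResult handling described above."""

    def __init__(self, body: bytes, content_encoding):
        # Auto-decompress the response body based on the Content-Encoding
        # header set by the server.
        if content_encoding == "gzip":
            body = gzip.decompress(body)
        elif content_encoding == "br":
            body = brotli.decompress(body)
        self.response = json.loads(body)


def handle_infer_response(headers, body: bytes) -> InferResultSketch:
    # Read the Content-Encoding header from the response headers and pass it
    # to the result class, which reads and decompresses the body.
    content_encoding = headers.get("Content-Encoding")
    return InferResultSketch(body, content_encoding)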

> I believe in my case the compression is imposed elsewhere (by nginx?).

Would you be able to share the response headers received, when encountering this issue?
