This is an area that has been a headache for me, and I hope someone can answer.
Do you mean concurrent requests?
Yes.
@zhengzhanpeng this is currently under development in #771.
@abetlen I noticed that llama.cpp's server supports concurrent requests and continuous batching as well: https://github.com/ggerganov/llama.cpp/tree/master/examples/server. To enable that for this library, would it be as straightforward as exposing the relevant command line options? Or am I missing something obvious?
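For anyone wanting to test this, here is a minimal sketch of what firing concurrent requests at the server could look like, assuming a llama-cpp-python server running locally on port 8000 with an OpenAI-compatible `/v1/completions` route; the URL, port, and payload fields are assumptions for illustration, not confirmed by this thread. Without server-side concurrency/continuous batching, these requests would effectively be handled one at a time:

```python
# Hypothetical concurrency test against a locally running server.
# Assumes the server is at http://localhost:8000 and exposes an
# OpenAI-compatible /v1/completions endpoint (illustrative, not confirmed here).
import concurrent.futures

import requests

URL = "http://localhost:8000/v1/completions"  # assumed host/port/route

def complete(prompt: str) -> str:
    # Send one completion request and return the generated text.
    resp = requests.post(URL, json={"prompt": prompt, "max_tokens": 32})
    resp.raise_for_status()
    return resp.json()["choices"][0]["text"]

prompts = [f"Question {i}: what is continuous batching?" for i in range(4)]

# Issue the four requests concurrently; whether they are actually processed
# in parallel depends on the server, not on this client code.
with concurrent.futures.ThreadPoolExecutor(max_workers=4) as pool:
    for text in pool.map(complete, prompts):
        print(text)
```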