After spending a morning on this, I found the most smoothbrain workaround imaginable.
I had to dig in and discover that requests' `iter_lines` actually has a `chunk_size` parameter that defaults to 512. Setting it to something small, like 10, solved the problem.
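For reference, here's the fixed loop (a minimal sketch, assuming koboldcpp's default port 5001 and its SSE streaming endpoint; the URL and payload are my assumptions, so adjust them to your setup):

```python
import requests

# streaming completion request to a local koboldcpp instance
resp = requests.post(
    "http://localhost:5001/api/extras/generate/stream",
    json={"prompt": "Once upon a time", "max_length": 200},
    stream=True,
)

# chunk_size defaults to 512, so requests buffers up to 512 bytes of the
# SSE stream before yielding anything; a small value pushes each event
# through as soon as it arrives
for line in resp.iter_lines(chunk_size=10):
    if line:
        print(line.decode("utf-8"))
```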
However, if I run the same script against a llama.cpp server's /completion endpoint, I get smooth streaming without touching chunk_size, which is why I initially thought this was a koboldcpp issue. I don't know why that is, though.
I'm calling the API like this:
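(a minimal sketch of the script; I'm assuming the same default localhost:5001 koboldcpp endpoint as above, with the payload trimmed down)

```python
import requests

resp = requests.post(
    "http://localhost:5001/api/extras/generate/stream",
    json={"prompt": "Once upon a time", "max_length": 200},
    stream=True,
)

# print each SSE line as soon as iter_lines yields it
# (iter_lines left at its defaults, i.e. chunk_size=512)
for line in resp.iter_lines():
    if line:
        print(line.decode("utf-8"))
```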
The problem is that the stream isn't smooth; the tokens arrive in bursts roughly every half-second.
Version: 1.85.1
Windows 10