Bug: max_completion_tokens being ignored #701

Open
njbrake opened this issue Feb 27, 2025 · 0 comments
njbrake commented Feb 27, 2025

Contact Details

[email protected]

What happened?

I'm using LiteLLM to run inference against Llamafile in Lumigator (https://github.com/mozilla-ai/lumigator/blob/main/lumigator/jobs/inference/model_clients.py#L65), and Llamafile appears to ignore max_completion_tokens while honoring only max_tokens.
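
For reference, here's a minimal sketch of the call I'm making (a hypothetical simplification of the Lumigator client; the model name and messages are illustrative, and I'm assuming LiteLLM's `openai/` prefix convention for OpenAI-compatible endpoints):

```python
import litellm

# Minimal repro sketch: llamafile serving an OpenAI-compatible API on
# localhost:8080 (values below are illustrative, not the exact Lumigator call).
response = litellm.completion(
    model="openai/DeepSeek-R1-Distill-Qwen-1.5B-Q2_K.gguf",
    api_base="http://localhost:8080/v1",
    messages=[{"role": "user", "content": "Hello"}],
    temperature=0.0,
    top_p=0.9,
    max_completion_tokens=512,  # llamafile ignores this (n_remain=-1)
    # max_tokens=512,           # ...but honors this (n_remain=511)
)
print(response.choices[0].message.content)
```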

If I pass in max_completion_tokens=512, the llamafile verbose logs show

 process_token] next token | has_next_token=true n_remain=-1 

But if I pass in max_tokens=512, then I see what I expect, i.e.

 process_token] next token | has_next_token=true n_remain=511

Not sure why one works but not the other (n_remain=-1 seems to mean no token limit was applied), since in the llamafile source I see

params->max_tokens = max_completion_tokens.getNumber();

which makes it look like max_completion_tokens is the same as max_tokens.
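
To take LiteLLM out of the picture entirely, one could POST straight to the llamafile server; a hedged sketch, assuming the standard OpenAI-compatible /v1/chat/completions route:

```python
import requests

# Direct request to llamafile, bypassing LiteLLM, to check whether the server
# itself parses max_completion_tokens. Payload values are illustrative.
payload = {
    "model": "DeepSeek-R1-Distill-Qwen-1.5B-Q2_K.gguf",
    "messages": [{"role": "user", "content": "Hello"}],
    "temperature": 0.0,
    "top_p": 0.9,
    "max_completion_tokens": 512,  # swap for "max_tokens" and compare n_remain
}
resp = requests.post("http://localhost:8080/v1/chat/completions", json=payload)
print(resp.json())
```

If the verbose log still shows n_remain=-1 for this direct request, the problem would be in llamafile's request parsing rather than anywhere in the client stack.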

When I look at the LiteLLM logs, I see this when I pass in the max_tokens param:

http://localhost:8080/v1/ \.....
-d '{'model': 'DeepSeek-R1-Distill-Qwen-1.5B-Q2_K.gguf', ... 'temperature': 0.0, 'top_p': 0.9, 'max_tokens': 512, 'frequency_penalty': 0.0, 'extra_body': {}}'

When I pass in max_completion_tokens, this is what LiteLLM sends:

http://localhost:8080/v1/ \.....
 -d '{'model': 'DeepSeek-R1-Distill-Qwen-1.5B-Q2_K.gguf','temperature': 0.0, 'top_p': 0.9, 'max_completion_tokens': 512, 'frequency_penalty': 0.0, 'extra_body': {}}'

So, best I can tell, LiteLLM is doing the right thing and passing the params along.
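
As a client-side stopgap, remapping the parameter before it reaches llamafile works; a minimal sketch (hypothetical helper, not part of Lumigator or LiteLLM):

```python
def normalize_token_limit(params: dict) -> dict:
    """Rename max_completion_tokens to max_tokens, which llamafile honors."""
    params = dict(params)  # don't mutate the caller's dict
    if "max_completion_tokens" in params and "max_tokens" not in params:
        params["max_tokens"] = params.pop("max_completion_tokens")
    return params

kwargs = normalize_token_limit({"max_completion_tokens": 512, "temperature": 0.0})
# kwargs == {"temperature": 0.0, "max_tokens": 512}
```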


Version

Model creator: unsloth
Quantized GGUF files used: unsloth/DeepSeek-R1-Distill-Qwen-7B-GGUF
  Commit message: "Update README.md"
  Commit hash: 097680e4eed7a83b3df6b0bb5e5134099cadf1b0
Llamafile version used: Mozilla-Ocho/llamafile
  Commit message: "Merge pull request #687 from Xydane/main Add Support for DeepSeek-R1 models"
  Commit hash: 29b5f27

What operating system are you seeing the problem on?

Linux, Mac

Relevant log output

See the inline llamafile and LiteLLM logs above.
