Contact Details
[email protected]
What happened?
I'm using LiteLLM to run inference against Llamafile in Lumigator (https://github.com/mozilla-ai/lumigator/blob/main/lumigator/jobs/inference/model_clients.py#L65), and Llamafile appears to ignore max_completion_tokens and only honors max_tokens.
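For reference, this is roughly how the call goes through LiteLLM (a minimal sketch, not the actual Lumigator code; the "openai/" model prefix, api_base, and prompt are placeholders):

```python
import litellm

# Minimal sketch of the LiteLLM call against a local llamafile server.
# The "openai/" prefix routes through LiteLLM's OpenAI-compatible provider;
# the model name, api_base, and prompt below are placeholders.
response = litellm.completion(
    model="openai/DeepSeek-R1-Distill-Qwen-1.5B-Q2_K.gguf",
    api_base="http://localhost:8080/v1",
    messages=[{"role": "user", "content": "Summarize this text ..."}],
    temperature=0.0,
    top_p=0.9,
    max_completion_tokens=512,  # llamafile verbose logs show n_remain=-1 (ignored?)
    # max_tokens=512,           # llamafile verbose logs show n_remain=511 (honored)
)
print(response.choices[0].message.content)
```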
If I pass in max_completion_tokens=512, the llamafile verbose logs show
process_token] next token | has_next_token=true n_remain=-1
But if I pass in max_tokens=512, I see what I expect, i.e.
process_token] next token | has_next_token=true n_remain=511
I'm not sure why one works but not the other, since llamafile/server/v1_completions.cpp line 267 (at commit 29b5f27) makes it look like max_completion_tokens is treated the same as max_tokens.
When I pass in the max_tokens param, the LiteLLM logs show:
http://localhost:8080/v1/ \..... -d '{'model': 'DeepSeek-R1-Distill-Qwen-1.5B-Q2_K.gguf', ... 'temperature': 0.0, 'top_p': 0.9, 'max_tokens': 512, 'frequency_penalty': 0.0, 'extra_body': {}}'
When I pass in max_completion_tokens, this is what LiteLLM sends:
http://localhost:8080/v1/ \..... -d '{'model': 'DeepSeek-R1-Distill-Qwen-1.5B-Q2_K.gguf','temperature': 0.0, 'top_p': 0.9, 'max_completion_tokens': 512, 'frequency_penalty': 0.0, 'extra_body': {}}'
So, as best I can tell, LiteLLM is doing the right thing and passing the params along.
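To take LiteLLM out of the picture, a direct request to the llamafile server should show the same behavior. A hedged sketch (the /v1/completions path and the prompt field are my assumptions based on the OpenAI-style API, not taken from the Lumigator code):

```python
import requests

# Direct request to the llamafile server, bypassing LiteLLM, to check whether
# the server itself ignores max_completion_tokens. The endpoint path and the
# prompt field are assumptions based on the OpenAI-style completions API.
url = "http://localhost:8080/v1/completions"
payload = {
    "model": "DeepSeek-R1-Distill-Qwen-1.5B-Q2_K.gguf",
    "prompt": "Hello",
    "temperature": 0.0,
    "top_p": 0.9,
    # Toggle between these two and compare n_remain in the verbose server logs:
    "max_completion_tokens": 512,  # expect n_remain=-1 if the param is ignored
    # "max_tokens": 512,           # expect n_remain=511 if the param is honored
}
resp = requests.post(url, json=payload, timeout=120)
print(resp.json())
```

If the direct request also shows n_remain=-1 with max_completion_tokens, that would confirm the issue is in the llamafile server rather than in LiteLLM.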
Version
Model creator: unsloth
Quantized GGUF files used: unsloth/DeepSeek-R1-Distill-Qwen-7B-GGUF
Commit message "Update README.md"
Commit hash 097680e4eed7a83b3df6b0bb5e5134099cadf1b0
LlamaFile version used: Mozilla-Ocho/llamafile
Commit message "Merge pull request #687 from Xydane/main Add Support for DeepSeek-R1 models"
Commit hash 29b5f27
What operating system are you seeing the problem on?
Linux, Mac
Relevant log output