Max completion tokens #720

joshuacoles · 2025-02-20T19:17:52Z

This PR builds on #716 and adds the max_completion_tokens property to the ChatGPT API's ChatCompletionRequest, in line with the OpenAI API spec. The property lets a client set a limit on the number of tokens returned from inference; the limit is honoured as long as it is less than the node.max_generate_tokens value specified on the command line. Both the new max_completion_tokens and the older max_tokens options are supported, with max_tokens used only if max_completion_tokens is not present.
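As a rough illustration of the precedence described above, here is a minimal sketch. The function name `resolve_max_tokens` and its parameters are hypothetical and are not taken from the PR's diff; it also assumes that a client limit above the node limit simply falls back to the node limit.

```python
from typing import Optional


def resolve_max_tokens(
    max_completion_tokens: Optional[int],
    max_tokens: Optional[int],
    node_max_generate_tokens: int,
) -> int:
    """Pick the effective token limit for one request (hypothetical helper)."""
    # Prefer the newer field; fall back to the older one only if it is absent.
    requested = max_completion_tokens if max_completion_tokens is not None else max_tokens
    if requested is None:
        # No client-defined limit: the node-level limit applies.
        return node_max_generate_tokens
    # Assumption: the client limit can never raise the node-level limit.
    return min(requested, node_max_generate_tokens)
```

For example, `resolve_max_tokens(50, 200, 100)` returns 50, because max_completion_tokens takes precedence over max_tokens and is below the node limit.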

The main structural change in this PR is the addition of a GenerationOptions parameter to the SendPrompt and SendTensor gRPC methods and to the methods they call. This allows request-specific options to be passed through during inference. I kept this separate from the existing InferenceState, which seems focused on stable-diffusion-specific data that can change over the course of inference.
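The shape of that change can be sketched roughly as below. This is an illustration only: the field on `GenerationOptions`, the `SketchNode` class, and the word-splitting stand-in for inference are all assumptions, not the PR's actual code.

```python
import asyncio
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class GenerationOptions:
    # Hypothetical field; immutable because it is fixed for the whole request,
    # unlike state that evolves during inference.
    max_completion_tokens: Optional[int] = None


class SketchNode:
    """Stand-in for a node; not the real Node class."""

    def __init__(self, max_generate_tokens: int) -> None:
        self.max_generate_tokens = max_generate_tokens

    def token_budget(self, options: Optional[GenerationOptions]) -> int:
        # Request-specific limit, capped by the node-level limit.
        if options is None or options.max_completion_tokens is None:
            return self.max_generate_tokens
        return min(options.max_completion_tokens, self.max_generate_tokens)

    async def process_prompt(
        self,
        prompt: str,
        request_id: str,
        generation_options: Optional[GenerationOptions] = None,
    ) -> List[str]:
        budget = self.token_budget(generation_options)
        # Placeholder for inference: emit whitespace-separated words up to the budget.
        return prompt.split()[:budget]


node = SketchNode(max_generate_tokens=4)
limited = asyncio.run(
    node.process_prompt("a b c d e f", "req-1", GenerationOptions(max_completion_tokens=2))
)
unlimited = asyncio.run(node.process_prompt("a b c d e f", "req-2"))
```

Keeping the options in a small immutable value that travels with the request means each call site only needs one extra argument, and callers that pass nothing keep the node-level default.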

If any more changes are made to the parent PR I will rebase this PR to match.

joshuacoles · 2025-02-20T19:22:55Z

Also @AlexCheema, whilst adding this I noticed that SendPrompt and SendTensor both return a Tensor according to the gRPC definitions, but the corresponding methods on Node (Node#process_prompt and Node#process_tensor) do not return anything, so this tensor is always empty.

I see the return type changed from Tensor to Empty in c9ded9b and then back in 9954ce8. I have left the signature unchanged, but wanted to check which behaviour is expected.

joshuacoles · 2025-03-07T15:58:47Z

Closed in favour of the combined PR #734

joshuacoles and others added 8 commits February 19, 2025 15:47

Strip EOS token from output to mirror the OpenAI behaviour

0560640

Add "[DONE]" message and change streaming response to mirror OpenAI

40e8619

Fix bench.py to support "[DONE]" terminating event in stream

fed063c

Resolve out of range error in debug line

3daa77d

Add generation options protocol buf definition and corresponding python type

510d1b2

Fish generation_options through from the ChatGPT API request to `process_inference_result`

afaf6e6

Apply the generation options to inference

5855230

Emit the finish reason completion chunk separately from the content to more closely align with OpenAI

7213ccb

joshuacoles marked this pull request as ready for review February 21, 2025 09:46

This was referenced Feb 25, 2025

Consolidated ChatGPT API improvements: Improve Compatibility, add requests specific token limits, and textual stop sequences #734

Open

Support structured responses from the ChatGPT API #735

Closed

joshuacoles closed this Mar 7, 2025
