Consolidated ChatGPT API improvements: Improve Compatibility, add requests specific token limits, and textual stop sequences #734
base: main
Conversation
…ocess_inference_result`
…o more closely align with OpenAI
…tion from the ChatGPT API
…und for textual stop sequence matches
This removes direct references to the internals of BufferedOutput to allow for better abstraction
Force-pushed from 1e54165 to 3731605
@AlexCheema following our discussions yesterday, I've chosen to consolidate all my changes up to but not including structured generation (and hence function calling) into this PR to make it easier to review. I am going to add some tests, and then this will be ready to review. I will rebase the latter two features on top of this, with a git history containing fewer false starts.
I've added the tests and ensured bench.py runs successfully. This is now ready to review!
I have used the official OpenAI SDK to perform the ChatGPT API tests, as this is the reference client implementation. I have not, however, added it to any requirements/setup file, as I could not find the other testing dependencies (e.g. pytest and pytest-asyncio) listed anywhere.
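For illustration, a test along these lines might look like the following sketch. The base URL, port, model name, and assertions here are assumptions for the sake of the example, not copied from the PR:

```python
import pytest
from openai import AsyncOpenAI

# Hypothetical test sketch: the official OpenAI SDK used as the reference
# client, pointed at a locally running exo node. The base_url, port and
# model name below are illustrative assumptions.
client = AsyncOpenAI(base_url="http://localhost:52415/v1", api_key="not-needed")

@pytest.mark.asyncio
async def test_stop_sequence_truncates_output():
    completion = await client.chat.completions.create(
        model="llama-3.2-1b",
        messages=[{"role": "user", "content": "Count upwards: 1, 2, 3, 4, 5"}],
        stop=["3"],
        max_tokens=50,
    )
    text = completion.choices[0].message.content
    # The stop sequence itself must not appear in the returned text.
    assert "3" not in text
```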
…e zero-length token output
Building on #720 and #716, this introduces textual stop sequences to the ChatGPT API, aligning with the OpenAI API Reference and allowing a request to specify that generation should cease after a given textual sequence has been generated.
This is implemented by buffering tokens on the final node of the inference loop until enough characters have been generated to guarantee that none of the listed stop sequences is present. If a stop sequence is found in the middle of a token or spanning tokens, the tokens returned from the final node to the ChatGPT API will not be a faithful replication of the tokens that the inference process emitted, but rather the text prior to the stop sequence, retokenised.
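A minimal sketch of the buffering idea follows. The class and method names are hypothetical (the actual implementation lives inside exo's inference loop and differs in detail); the key point is that the last `longest_stop - 1` characters must be held back, since a stop sequence could still be completing inside that tail:

```python
class StopSequenceBuffer:
    """Buffers decoded text until it is safe to emit, so that a stop
    sequence split across token boundaries is still caught."""

    def __init__(self, stop_sequences: list[str]):
        self.stop_sequences = stop_sequences
        # A stop sequence could still be forming inside the final
        # (longest_stop - 1) characters, so that tail is withheld.
        self.hold_back = max((len(s) for s in stop_sequences), default=1) - 1
        self.pending = ""

    def push(self, decoded_token: str) -> tuple[str, bool]:
        """Returns (text safe to emit now, whether generation should stop)."""
        self.pending += decoded_token
        # Earliest completed stop sequence anywhere in the buffer, if any.
        cut = min(
            (i for i in (self.pending.find(s) for s in self.stop_sequences) if i != -1),
            default=-1,
        )
        if cut != -1:
            # Emit only the text before the stop sequence and signal stop.
            emit, self.pending = self.pending[:cut], ""
            return emit, True
        # Otherwise emit everything except the tail that could still be
        # the start of a stop sequence.
        safe = len(self.pending) - self.hold_back
        if safe <= 0:
            return "", False
        emit, self.pending = self.pending[:safe], self.pending[safe:]
        return emit, False

    def flush(self) -> str:
        """At end of generation the remaining buffered text is safe to emit."""
        emit, self.pending = self.pending, ""
        return emit
```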
This data is passed around in the GenerationOptions object as introduced in #720. This PR should be considered after its two dependencies; I will rebase this and apply any changes to it as needed.
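The exact shape of GenerationOptions is defined in #720 rather than shown here, but conceptually it carries the per-request settings that this series of PRs adds. A hypothetical sketch, with field names assumed for illustration:

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical sketch of the per-request data GenerationOptions might
# carry; the real definition is in #720 and may differ.
@dataclass
class GenerationOptions:
    # Request-specific token limit, mirroring the OpenAI max_tokens field.
    max_tokens: Optional[int] = None
    # Textual stop sequences, mirroring the OpenAI stop field.
    stop: list[str] = field(default_factory=list)
```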