Add anything and until method in the DSL for reasoning models #1480

Open
rlouf opened this issue Mar 6, 2025 · 0 comments
Labels
enhancement interface Related to improving the user interface regex

rlouf commented Mar 6, 2025

This was triggered by trying to do structured generation with DeepSeek, where we want to let the model generate anything (unstructured) between the think tags and start structured generation only after </think> has been generated. It would thus make sense to have an anything construct that corresponds to .*, the unstructured case. We could then add an until method so that users could write the following for a classification task:

from outlines.types import anything, either

model = ...

outline = "<think>" + anything.until("</think>") + "</think>" + either("yes", "no")
result = model.generate("Are you a reasoning model?", outline)

The implementation is trickier than it looks. anything.until("</think>") is best expressed with a negative lookahead:

((?!<\/think>).)*

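but the regex engine that outlines-core uses does not support lookaheads. However, it can be approximated by the following lookahead-free regular expression, which behaves the same except on overlapping input such as <</think> or on text that ends in a partial tag: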

([^<]|<[^\/]|<\/[^t]|<\/t[^h]|<\/th[^i]|<\/thi[^n]|<\/thin[^k]|<\/think[^>])*
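A quick way to sanity-check the expansion is to compare both patterns with Python's re module, which does support lookaheads. This is only a sketch: the sample strings are made up for illustration, and Python's engine is not the one outlines-core uses.

```python
import re

# Reference pattern with a negative lookahead (DOTALL so "." also matches newlines).
LOOKAHEAD = re.compile(r"((?!</think>).)*", re.DOTALL)
# Hand-expanded, lookahead-free pattern from the issue ("/" needs no escape in Python).
EXPANDED = re.compile(
    r"([^<]|<[^/]|</[^t]|</t[^h]|</th[^i]|</thi[^n]|</thin[^k]|</think[^>])*"
)


def matches(pattern, text):
    """True if the whole text is accepted by the pattern."""
    return pattern.fullmatch(text) is not None


def agree(text):
    """True if both patterns give the same verdict on the text."""
    return matches(LOOKAHEAD, text) == matches(EXPANDED, text)


samples = [
    "plain reasoning",
    "1 < 2 and 3 > 2",
    "closing </think> tag",
    "<</think>",      # the stop string right after a stray "<"
    "ends with <",    # text ending in a partial tag
]
for text in samples:
    print(repr(text), matches(LOOKAHEAD, text), matches(EXPANDED, text))
```

The two patterns agree on the first three samples and disagree on the last two. The expansion consumes the character that breaks the match, so it accepts "<</think>", and it cannot end on a partial tag such as a trailing "<". A production implementation would probably need to handle these overlaps.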

The idea would thus be to have anything.until generate a Regex node with an "expanded lookahead". Computing the expansion shouldn't be too hard.
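As a rough illustration of how the expansion could be computed for a literal stop string, here is a minimal sketch. The helper name expand_until is hypothetical and not part of outlines. It reproduces the hand-written pattern as-is and does not try to handle overlapping prefixes (for example "<</think>").

```python
import re


def expand_until(stop: str) -> str:
    """Build a lookahead-free pattern for "anything until `stop`".

    Hypothetical helper: for each position i in `stop`, emit the alternative
    "first i characters of stop, then any character other than stop[i]".
    """
    if not stop:
        raise ValueError("stop string must not be empty")
    alternatives = []
    for i, char in enumerate(stop):
        prefix = re.escape(stop[:i])
        # re.escape also protects characters that are special inside a class.
        alternatives.append(f"{prefix}[^{re.escape(char)}]")
    return "(" + "|".join(alternatives) + ")*"


pattern = expand_until("</think>")
print(pattern)

# Emulate the proposed outline with plain regexes.
outline = "<think>" + pattern + "</think>" + "(yes|no)"
print(re.fullmatch(outline, "<think>Let me see: 1 < 2.</think>yes") is not None)
```

The number of alternatives grows linearly with the length of the stop string, so the generated pattern stays small for tags like "</think>".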

Note: we could also decide that "until" keeps the "</think>" token. We should discuss.

Regexes

It is tempting to also implement until_regex this way. The expansion is trivial for single-character patterns like [0-9]:

([^0-9])*

However, it gets complicated quickly. For instance, anything.until_regex("(abc|def)") would need to be translated to:

([^ad]|a[^b]|ab[^c]|d[^e]|de[^f])*

We will therefore not implement this in the first iteration.
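For the restricted case where the pattern is just an alternation of literal strings, the translation above can be sketched by grouping the alternatives by shared prefix. The function name expand_until_any is hypothetical, and the sketch assumes that no stop string is a prefix of another. General regexes are out of scope here.

```python
import re


def expand_until_any(stops):
    """Lookahead-free pattern for "anything until one of `stops`" (literals only)."""
    stops = list(stops)
    for stop in stops:
        if not stop or any(o != stop and o.startswith(stop) for o in stops):
            raise ValueError("stops must be non-empty and prefix-free")
    # Map every proper prefix to the characters that would extend it toward a stop.
    continuations = {}
    for stop in stops:
        for i, char in enumerate(stop):
            continuations.setdefault(stop[:i], set()).add(char)
    alternatives = []
    for prefix, chars in continuations.items():
        negated = "".join(re.escape(c) for c in sorted(chars))
        alternatives.append(f"{re.escape(prefix)}[^{negated}]")
    return "(" + "|".join(alternatives) + ")*"


print(expand_until_any(["abc", "def"]))
```

Even in this simple case the set of alternatives depends on how the stop strings overlap, which hints at why arbitrary regexes are much harder to expand.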

rlouf added the enhancement, interface (Related to improving the user interface), and regex labels on Mar 6, 2025