Token-triggered processors #1407

Open · cpfiffer opened this issue Feb 7, 2025 · 8 comments

cpfiffer (Contributor) commented Feb 7, 2025

There are certain cases where outlines users need unstructured text until a particular token is reached, such as <|beginfunctioncall|>, code blocks, JSON blocks, etc.

Currently, these are often handled with regular expressions that can be quite messy.

DeepSeek's thinking tokens are a great example. DeepSeek R1's response begins with <think> and then includes some amount of unstructured text, followed by </think>. Thinking tokens are the primary reason for R1's performance.

The issue with the current solution (specifying a regular expression) is that the regular expression is quite complicated and difficult to debug:

from typing import Literal

from pydantic import BaseModel

# Import paths below match outlines ~0.1.x and may differ in other versions.
from outlines.fsm.json_schema import convert_json_schema_to_str
from outlines_core.fsm.json_schema import build_regex_from_schema

NUM_THINKING_CHARACTERS = 1024  # maximum length of the thinking block

# Set up the response format you want the LLM to respond with.
class YesNo(BaseModel):
    answer: Literal['yes', 'no']

yesno_regex = build_regex_from_schema(convert_json_schema_to_str(YesNo))

# Add the thinking prefix to the regex
thinking_regex = r'<think>((.|\n){0,' + str(NUM_THINKING_CHARACTERS) + r'}?)\[TRUNCATED\]</think>'

# Combine the thinking regex and the yesno regex
result_regex = thinking_regex + yesno_regex

This includes both a thinking block and the structured output at the end.
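
For reference, the combined pattern is then compiled in one shot, which is the expensive step. A minimal usage sketch, assuming model is an already-loaded outlines model:

import outlines

# Compile the full combined pattern at once; this single compilation is the
# step the rest of this issue is trying to break into smaller pieces.
generator = outlines.generate.regex(model, result_regex)
response = generator("Is the sky blue? Think before you answer.")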

Regular expression compilation times can explode if you want to include things like thought controls that limit the length of the thinking response, allowed/disallowed words, or structured text inside the thinking block. For example, forcing the thinking block to a length between 10 and 50 characters:

thinking_regex = r'<think>((?:.|\n|\s){10,50}?)<\/think>(\s)*'

Compiling a single, smaller regular expression for only the thinking block could take some load off the compiler and simplify the user interface for complicated, mixed forms of structure.

Ideally, I would prefer to be able to structure the thinking block and the output block separately, i.e. have separate logit processors for different sections of text. You could also jump to any number of different logit processors based on tokens you hit.

Consider the following interface, where I define a default pattern, pattern_1. When a 'b' token (or substring?) is hit, I want to switch to pattern_2.

pattern_1 = r'.{10,50}'
pattern_2 = r'(cat|dog|squirrel)'

pattern_dict = {'<|bos|>':pattern_1, 'b':pattern_2}

generator = outlines.generate.regex(model, pattern_dict)

or, in the case of DeepSeek R1 (roughly):

pattern_1 = r'.*'
pattern_2 = r'(cat|dog|squirrel)'

# NOTE: may want to provide a default
# pattern that begins at the BOS token
pattern_dict = {'<think>':pattern_1, '</think>':pattern_2}

generator = outlines.generate.regex(model, pattern_dict)

I think it'd be a nice convenience feature, and it might cut down on compilation times for very complicated grammars. It would introduce some challenges in how to communicate patterns to inference servers like vLLM, which does not have a way to receive multiple patterns for constrained decoding.

Thoughts?

cpfiffer (Contributor Author) commented Feb 7, 2025

This is also related to @rlouf's PR on a simple regex DSL (#1403), which should likewise help simplify the user experience. The advantage of token-triggered processors is mostly to separate concerns and reduce FSM compiler load.

903124 commented Feb 9, 2025

Nice! I feel like it's better to define an optional start and end token, since it could be unintuitive to have </think> at the start of some pattern. For example:

pattern_1 = {
    'pattern': r'.*',
    'start': '<think>',
    'end': '</think>'
}
pattern_2 = {
    'pattern': r'(cat|dog|squirrel)',
    'start': '',
    'end': ''
}

pattern_dict = {
    'pattern1': pattern_1,
    'pattern2': pattern_2
}
generator = outlines.generate.regex(model, pattern_dict)

Furthermore, a user can also reuse a regex pattern, e.g. for a simple character counter:

Checking character 1: 's' - no match
────────────────────────────────────────
Checking character 2: 't' - no match
────────────────────────────────────────
Checking character 3: 'r' found! r_count = 1
────────────────────────────────────────
Checking character 4: 'a' - no match
────────────────────────────────────────
Checking character 5: 'w' - no match

which reuses a regex pattern until the LLM outputs a terminating token, e.g. <think>.
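
As a rough illustration of what reusing a single line pattern could look like, with the line format guessed from the trace above (illustrative only, not a proposed API):

# One line pattern reused in a loop; generation would leave the loop once
# the terminating token appears.
count_line = r"Checking character \d+: '[a-z]'( - no match| found! r_count = \d+)\n"
counter_regex = rf"(?:{count_line})+"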

rlouf (Member) commented Feb 17, 2025

It would be better, albeit more verbose, to give users a simple interface to build custom logits processors. Here's a first idea:

build_processor(cond_fns, process_fns, init_states)

Where init_states is an array that contains the initial value of each state. The state can be anything, and is passed to the corresponding cond_fn and process_fn at each step.

cond_fns is an array of functions that return booleans indicating whether the corresponding processing function should be applied: if

cond_fns[i](states[i], token_ids)

returns True, then process_fns[i] is applied.

process_fns is a list of functions that process the logits in place. At each step we filter the list using the booleans computed from the current states, and apply the selected functions in the order of their position in the list:

process_fns[i](states[i], token_ids, logits)

This is one possibility.
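
A minimal sketch of what that factory could look like, assuming a logits processor is a callable taking (token_ids, logits); all names here are illustrative, not settled API:

from typing import Any, Callable, List

import torch

def build_processor(
    cond_fns: List[Callable[[Any, List[int]], bool]],
    process_fns: List[Callable[[Any, List[int], torch.Tensor], None]],
    init_states: List[Any],
) -> Callable[[List[int], torch.Tensor], torch.Tensor]:
    # One mutable state slot per (cond_fn, process_fn) pair.
    states = list(init_states)

    def processor(token_ids: List[int], logits: torch.Tensor) -> torch.Tensor:
        # Apply each selected processing function in list order;
        # each one mutates the logits in place.
        for i, cond_fn in enumerate(cond_fns):
            if cond_fn(states[i], token_ids):
                process_fns[i](states[i], token_ids, logits)
        return logits

    return processor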

cpfiffer (Contributor Author) commented

Option 1: Default processors + dictionary mapping

I like the idea of using the native processor interface. If we think the processor interface is reasonably stable, we could do something similar to what I did in #1408.

In this case we'd do:

from pydantic import BaseModel

# JSONLogitsProcessor and RegexLogitsProcessor exist in outlines.processors;
# MultiLogitsProcessor is the proposed class, and tokenizer is assumed to be
# already constructed.
from outlines.processors import JSONLogitsProcessor, RegexLogitsProcessor

class Foo(BaseModel):
    a: int

class Bar(BaseModel):
    b: int

# Create processors from schemas and a regex
processor_1 = JSONLogitsProcessor(Foo, tokenizer)
processor_2 = JSONLogitsProcessor(Bar, tokenizer)
processor_3 = RegexLogitsProcessor(r"[0-9]{4}", tokenizer)

processor = MultiLogitsProcessor(
    processor_1,  # default processor
    {'foo': processor_2, 'bar': processor_3}  # toggleable processors
)

Alternatively, users should be able to provide an exact token ID that triggers another processor, though this is a terrible developer experience IMO.
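
A rough sketch of how MultiLogitsProcessor might dispatch internally. The class and its constructor are hypothetical, trigger matching is reduced to substring checks on decoded text, and the tokenizer is assumed to have an HF-style decode method:

class MultiLogitsProcessor:
    def __init__(self, default_processor, triggered_processors, tokenizer):
        self.active = default_processor
        self.triggered = triggered_processors  # {trigger substring: processor}
        self.tokenizer = tokenizer

    def __call__(self, token_ids, logits):
        text = self.tokenizer.decode(token_ids)
        # Hand control to the processor whose trigger appeared last in the
        # generated text; otherwise keep the currently active processor.
        last_pos = -1
        for trigger, proc in self.triggered.items():
            pos = text.rfind(trigger)
            if pos > last_pos:
                last_pos, self.active = pos, proc
        return self.active(token_ids, logits)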

Option 2: Wrapped processors

Another implementation might follow @903124's suggestion with optional stops by wrapping each processor in a TriggeredLogitsProcessor:

# Create processor from schema
processor_1 = TriggeredLogitsProcessor(
    JSONLogitsProcessor(Foo, tokenizer),
    start='<think>',
    stop='</think>'  # optional stop string
)

processor = MultiLogitsProcessor(
    processor_1,
    ...  # other TriggeredLogitsProcessors
)

Generation would be unstructured by default, I imagine, whenever the sequence falls outside every processor's start/stop substring range.

Each TriggeredLogitsProcessor would then simply leave the logits untouched whenever the generated sequence does not match its start/stop criteria.
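
A sketch of that wrapper under the same assumptions as above (hypothetical class; the inner processor is assumed to be a callable taking (token_ids, logits), and the tokenizer parameter is my addition):

class TriggeredLogitsProcessor:
    def __init__(self, inner, tokenizer, start, stop=None):
        self.inner = inner
        self.tokenizer = tokenizer
        self.start = start
        self.stop = stop  # optional stop string

    def __call__(self, token_ids, logits):
        text = self.tokenizer.decode(token_ids)
        start_pos = text.find(self.start)
        # Before the start string: leave the logits untouched (unconstrained).
        if start_pos == -1:
            return logits
        # After the stop string: unconstrained again.
        after_start = text[start_pos + len(self.start):]
        if self.stop is not None and self.stop in after_start:
            return logits
        # Inside the start/stop window: delegate to the wrapped processor.
        return self.inner(token_ids, logits)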

Of these, I prefer option 2. It could also be wrapped reasonably well by the interface @rlouf described.

rlouf (Member) commented Feb 19, 2025

You can implement all of these with the interface I mentioned above, so TriggeredLogitsProcessor etc. could just be factory functions.

rlouf (Member) commented Feb 22, 2025

I think I have a better solution. First, as far as debugging goes, I'm not sure your solution is necessarily better than the regex alternative. Second, it is quite complex. A much better solution would be to let users write, e.g.:

from outlines.types import regex, text

structure = text + "<think>" + text + "</think>" + regex(r"(yes|no)")

Here there is only one regex to compile, with text representing unconstrained generation. structure is a tree whose nodes we can inspect, so we should be able to transform

model(prompt, structure)

into

result = model(prompt, until="<think>")
prompt += result
result = model(prompt, until="</think>")
prompt += result
result = model(prompt, regex(r"(yes|no)"))

It is always possible to add a debug mode:

model(prompt, structure, debug=True)

or, maybe better, have another class of Generators:

generator = REPL(model, structure)
generator(prompt)

903124 commented Feb 23, 2025

I've been thinking about it this way recently:

Using the strawberry example above, a user can define a factory function:

# Pseudocode sketch; the "..." elisions are intentional.
class strawberry_count(...):
    def thinking_process(self):
        return "<think>" + ... + "</think>"

    def r_counting_loop(self, until="break"):
        while True:
            return "Checking ...." + ... + (or (newline/break))

    def conclusion(self):
        return ....

    def pipeline(self):
        return [self.thinking_process, self.r_counting_loop, self.conclusion]

And then do model(prompt, structure=strawberry_count).

cpfiffer (Contributor Author) commented Feb 24, 2025

I think this is a good interface:

structure = text + "<think>" + text + "</think>" + regex(r"(yes|no)")

Correct me if I'm wrong, but this gives us a Sequence object, no? Are the token literals promoted to Terms?

In that case, building this interface doesn't seem particularly difficult, and it strikes me as having low mental overhead. It'd be nice to have:

structure = text + "<think>" + text + json_block('python') + text + "</think>" + regex(r"(yes|no)")

However, one issue here is that it does not seem obvious to me how to flexibly handle tokens that you may or may not need.

As a (very reductive) example, suppose I want to force the model to write "cat" after the sequence "my favorite pet is a ". If I want this inside an otherwise unstructured block, I'd have to write my own regex to explicitly encode this condition.
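
For concreteness, the hand-rolled version might look like this (illustrative only):

# Force "cat" after the trigger phrase inside an otherwise unstructured block.
pet_regex = r"(?:.|\n)*my favorite pet is a cat(?:.|\n)*"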

This is simple for my example, but complicated nested structures that trigger different types of structure could strain the compiler quite a bit. Suppose, for example, that you did not know in advance that you needed a LaTeX block, but if one appears in the course of unstructured text, then you want to apply a LaTeX grammar. The Term approach doesn't seem able to handle this, especially for out-of-order or unexpected blocks.
