Token-triggered processors #1407
Comments
Nice! I feel like it's better to define an optional start and end token, since it could be unintuitive to have only a start trigger. Furthermore, users could also reuse a regex pattern, e.g. a simple character counter that reuses a pattern until the LLM outputs a terminating token.
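As a rough illustration of that second idea (both the repeated pattern and the terminating token below are made up):

```python
# Made-up example: repeat a small "letter: count" pattern any number of times,
# until the model emits the terminating token "<|done|>".
counting_pattern = r"(?:[a-z]: [0-9]+\n)*<\|done\|>"
```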
It would be better, albeit more verbose, to give users a simple interface to build custom logits processors. Here's a first idea:

```python
build_processor(cond_fns, process_fns, init_state)
```

where the condition and processing functions have the signatures `cond_fns_i(states[i], token_ids)` and `process_fns_i(states[i], token_ids, logits)`. This is one possibility.
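A minimal sketch of how such a `build_processor` could be wired up, assuming each `process_fn` returns an updated state along with the modified logits (that detail is not specified above):

```python
# Minimal sketch; assumes each process_fn returns (new_state, new_logits),
# which is an assumption on top of the signatures given above.
def build_processor(cond_fns, process_fns, init_state):
    states = list(init_state)

    def processor(token_ids, logits):
        for i, (cond_fn, process_fn) in enumerate(zip(cond_fns, process_fns)):
            # Only apply the i-th processing function when its condition holds.
            if cond_fn(states[i], token_ids):
                states[i], logits = process_fn(states[i], token_ids, logits)
        return logits

    return processor
```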
**Option 1: Default processors + dictionary mapping**

I like the idea of using the native processor interface. If we think the processor interface is reasonably stable, we could do something similar to what I did in #1408. In this case we'd do

```python
class Foo(BaseModel):
    a: int

class Bar(BaseModel):
    b: int

# Create processors from schemas
processor_1 = JSONLogitsProcessor(Foo, tokenizer)
processor_2 = JSONLogitsProcessor(Bar, tokenizer)
processor_3 = RegexLogitsProcessor(r"[0-9]{4}", tokenizer)

processor = MultiLogitsProcessor(
    processor_1,                               # default processor
    {'foo': processor_2, 'bar': processor_3},  # toggleable processors
)
```

Alternatively, users should be able to provide an exact token ID that triggers another processor, though this is a terrible developer experience IMO.

**Option 2: Wrapped processors**

Another implementation might follow @903124's suggestion with optional stops by wrapping each processor in a `TriggeredLogitsProcessor`:

```python
# Create processor from schema
processor_1 = TriggeredLogitsProcessor(
    JSONLogitsProcessor(Foo, tokenizer),
    start='<think>',
    stop='</think>',  # optional stop string
)

processor = MultiLogitsProcessor(
    processor_1,
    ...  # other TriggeredLogitsProcessors
)
```

This would be unstructured by default, I imagine, if a processor falls outside its start/stop substring range. Then the `MultiLogitsProcessor` would handle switching between them.

Of these, I think I prefer option 2. I think it could also be reasonably well wrapped by the interface @rlouf described.
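For concreteness, here is one way a `TriggeredLogitsProcessor` like the one above could work internally. This is only a sketch: the class does not exist in outlines, the wrapper here takes the tokenizer explicitly so it can decode the prefix (which differs slightly from the call above), and the `(input_ids, logits)` calling convention is an assumption.

```python
from typing import Optional

import torch


class TriggeredLogitsProcessor:
    """Sketch: delegate to a wrapped processor only between `start` and `stop`."""

    def __init__(self, processor, tokenizer, start: str, stop: Optional[str] = None):
        self.processor = processor  # any callable (input_ids, logits) -> logits
        self.tokenizer = tokenizer
        self.start = start
        self.stop = stop
        self.active = False

    def __call__(self, input_ids: torch.Tensor, logits: torch.Tensor) -> torch.Tensor:
        # Decode the generation so far and toggle on/off based on the substrings.
        text = self.tokenizer.decode(input_ids[0])
        if not self.active and text.endswith(self.start):
            self.active = True
        elif self.active and self.stop is not None and text.endswith(self.stop):
            self.active = False

        if not self.active:
            return logits  # unstructured outside the triggered region
        return self.processor(input_ids, logits)
```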
You can implement all of these with the interface I mentioned above.
I think I have a better solution. First, as far as debugging goes, I'm not sure your solution is necessarily better than the regex alternative. Then, it is quite complex. I think a much better solution would be to let users write e.g.

```python
from outlines.types import text

structure = text + "<think>" + text + "</think>" + regex(r"(yes|no)")
```

Here there is only one regex that should be compiled. We could also decompose `model(prompt, structure)` into

```python
result = model(prompt, until="<think>")
prompt += result
result = model(prompt, until="</think>")
prompt += result
result = model(prompt, regex(r"(yes|no)"))
```

It is always possible to add a debug mode:

```python
model(prompt, structure, debug=True)
```

or maybe better, have another class of generators:

```python
generator = REPL(model, structure)
generator(prompt)
```
I've been thinking about it this way recently: using the strawberry example above, a user can define a factory function and then call it.
I think this is a good interface.

Correct me if I'm wrong, but this gives us a way to compose structured and unstructured generation sequentially. In which case, building this interface doesn't seem particularly difficult and strikes me as having low mental overhead. It'd be nice to have

```python
structure = text + "<think>" + text + json_block('python') + text + "</think>" + regex(r"(yes|no)")
```

However, one issue here is that it does not seem obvious to me how to flexibly handle tokens that you may or may not need. As a (very reductive) example, suppose I want to force the model to write "cat" after the sequence "my favorite pet is a ". If I want this from an unstructured block, I'd have to write my own regex to explicitly include this condition (see the sketch at the end of this comment). This is simple for my example, but complicated nested structures that trigger different types of structure could strain the compiler quite a bit.

Suppose, for example, you did not know that you needed a LaTeX block, but if that block appears in the course of unstructured text, then you want to apply a LaTeX grammar.
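To make the "cat" example concrete, the hand-written regex might look something like this (illustrative only, written against a plain-regex interface rather than anything that exists in outlines):

```python
# Illustrative hand-written pattern for the reductive "cat" example: anything,
# then the trigger phrase with the forced completion, then anything again.
# Even this only approximately captures the intent (it doesn't constrain every
# occurrence of the phrase), which is part of the point.
pattern = r"[\s\S]*my favorite pet is a cat[\s\S]*"
```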
There are certain cases where outlines users require unstructured text until a particular token is reached, such as `<|beginfunctioncall|>`, code blocks, JSON blocks, etc. Currently, these are often handled with regular expressions that can be quite messy.
DeepSeek's thinking tokens are a great example. DeepSeek R1's response begins with `<think>`, then includes some amount of unstructured text, followed by `</think>`. Thinking tokens are the primary reason for R1's performance.

The issue with the current solution (specifying a regular expression) is that the regular expression is quite complicated and difficult to debug:
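For instance, even a minimal version of that pattern already mixes free-form and constrained pieces (illustrative only; the real regex depends on the output schema):

```python
# Illustrative only: a free-form thinking block followed by a constrained answer.
pattern = r"<think>[\s\S]*</think>\s*(yes|no)"
```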
This includes both a thinking block and the structured output at the end.
Regular expression compilation times can explode if you want to include things like thought controls that limit the thinking response, allowed/disallowed words, or structured text in the thinking block. For example, if you force the thinking block to have a length between 10 and 50:
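Something along these lines (again illustrative; the bound here is in characters):

```python
# Illustrative: a 10-50 character thinking block plus the constrained answer.
# The bounded repetition has to be unrolled during compilation, which is
# where compile times start to grow quickly.
pattern = r"<think>[\s\S]{10,50}</think>\s*(yes|no)"
```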
Compiling a single, smaller regular expression for only the thinking block could take some load off of our compiler, and simplify the user interface for complicated and mixed forms of structure.
Ideally, I would prefer to be able to structure the thinking block and the output block separately, i.e. have separate logit processors for different sections of text. You could also jump to any number of different logit processors based on tokens you hit.
Consider the following interface, where I define a default pattern `pattern_1`. When a `b` token (or substring?) is hit, I want to switch to `pattern_2`; or, in the case of DeepSeek R1 (roughly), switch patterns at the `<think>`/`</think>` tags.
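A sketch of what such an interface might look like, reusing the `text`/`regex` combinators proposed elsewhere in this thread; `MultiPatternGenerator` and its `triggers` argument are purely hypothetical names:

```python
# Hypothetical interface for discussion; none of these names exist in outlines.
# Generic case: use pattern_1 by default, switch to pattern_2 once "b" is generated.
generator = MultiPatternGenerator(model, default=pattern_1, triggers={"b": pattern_2})

# DeepSeek R1 case (roughly): free text inside <think>...</think>,
# then a constrained yes/no answer after the closing tag.
generator = MultiPatternGenerator(
    model,
    default=text,
    triggers={
        "<think>": text,
        "</think>": regex(r"(yes|no)"),
    },
)
result = generator(prompt)
```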
I think it'd be a nice convenience feature, and it might cut down on compilation times for very complicated grammars. It would introduce some challenges in how to communicate patterns to inference servers like vLLM, which does not have a way to receive multiple patterns for constrained decoding.
Thoughts?