Grammar-structured generation example from docs hangs forever #1478

Open
nchammas opened this issue Mar 5, 2025 · 1 comment

nchammas commented Mar 5, 2025

Describe the issue as clearly as possible:

There are other issues on the tracker about Outlines hanging forever when using a grammar. However, this issue is specifically about the example provided in the docs.

I am using the latest version of Outlines (0.2.1) and running the arithmetic example exactly as it appears in the docs.

I understand that CFG-structured generation is both experimental and community-contributed. But if something is featured prominently in the docs as a starting example, I think it should at least work.

Steps/code to reproduce the bug:

Install outlines[transformers] at 0.2.1. Then, try to run this example from the docs:

from outlines import models, generate

arithmetic_grammar = """
    ?start: expression

    ?expression: term (("+" | "-") term)*

    ?term: factor (("*" | "/") factor)*

    ?factor: NUMBER
           | "-" factor
           | "(" expression ")"

    %import common.NUMBER
"""

model = models.transformers("WizardLM/WizardMath-7B-V1.1")
generator = generate.cfg(model, arithmetic_grammar)
sequence = generator(
  "Alice had 4 apples and Bob ate 2. "
  + "Write an expression for Alice's apples:"
)

print(sequence)
# (8-2)

Expected result:

It should return some arithmetic expression.

Error message:

$ python grammar.py 
Loading checkpoint shards: 100%|████████████████████████████| 2/2 [00:46<00:00, 23.04s/it]
.../.venv/lib/python3.11/site-packages/outlines/fsm/guide.py:110: UserWarning: Outlines' public *community-contributed* CFG structured generation is experimental. Please review https://dottxt-ai.github.io/outlines/latest/reference/generation/cfg#disclaimer
  warnings.warn(

<hangs forever>
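
When a reproduction hangs like this, running it in a child process with a timeout turns the hang into a result that can be reported. The following is a minimal sketch using only the Python standard library. `run_with_timeout` is a hypothetical helper name, not part of Outlines, and the timeout value is an arbitrary choice.

```python
import subprocess
import sys


def run_with_timeout(argv, timeout_s):
    """Run a command and return (finished, stdout).

    finished is False if the command was killed after timeout_s seconds.
    """
    try:
        proc = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired as exc:
        # Partial output may be None or bytes on timeout, so normalize it.
        out = exc.stdout or ""
        if isinstance(out, bytes):
            out = out.decode(errors="replace")
        return False, out
    return True, proc.stdout
```

For example, `run_with_timeout([sys.executable, "grammar.py"], 300)` would return `(False, ...)` if the script from this report is still running after five minutes, along with whatever it had printed to stdout up to that point.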

Outlines/Python version information:

Version information

```
$ python -c "from outlines import _version; print(_version.version)"; python -c "import sys; print('Python', sys.version)"; pip freeze;
0.2.1
Python 3.11.11 (main, Feb 27 2025, 15:20:28) [Clang 16.0.0 (clang-1600.0.26.6)]
accelerate==1.4.0
aiohappyeyeballs==2.4.8
aiohttp==3.11.13
aiosignal==1.3.2
airportsdata==20250224
annotated-types==0.7.0
attrs==25.1.0
certifi==2025.1.31
cfgv==3.4.0
charset-normalizer==3.4.1
cloudpickle==3.1.1
datasets==3.3.2
dill==0.3.8
diskcache==5.6.3
distlib==0.3.9
filelock==3.17.0
frozenlist==1.5.0
fsspec==2024.12.0
genson==1.3.0
huggingface-hub==0.29.2
identify==2.6.8
idna==3.10
interegular==0.3.3
iso3166==2.1.1
Jinja2==3.1.5
jsonschema==4.23.0
jsonschema-specifications==2024.10.1
lark==1.2.2
MarkupSafe==3.0.2
mpmath==1.3.0
multidict==6.1.0
multiprocess==0.70.16
nest-asyncio==1.6.0
networkx==3.4.2
nodeenv==1.9.1
numpy==1.26.4
outlines==0.2.1
outlines_core==0.1.26
packaging==24.2
pandas==2.2.3
platformdirs==4.3.6
pre_commit==4.1.0
propcache==0.3.0
psutil==7.0.0
pyarrow==19.0.1
pydantic==2.10.6
pydantic_core==2.27.2
python-dateutil==2.9.0.post0
pytz==2025.1
PyYAML==6.0.2
referencing==0.36.2
regex==2024.11.6
requests==2.32.3
rpds-py==0.23.1
safetensors==0.5.3
sentencepiece==0.2.0
six==1.17.0
sympy==1.13.1
tokenizers==0.21.0
torch==2.6.0
tqdm==4.67.1
transformers==4.49.0
typing_extensions==4.12.2
tzdata==2025.1
urllib3==2.3.0
virtualenv==20.29.2
xxhash==3.5.0
yarl==1.18.3
```

Context for the issue:

No response

nchammas added the bug label Mar 5, 2025

cpfiffer (Contributor) commented Mar 5, 2025

I can replicate this. The issue here seems to be an infinitely repeating *1.5 term:

Generator setup time: 0.1493833065032959 seconds
Sequence generation time: 9.107961893081665 seconds
Output: (4-2)*.5*3.5*2.5*1.5*1.5*1.5*1.5*1.5*1.5*1.5*1.5*1.5*1.5*1.5*1.5*1.5*1.5*1.5*1.5*1.5*1.5*1.5*1.5*1.5*
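
A repeating tail like this can also be detected mechanically, which helps when comparing outputs across models or prompts. The following is a rough sketch in plain Python. `trailing_repeat` is a hypothetical helper, not an Outlines function, and the sample string is a shortened stand-in for the output above.

```python
def trailing_repeat(text: str, max_period: int = 32) -> tuple[str, int]:
    """Find the unit that repeats back-to-back at the end of text.

    Returns (unit, count) for the trailing repetition covering the most
    characters, preferring the shortest unit on ties. Returns ("", 0)
    when no unit of up to max_period characters repeats at least twice.
    """
    best_unit, best_count = "", 0
    for period in range(1, min(max_period, len(text)) + 1):
        unit = text[-period:]
        count = 0
        end = len(text)
        # Walk backwards while each preceding chunk equals the unit.
        while end >= period and text[end - period:end] == unit:
            count += 1
            end -= period
        if count >= 2 and count * period > best_count * len(best_unit):
            best_unit, best_count = unit, count
    return best_unit, best_count


# Shortened stand-in for the generated text, ending in a dangling "*".
sample = "(4-2)*.5*3.5*2.5*" + "1.5*" * 5
unit, count = trailing_repeat(sample)
```

Here `unit` is `"1.5*"` and `count` is `5`. A large count at the end of a long generation is a strong hint that the model is looping rather than computing slowly.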

I'm not very familiar with grammars in Outlines. Could the grammar be missing an EOS token, or is that handled internally?
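
On the EOS question, one thing that can be checked without Outlines is whether the grammar itself has stopping points in the observed output. The sketch below is a hand-written recognizer for the arithmetic grammar from the report, not the parser Outlines uses. The `NUMBER` regex is an approximation of Lark's `common.NUMBER`, and `accepts` is a hypothetical name.

```python
import re

# Approximation of Lark's common.NUMBER (INT or FLOAT), an assumption.
NUMBER = re.compile(r"(\d+\.\d*|\.\d+|\d+)([eE][+-]?\d+)?")


def accepts(text: str) -> bool:
    """Return True if text is a complete expression in the grammar."""
    pos = 0

    def factor() -> bool:
        # factor: NUMBER | "-" factor | "(" expression ")"
        nonlocal pos
        if text.startswith("-", pos):
            pos += 1
            return factor()
        if text.startswith("(", pos):
            pos += 1
            if not expression() or not text.startswith(")", pos):
                return False
            pos += 1
            return True
        match = NUMBER.match(text, pos)
        if match is None:
            return False
        pos = match.end()
        return True

    def term() -> bool:
        # term: factor (("*" | "/") factor)*
        nonlocal pos
        if not factor():
            return False
        while pos < len(text) and text[pos] in "*/":
            pos += 1
            if not factor():
                return False
        return True

    def expression() -> bool:
        # expression: term (("+" | "-") term)*
        nonlocal pos
        if not term():
            return False
        while pos < len(text) and text[pos] in "+-":
            pos += 1
            if not term():
                return False
        return True

    return expression() and pos == len(text)
```

Under these assumptions, `accepts("(4-2)")` is true and `accepts("(4-2)*")` is false, so the grammar allows stopping after every complete factor in the output above. It also accepts arbitrarily long `*1.5` chains, so nothing in the grammar forces termination. Whether Outlines offers the EOS token at those accepting points is not established in this thread.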
