Continue to allow $ to delimit inline math #1089

Closed
dlqqq opened this issue Nov 5, 2024 · 0 comments · Fixed by #1094
dlqqq commented Nov 5, 2024

Problem

(Thank you @ellisonbg for originally raising this issue.)

There are two use-cases that would be ideal to support together in Jupyter AI's chat UI:

  1. Using $ symbols literally without double-escaping
  2. Using $ as a delimiter for inline math

As of Jupyter AI v2.27.0, 2) is supported but 1) is not. #1068 implements 1) by escaping all $ symbols outside of code blocks, which breaks 2): $ symbols can no longer delimit inline math, an unacceptable regression in the UX.
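To make the regression concrete, here is a minimal Python sketch of the "escape every $ outside of code" behavior. It is only an illustration under simplifying assumptions, not the actual #1068 implementation: the function name `escape_dollars_outside_code` is hypothetical, and the code detection only recognizes triple-backtick fences and single-backtick inline spans.

```python
import re

FENCE = "`" * 3  # triple-backtick fence marker

# Matches a fenced code block or an inline code span.
# Simplified on purpose: no ~~~ fences, indented blocks, or multi-backtick spans.
CODE_SPAN = re.compile(FENCE + r".*?" + FENCE + r"|`[^`\n]*`", re.DOTALL)

# Matches a $ that is not already preceded by a backslash.
UNESCAPED_DOLLAR = re.compile(r"(?<!\\)\$")


def _escape(segment: str) -> str:
    # A function replacement is inserted literally, so no template escapes apply.
    return UNESCAPED_DOLLAR.sub(lambda _: r"\$", segment)


def escape_dollars_outside_code(text: str) -> str:
    """Escape every unescaped $ in prose, leaving code spans untouched."""
    parts = []
    last = 0
    for m in CODE_SPAN.finditer(text):
        parts.append(_escape(text[last:m.start()]))  # prose before the code
        parts.append(m.group(0))                     # code, copied verbatim
        last = m.end()
    parts.append(_escape(text[last:]))               # trailing prose
    return "".join(parts)


print(escape_dollars_outside_code("costs $5 and $x$"))
# prints: costs \$5 and \$x\$
```

The price is now rendered literally, but the delimiters around `$x$` are escaped as well, so the expression is no longer rendered as math.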

Below, I've outlined a few different approaches that would continue to allow $ to delimit inline math, with the added goal of also supporting $ symbols being used literally.

Proposed Solutions

  1. Only escape $ symbols in agent messages, and use the system prompt to encourage LLMs to use \( <math> \) instead to denote inline math.

    • The formatting of user input is perfectly aligned with how Jupyter Notebooks are rendered, while agent output is rendered differently.
    • However, this depends on the LLM's ability to respect the system prompt. The problem may be exacerbated by human messages that use $<math>$, which the LLM may interpret as overriding the system prompt.
  2. Intelligently escape $ symbols via regex substitution. An example implementation is provided below.

    Example implementation (provided by Claude 3.5 Sonnet)
    import re
    
    def escape_usd_dollars(text):
        def is_math_expr(s, pos):
            # `pos` is the index of the opening $. Scan the text after it
            # for the next unescaped $, which would close the expression.
            rest = s[pos + 1:]
            next_dollar = None
            for i, ch in enumerate(rest):
                if ch == '$' and (i == 0 or rest[i - 1] != '\\'):
                    next_dollar = i
                    break

            if next_dollar is None:
                return False

            # A closing $ preceded by whitespace (or by nothing) more likely
            # starts another price than ends inline math.
            content = rest[:next_dollar]
            if not content or content[-1].isspace():
                return False

            # Check the content between dollars for math operators or letters
            math_operators = ['+', '-', '*', '/', '=', '<', '>', '±', '∑', '∫', '^']
            has_operator = any(op in content for op in math_operators)
            has_letters = bool(re.search(r'[a-zA-Z]', content))

            return has_operator or has_letters

        def replace_func(match):
            # Get the position of the match in the original string
            pos = match.start()

            # If it looks like a math expression, don't escape
            if is_math_expr(text, pos):
                return match.group(0)
            # Otherwise, escape it
            return r'\$' + match.group(1)

        # Pattern matches an unescaped $ followed by a number, capturing the
        # number part. The number may not run into a word character, so "$5x"
        # is left alone while "$5," still matches.
        pattern = r'(?<!\\)\$(\d{1,3}(?:,\d{3})+(?:\.\d+)?|\d+(?:\.\d+)?)(?!\w)'

        return re.sub(pattern, replace_func, text)
    
    # Test cases
    test_cases = [
        "The price is $50.00",                    # Should escape
        "This is $x + y$ in math",                # Should not escape
        r"Already escaped: \$100",                # Should not escape
        r"Mixed: $100 and $x$ and \$200",         # Should selectively escape
        "Complex price: $1,234,567.89",           # Should escape
        "Math: $5 + 10$",                         # Should not escape
        "Math: $5x + 10y$",                       # Should not escape
        "Price list: $5, $10, $15",               # Should escape all
        "Equation: $5x = 10$",                    # Should not escape
        "Multiple prices: $5.99 and $10.99",      # Should escape both
        "Math with decimals: $5.5 * 2$"          # Should not escape
    ]
    
    for test in test_cases:
        result = escape_usd_dollars(test)
        print(f"Input:  {test}")
        print(f"Output: {result}\n")
    
    
    # general idea: only escape dollar symbols which precede possible numerals
    # then apply more complex regex in the replace_fn.
    • This can be improved further to handle ~95% of all use-cases, but may still fail for certain edge cases.
    • If a user encounters one of these edge cases, it may be very unclear why formatting is broken, since the escaping logic involves a lot of complex string processing rules that may change in minor releases.
    • This isn't aligned with how Jupyter Notebooks are rendered, though it aims to improve on the rendering experience in Jupyter Notebooks.
  3. Build a new math input UI.

    • A text popup would appear upon hitting some keyboard shortcut (e.g. Cmd + M). This then accepts raw TeX markup, without delimiters.
    • The popup would show a preview of the math, which would improve the UX of adding complex LaTeX expressions.
    • The popup would then allow one to insert the TeX markup back into the input, either as inline math or display math. There would also be keyboard shortcuts here.
    • This would encourage the user to not type delimiters themselves manually, and instead rely on the math input UI to type LaTeX.
    • Then, we can still escape all $ symbols, but keep the inline math experience streamlined.
  4. Revert #1068 ("Allow $ to literally denote quantities of USD in chat") and accept that $ symbols cannot be used literally unless double-escaped.

    • This is perfectly aligned with how Jupyter Notebooks are rendered.
    • We can update the system prompt to request LLMs double-escape $ symbols if they are to be used literally.
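
As a rough illustration of the first option above, the sketch below escapes $ only in agent messages and pairs that with a system prompt instruction. Everything here is an assumption made for illustration: the names `MATH_DELIMITER_INSTRUCTION` and `prepare_for_rendering`, the `sender` values, and the prompt wording are hypothetical and not part of Jupyter AI. For brevity it also ignores code blocks, which a real implementation would need to skip.

```python
import re

# Hypothetical system prompt fragment asking the model to avoid $ delimiters.
MATH_DELIMITER_INSTRUCTION = (
    "Write inline math as \\( ... \\). "
    "Never use $ as a math delimiter."
)

# Matches a $ that is not already preceded by a backslash.
UNESCAPED_DOLLAR = re.compile(r"(?<!\\)\$")


def prepare_for_rendering(body: str, sender: str) -> str:
    """Escape $ only in agent messages; human messages pass through unchanged."""
    if sender == "agent":
        # Any $ in agent output is treated as a literal dollar sign.
        return UNESCAPED_DOLLAR.sub(lambda _: r"\$", body)
    # Human input keeps the notebook-style meaning of $...$ as inline math.
    return body
```

With this split, a human message such as `Solve $x + 1 = 2$` still renders as math, while an agent message such as `It costs $5` renders the dollar sign literally, provided the model follows the instruction and writes its own math with `\( ... \)`.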
@dlqqq dlqqq added the bug Bugs reported by users label Nov 5, 2024
@dlqqq dlqqq self-assigned this Nov 5, 2024