Continue to allow `$` to delimit inline math #1089

dlqqq · 2024-11-05T18:47:53Z

Problem

(Thank you @ellisonbg for originally raising this issue.)

There are two use-cases that would be ideal to support together in Jupyter AI's chat UI:

1. Using $ symbols literally without double-escaping
2. Using $ to delimit inline math

As of Jupyter AI v2.27.0, 1) is not supported, but 2) is. #1068 implements 1) at the expense of 2) by escaping all $ symbols outside of code blocks. As a result, $ symbols can no longer delimit inline math, which is an unacceptable regression in the UX.
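To make the regression concrete, here is a minimal sketch of what "escape all $ symbols outside of code blocks" amounts to. This is not the code from #1068; `escape_all_dollars` and `CODE_SPAN` are hypothetical names, and the code-span detection is deliberately simplified.

```python
import re

# Hypothetical helper, NOT the implementation from #1068.
# Matches fenced code blocks and inline code spans (simplified).
CODE_SPAN = re.compile(r"(```.*?```|`[^`\n]*`)", re.DOTALL)

def escape_all_dollars(markdown: str) -> str:
    """Escape every unescaped `$` that is outside of code."""
    parts = CODE_SPAN.split(markdown)
    # With a single capturing group, re.split places the code spans at odd
    # indices, so only the even-indexed (non-code) parts are rewritten.
    for i in range(0, len(parts), 2):
        parts[i] = re.sub(r"(?<!\\)\$", r"\\$", parts[i])
    return "".join(parts)

# A literal price is now safe, and code is left alone:
print(escape_all_dollars("It costs $5, but `echo $HOME` is code."))
# But both math delimiters are escaped too, so nothing renders as math:
print(escape_all_dollars("Euler: $e^{i\\pi} + 1 = 0$"))
```

Under this scheme there is no way to tell a price from a math delimiter, which is why use-case 2) breaks.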

Below, I've outlined a few different approaches that would continue to allow $ to delimit inline math, with the added goal of also supporting $ symbols being used literally.

Proposed Solutions

Only escape $ symbols in agent messages, and use the system prompt to encourage LLMs to use $ <math> $ instead to denote inline math.
- The formatting of user input is perfectly aligned with how Jupyter Notebooks are rendered, while agent output is rendered differently.
- However, this depends on the LLM's ability to respect the system prompt. The problem may be exacerbated by human messages that use $<math>$, which the LLM may interpret as overriding the system prompt.
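A rough sketch of this approach, assuming each chat message carries a sender role. `ChatMessage`, `escape_dollars`, and `render_body` are hypothetical names for illustration, and code blocks are ignored here to keep the example short.

```python
import re
from dataclasses import dataclass

@dataclass
class ChatMessage:
    """Hypothetical message type, for illustration only."""
    sender: str  # assumed to be either "human" or "agent"
    body: str

def escape_dollars(text: str) -> str:
    """Escape every `$` that is not already preceded by a backslash."""
    return re.sub(r"(?<!\\)\$", r"\\$", text)

def render_body(msg: ChatMessage) -> str:
    # Human input is passed through, so `$...$` behaves as it does in a
    # notebook. Agent output is escaped, so the agent has to use whatever
    # alternative delimiter the system prompt asks for.
    if msg.sender == "agent":
        return escape_dollars(msg.body)
    return msg.body

print(render_body(ChatMessage("human", "Solve $x^2 = 4$")))
print(render_body(ChatMessage("agent", "That costs $20.")))
```

The human message keeps its math delimiters, while the agent's dollar amount is escaped.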

Intelligently escape $ symbols via regex substitution. An example implementation is provided below.

Example implementation (provided by Claude 3.5 Sonnet)

import re

def escape_usd_dollars(text):
    def is_math_expr(s, pos):
        # `pos` is the index of the opening $, so scan the text after it
        rest = s[pos + 1:]
        # Find the next unescaped $
        next_dollar = None
        i = 0
        while i < len(rest):
            if rest[i] == '$' and (i == 0 or rest[i-1] != '\\'):
                next_dollar = i
                break
            i += 1

        if next_dollar is None:
            return False

        # Heuristic: content with leading or trailing whitespace is treated
        # as prose between two prices, e.g. "$5.99 and $10.99"
        content = rest[:next_dollar]
        if not content or content != content.strip():
            return False

        # Check the content between dollars for math operators
        math_operators = ['+', '-', '*', '/', '=', '<', '>', '±', '∑', '∫', '^']
        has_operator = any(op in content for op in math_operators)
        has_letters = bool(re.search(r'[a-zA-Z]', content))

        return has_operator or has_letters

    def replace_func(match):
        # Get the position of the match in the original string
        pos = match.start()

        # If it looks like a math expression, don't escape
        if is_math_expr(text, pos):
            return match.group(0)
        # Otherwise, escape it
        return r'\$' + match.group(1)

    # Pattern matches an unescaped $ followed by a number that ends at
    # whitespace, sentence punctuation, or the end of the text
    # Capturing the number part
    end = r'(?![^\s,.;:!?)])'
    pattern = r'(?<!\\)\$(\d{1,3}(?:,\d{3})*(?:\.\d{2})?' + end + r'|\d+' + end + r')'
    
    return re.sub(pattern, replace_func, text)

# Test cases
test_cases = [
    "The price is $50.00",                    # Should escape
    "This is $x + y$ in math",                # Should not escape
    r"Already escaped: \$100",                # Should not escape
    r"Mixed: $100 and $x$ and \$200",         # Should selectively escape
    "Complex price: $1,234,567.89",           # Should escape
    "Math: $5 + 10$",                         # Should not escape
    "Math: $5x + 10y$",                       # Should not escape
    "Price list: $5, $10, $15",               # Should escape all
    "Equation: $5x = 10$",                    # Should not escape
    "Multiple prices: $5.99 and $10.99",      # Should escape both
    "Math with decimals: $5.5 * 2$"          # Should not escape
]

for test in test_cases:
    result = escape_usd_dollars(test)
    print(f"Input:  {test}")
    print(f"Output: {result}\n")


# general idea: only escape dollar symbols which precede possible numerals
# then apply more complex regex in the replace_fn.

- This can be improved further to handle ~95% of all use-cases, but may still fail for certain edge cases.
- If a user encounters one of these edge cases, it may be very unclear why formatting is broken, since the escaping logic involves complex string-processing rules that may change in minor releases.
- This isn't aligned with how Jupyter Notebooks are rendered, though it aims to improve on the rendering experience in Jupyter Notebooks.

Build a new math input UI.
- A text popup would appear upon hitting some keyboard shortcut (e.g. Cmd + M). This then accepts raw TeX markup, without delimiters.
- The popup would show a preview of the math, which would improve the UX of adding complex LaTeX expressions.
- The popup would then allow one to insert the TeX markup back into the input, either as inline math or display math. There would also be keyboard shortcuts here.
- This would encourage the user to not type delimiters themselves manually, and instead rely on the math input UI to type LaTeX.
- Then, we can still escape all $ symbols, but keep the inline math experience streamlined.

Revert #1068 (Allow $ to literally denote quantities of USD in chat) and accept that $ symbols cannot be used literally unless double-escaped.
- This is perfectly aligned with how Jupyter Notebooks are rendered.
- We can update the system prompt to request LLMs double-escape $ symbols if they are to be used literally.

dlqqq added the bug label (Bugs reported by users) Nov 5, 2024

dlqqq self-assigned this Nov 5, 2024

dlqqq mentioned this issue Nov 6, 2024

Continue to allow $ symbols to delimit inline math in human messages #1094

Merged

dlqqq closed this as completed in #1094 Nov 7, 2024
