Problem
(Thank you @ellisonbg for originally raising this issue.)
There are two use cases that would be ideal to support together in Jupyter AI's chat UI:
1. Using $ symbols literally without double-escaping
2. Using $ as a math delimiter
As of Jupyter AI v2.27.0, 1) is not supported, but 2) is. #1068 implements 1), but breaks 2), by escaping all $ symbols outside of code blocks. As a result, $ symbols can no longer delimit inline math, which is an unacceptable regression in the UX.
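To make the regression concrete, here is a minimal sketch of the behavior described for #1068: every unescaped $ outside of fenced code blocks and inline code spans gets a backslash. This illustrates that behavior under the stated assumption; it is not the code from #1068, and the names `CODE_SPAN` and `escape_dollars_outside_code` are hypothetical.

```python
import re

# Fenced code blocks (``` ... ```) and inline code spans (` ... `).
CODE_SPAN = re.compile(r"```.*?```|`[^`\n]*`", re.DOTALL)


def _escape(prose: str) -> str:
    # Prefix a backslash to every $ that is not already escaped.
    return re.sub(r"(?<!\\)\$", r"\\$", prose)


def escape_dollars_outside_code(text: str) -> str:
    parts, last = [], 0
    for match in CODE_SPAN.finditer(text):
        parts.append(_escape(text[last:match.start()]))  # prose before code
        parts.append(match.group(0))                      # code kept verbatim
        last = match.end()
    parts.append(_escape(text[last:]))                    # trailing prose
    return "".join(parts)
```

With this escaping, `$x$` reaches the renderer as `\$x\$`, which is displayed as two literal dollar signs rather than as inline math.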
Below, I've outlined a few different approaches that would continue to allow $ to delimit inline math, with the added goal of also supporting $ symbols being used literally.
Proposed Solutions
Only escape $ symbols in agent messages, and use the system prompt to encourage LLMs to use \( <math> \) instead to denote inline math.
The formatting of user input is perfectly aligned with how Jupyter Notebooks are rendered, while agent output is rendered differently.
However, this depends on the LLM's ability to follow the system prompt. The problem may be exacerbated by human messages that use $<math>$, which the LLM may interpret as overriding the system prompt.
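A minimal sketch of this first approach, assuming each message carries a sender field with the hypothetical values "human" and "agent": human messages pass through unchanged, while agent messages have every $ outside of code escaped. The prompt text and the names `MATH_PROMPT` and `render_source` are hypothetical; this is not Jupyter AI's actual system prompt or rendering code.

```python
import re

# Hypothetical system-prompt addition; not Jupyter AI's actual prompt text.
MATH_PROMPT = (
    "Write inline math as \\( ... \\) and display math as \\[ ... \\]. "
    "Do not use $ as a math delimiter; $ always means a literal dollar sign."
)

# Fenced code blocks and inline code spans. The capturing group makes
# re.split keep them at odd indices, so even indices are always prose.
_CODE = re.compile(r"(```.*?```|`[^`\n]*`)", re.DOTALL)


def render_source(body: str, sender: str) -> str:
    """Return the Markdown to render for a message from `sender`."""
    if sender != "agent":
        # Human input is rendered exactly as a notebook Markdown cell would be.
        return body
    pieces = _CODE.split(body)
    for i in range(0, len(pieces), 2):
        # Escape each $ in prose that is not already escaped.
        pieces[i] = re.sub(r"(?<!\\)\$", r"\\$", pieces[i])
    return "".join(pieces)
```

The asymmetry is the point of this design: a human typing `$x$` still gets inline math, while an agent reply containing `$5` is shown literally, provided the model followed the prompt and used `\( \)` for its own math.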
Intelligently escape $ symbols via regex substitution. An example implementation is provided below.
Example implementation (provided by Claude 3.5 Sonnet)
```python
import re


def escape_usd_dollars(text):
    def is_math_expr(s, pos):
        # Look ahead for a matching closing $ and check for operators.
        # Start after the opening $ at s[pos]; otherwise that $ is found as
        # the "next" one, the content is always empty, and this returns False.
        rest = s[pos + 1:]
        # Find the next unescaped $
        next_dollar = None
        i = 0
        while i < len(rest):
            if rest[i] == '$' and (i == 0 or rest[i - 1] != '\\'):
                next_dollar = i
                break
            i += 1
        if next_dollar is None:
            return False
        # Check the content between dollars for math operators
        content = rest[:next_dollar]
        math_operators = ['+', '-', '*', '/', '=', '<', '>', '±', '∑', '∫', '^']
        has_operator = any(op in content for op in math_operators)
        has_letters = bool(re.search(r'[a-zA-Z]', content))
        return has_operator or has_letters

    def replace_func(match):
        # Get the position of the match in the original string
        pos = match.start()
        # If it looks like a math expression, don't escape
        if is_math_expr(text, pos):
            return match.group(0)
        # Otherwise, escape it
        return r'\$' + match.group(1)

    # Pattern matches $ followed by any number format,
    # capturing the number part
    pattern = r'(?<!\\)\$(\d{1,3}(?:,\d{3})*(?:\.\d{2})?(?!\S)|\d+(?!\S))'
    return re.sub(pattern, replace_func, text)


# Test cases (raw strings keep the backslash in "\$" literal)
test_cases = [
    "The price is $50.00",                # Should escape
    "This is $x + y$ in math",            # Should not escape
    r"Already escaped: \$100",            # Should not escape
    r"Mixed: $100 and $x$ and \$200",     # Should selectively escape
    "Complex price: $1,234,567.89",       # Should escape
    "Math: $5 + 10$",                     # Should not escape
    "Math: $5x + 10y$",                   # Should not escape
    "Price list: $5, $10, $15",           # Should escape all
    "Equation: $5x = 10$",                # Should not escape
    "Multiple prices: $5.99 and $10.99",  # Should escape both
    "Math with decimals: $5.5 * 2$",      # Should not escape
]

for test in test_cases:
    result = escape_usd_dollars(test)
    print(f"Input: {test}")
    print(f"Output: {result}\n")

# General idea: only escape $ symbols that precede possible numerals,
# then apply more complex checks in replace_func.
```
This can be improved further to handle ~95% of all use cases, but may still fail for certain edge cases.
If a user encounters one of these edge cases, it may be very unclear why formatting is broken, since the escaping logic involves many complex string-processing rules that may change between minor releases.
This isn't aligned with how Jupyter Notebooks are rendered, though it aims to improve on the rendering experience in Jupyter Notebooks.
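One way to shrink the heuristic is to borrow an existing, documented rule instead of inventing one. The sketch below swaps in Pandoc's `tex_math_dollars` rule (the opening $ must be followed by a non-space character; the closing $ must be preceded by a non-space character and must not be followed by a digit), leaves anything matching it or `$$ ... $$` untouched as math, and escapes every other unescaped $. It is an illustration under that assumption, not a proposed implementation; `MATH_SPAN` and `escape_literal_dollars` are hypothetical names.

```python
import re

# Spans treated as math and left untouched (a sketch, not Jupyter AI code):
#   - display math: $$ ... $$
#   - inline math per Pandoc's tex_math_dollars rule: the opening $ is
#     followed by a non-space, the closing $ is preceded by a non-space
#     and is not immediately followed by a digit.
MATH_SPAN = re.compile(
    r"(?<!\\)\$\$.+?(?<!\\)\$\$"                             # display
    r"|(?<![\\$])\$(?=\S)(?:\\.|[^$\\])+?(?<=\S)\$(?!\d)",   # inline
    re.DOTALL,
)


def _escape(prose: str) -> str:
    # Prefix a backslash to every $ that is not already escaped.
    return re.sub(r"(?<!\\)\$", r"\\$", prose)


def escape_literal_dollars(text: str) -> str:
    out, last = [], 0
    for match in MATH_SPAN.finditer(text):
        out.append(_escape(text[last:match.start()]))  # literal text
        out.append(match.group(0))                      # math, unchanged
        last = match.end()
    out.append(_escape(text[last:]))
    return "".join(out)
```

This handles each test case in the example implementation as annotated: in `$5.99 and $10.99`, for instance, a space precedes the second $, so neither $ opens math and both are escaped. Inputs that happen to satisfy the rule while meaning prices can still be misclassified, so the drawbacks listed above still apply.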
Build a new math input UI.
A text popup would appear when the user presses a keyboard shortcut (e.g. Cmd + M) and would accept raw TeX markup without delimiters.
The popup would show a preview of the math, which would improve the UX of entering complex LaTeX expressions.
The popup would then insert the TeX markup back into the input as either inline math or display math, with a keyboard shortcut for each.
This would encourage users to rely on the math input UI to enter LaTeX instead of typing delimiters manually.
We could then still escape all $ symbols while keeping the inline math experience streamlined.
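A minimal sketch of the insertion step, assuming the popup emits `\( ... \)` for inline math and `\[ ... \]` for display math (the same style of delimiter suggested in the first approach), so that escaping every $ cannot break math inserted through the UI. The function `insert_math` and its parameters are hypothetical, and a real implementation would live in the TypeScript frontend rather than in Python.

```python
from typing import Literal


def insert_math(draft: str, cursor: int, tex: str,
                mode: Literal["inline", "display"] = "inline") -> str:
    """Insert delimiter-free TeX from the (hypothetical) popup at `cursor`."""
    tex = tex.strip()
    if mode == "inline":
        snippet = f"\\({tex}\\)"
    else:
        # Display math goes on its own lines.
        snippet = f"\n\\[\n{tex}\n\\]\n"
    return draft[:cursor] + snippet + draft[cursor:]
```

Because the inserted delimiters contain no $, a blanket "escape every $" pass over the rest of the message leaves this math intact.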
Revert #1068 (Allow $ to literally denote quantities of USD in chat) and accept that $ symbols cannot be used literally unless double-escaped.
This keeps inline math working, but users must double-escape $ symbols if they are to be used literally.