
Parsing error occurred when using withStructuredOutput method with nested triple backticks #7752

Open
5 tasks done
YukJiSoo opened this issue Feb 25, 2025 · 1 comment
Labels
auto:bug Related to a bug, vulnerability, unexpected error with an existing feature

Comments

@YukJiSoo

YukJiSoo commented Feb 25, 2025

Checked other resources

  • I added a very descriptive title to this issue.
  • I searched the LangChain.js documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain.js rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

import {
  ChatPromptTemplate,
  HumanMessagePromptTemplate,
  SystemMessagePromptTemplate,
} from '@langchain/core/prompts';
import { RunnableSequence } from '@langchain/core/runnables';
import { ChatOpenAI } from '@langchain/openai';
import { jsonSchemaToZod } from 'json-schema-to-zod';
import { z } from 'zod';

const parseZodSchemaString = (zodSchemaString: string): z.ZodTypeAny => {
  // ! Pass 'z' as a parameter to allow the function to recognize the Zod library internally
  const schema = new Function('z', `return ${zodSchemaString}`)(z);
  return schema;
};

const schema = {
  type: 'object',
  required: ['response', 'locale'],
  properties: {
    locale: {
      type: 'string',
    },
    response: {
      anyOf: [
        {
          type: 'string',
        },
        {
          type: 'object',
          additionalProperties: {
            anyOf: [
              {
                type: 'string',
              },
              {
                type: 'number',
              },
            ],
          },
        },
      ],
    },
  },
};

const zodSchema = parseZodSchemaString(jsonSchemaToZod(schema));

const model = new ChatOpenAI({
  model: 'gpt-4o-2024-08-06', // use a model that supports structured output
}).withStructuredOutput(zodSchema, {
  method: 'jsonSchema',
  strict: true,
});

const systemMessages = [
  'You are a professional translator. You are fluent in various languages and provide accurate and natural translations. When translating given text into a target language, you perform translations that consider context and meaning rather than simple literal translations. You maintain the intent and nuance of the original text as much as possible while using natural expressions in the target language.',
];
const humanMessages = [
  `
<body>
{{body}} 
</body>

<locale>
{{locale}}
</locale>

Translate the given data 'body' into the language specified by locale. body can be either an object or a string.

## Rules

1. If body is an object, translate all values while maintaining the object's structure.
2. If body is a string, translate the entire string.
3. If locale is not provided, return the message "Translation is not possible. Target language is not specified." in the original language of body.
4. Always return the translation result using the "response" key.
5. Include the language of the translated result using the "locale" key in IETF language tag format.
`,
];

const prompt = ChatPromptTemplate.fromMessages([
  SystemMessagePromptTemplate.fromTemplate(systemMessages),
  HumanMessagePromptTemplate.fromTemplate(humanMessages, { templateFormat: 'mustache' }),
]);

const runnableSequence = RunnableSequence.from<unknown, unknown>([prompt, model]);

try {
  const input = {
    body: '```javascript\nconst variable = 123;\nconsole.log(variable);\n```',
    locale: 'ko_KR',
  };

  await runnableSequence.invoke(input);
} catch (error) {
  console.error(error);
}

Error Message and Stack Trace (if applicable)

OutputParserException [Error]: Failed to parse. Text: "{"locale":"ko-KR","response":"샘플 자바스크립트 코드\n\njavascript\nconst 변수 = 123;\nconsole.log(변수);\n"}". Error: SyntaxError: Unexpected token j in JSON at position 0

Troubleshooting URL: https://js.langchain.com/docs/troubleshooting/errors/OUTPUT_PARSING_FAILURE/

at StructuredOutputParser.parse (/Users/user/sample-project/node_modules/.pnpm/@[email protected]/node_modules/@langchain/core/dist/output_parsers/structured.cjs:89:19)
at StructuredOutputParser.parseResult (/Users/user/sample-project/node_modules/.pnpm/@[email protected]/node_modules/@langchain/core/dist/output_parsers/base.cjs:61:21)
at StructuredOutputParser._callWithConfig (/Users/user/sample-project/node_modules/.pnpm/@[email protected]/node_modules/@langchain/core/dist/output_parsers/base.cjs:46:72)
at StructuredOutputParser._callWithConfig (/Users/user/sample-project/node_modules/.pnpm/@[email protected]/node_modules/@langchain/core/dist/runnables/base.cjs:223:34)
at processTicksAndRejections (node:internal/process/task_queues:95:5)
at StructuredOutputParser._streamIterator (/Users/user/sample-project/node_modules/.pnpm/@[email protected]/node_modules/@langchain/core/dist/runnables/base.cjs:165:9)
at StructuredOutputParser.transform (/Users/user/sample-project/node_modules/.pnpm/@[email protected]/node_modules/@langchain/core/dist/runnables/base.cjs:402:9)
at RunnableSequence._streamIterator (/Users/user/sample-project/node_modules/.pnpm/@[email protected]/node_modules/@langchain/core/dist/runnables/base.cjs:1349:30)
at RunnableSequence.transform (/Users/user/sample-project/node_modules/.pnpm/@[email protected]/node_modules/@langchain/core/dist/runnables/base.cjs:402:9)
at RunnableSequence._streamIterator (/Users/user/sample-project/node_modules/.pnpm/@[email protected]/node_modules/@langchain/core/dist/runnables/base.cjs:1349:30) {

llmOutput: '{"locale":"ko-KR","response":"샘플 자바스크립트 코드\n\njavascript\\nconst variable = 123;\\nconsole.log(variable);\\n"}',
observation: undefined,
sendToLLM: false,
lc_error_code: 'OUTPUT_PARSING_FAILURE'
}

Description

Background

  • I have written translation prompts and am using them.
  • The text to translate can be in markdown format.
  • I use structured output so that responses follow a fixed format.
  • I'm using withStructuredOutput for this.

Problem

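  • I get the error above when the markdown content contains triple backticks (a fenced code block).
  • My guess is that the structured output parser used internally by withStructuredOutput cannot handle triple backticks inside the JSON it parses. JSON.parse itself accepts backticks inside string values, and the error reports an unexpected token j at position 0, which suggests the output was cut at the first triple backtick fence (leaving text that starts with "javascript") before JSON.parse was called.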
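The guess above can be checked without calling a model. This is a plain TypeScript sketch with no LangChain imports; naiveExtractJson is a hypothetical re-creation of a fence-stripping step, based on the assumption (not confirmed in this issue) that the parser keeps only the text after the first triple backtick fence whenever the output contains one. Under that assumption, valid JSON with a code block inside a string value is cut down to "javascript...", matching the "Unexpected token j in JSON at position 0" error.

```typescript
// Hypothetical re-creation of a "strip markdown fences, then JSON.parse" step.
// Assumption: when the output contains ``` anywhere, the text between the first
// and second fence is treated as the JSON payload.
function naiveExtractJson(text: string): string {
  return text.includes("```")
    ? text.trim().split(/```(?:json)?/)[1]
    : text.trim();
}

// Valid JSON whose string value happens to contain a fenced code block,
// shaped like the llmOutput in the error above.
const modelOutput = JSON.stringify({
  locale: "ko-KR",
  response: "```javascript\nconst variable = 123;\nconsole.log(variable);\n```",
});

const extracted = naiveExtractJson(modelOutput);
console.log(extracted.slice(0, 10)); // prints: javascript

let failed = false;
try {
  JSON.parse(extracted); // throws: the payload now starts with "javascript"
} catch (e) {
  failed = e instanceof SyntaxError;
}
console.log(failed); // prints: true

// Parsing the untouched output works, backticks and all.
const direct = JSON.parse(modelOutput) as { locale: string; response: string };
console.log(direct.response.startsWith("```javascript")); // prints: true
```

If that assumption holds, a safer approach would be to unwrap a fence only when it encloses the entire output.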

Question

  • I would like to solve this problem while still using withStructuredOutput.
  • Please let me know if there is a way to work around the issue.
  • It would be nice if withStructuredOutput accepted a new config option to handle this.
    • For example, the option could take a parser or runnable that runs on the model output before it reaches the structured output parser.
    • I'm curious to know what you think.
const model = new ChatOpenAI({
  model: 'gpt-4o-2024-08-06',
}).withStructuredOutput(zodSchema, {
  method: 'jsonSchema',
  strict: true,
  preprocessParser: customParser // proposed option (not an existing API): parser/runnable applied before the structured output parser
});
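One shape the proposed customParser could take, sketched under stated assumptions: parseTranslationOutput is a hypothetical helper (not a LangChain API) that only strips a fence when it wraps the entire output, so backticks inside JSON string values survive, and then checks the result against the same shape as the JSON schema in the example code. It uses plain type checks instead of Zod to stay self-contained.

```typescript
// Hypothetical pre-parser sketch (not a LangChain API). It strips a markdown
// fence only when the fence wraps the *whole* output, so triple backticks that
// appear inside JSON string values are left untouched.

// Same shape as the JSON schema in the example code.
type TranslationResult = {
  locale: string;
  response: string | Record<string, string | number>;
};

// Matches output of the form ```json\n...\n``` (or plain ```) from start to end.
// The lazy capture still reaches the last fence because of the `$` anchor.
const WHOLE_OUTPUT_FENCE = /^```(?:json)?[ \t]*\n([\s\S]*?)\n?```\s*$/;

function parseTranslationOutput(text: string): TranslationResult {
  const trimmed = text.trim();
  const match = WHOLE_OUTPUT_FENCE.exec(trimmed);
  // Only unwrap when the entire output is fenced; otherwise parse it as-is.
  const jsonText = match ? match[1] : trimmed;
  const value: unknown = JSON.parse(jsonText);

  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    throw new Error("Expected a JSON object");
  }
  const { locale, response } = value as Record<string, unknown>;
  if (typeof locale !== "string") {
    throw new Error("'locale' must be a string");
  }
  const isFlatRecord =
    typeof response === "object" &&
    response !== null &&
    !Array.isArray(response) &&
    Object.values(response).every((v) => typeof v === "string" || typeof v === "number");
  if (typeof response !== "string" && !isFlatRecord) {
    throw new Error("'response' must be a string or an object of strings/numbers");
  }
  return { locale, response: response as TranslationResult["response"] };
}

// Usage: an output shaped like the llmOutput in the error log parses fine.
const result = parseTranslationOutput(
  '{"locale":"ko-KR","response":"```javascript\\nconst variable = 123;\\n```"}',
);
console.log(result.locale); // prints: ko-KR
```

Without a new option, a similar effect might be possible today: withStructuredOutput accepts includeRaw: true and then returns { raw, parsed }. As far as I can tell, parsed is set to null instead of throwing when the built-in parser fails, so a helper like this could be applied to raw.content as a fallback.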

System Info

pnpm version

  • 8.9.0

platform

  • macOS, Linux

node version

  • 18.16.0

package version

"@langchain/core": "0.3.37",
"@langchain/openai": "0.4.2",
"json-schema-to-zod": "2.6.0",
"zod": "3.24.1"
@YukJiSoo YukJiSoo changed the title Parsing error occurred when using withStructuredOutput method with nested triple nested backticks Parsing error occurred when using withStructuredOutput method with nested triple backticks Feb 25, 2025
@dosubot dosubot bot added the auto:bug Related to a bug, vulnerability, unexpected error with an existing feature label Feb 25, 2025

dosubot bot commented Feb 25, 2025

Hey @YukJiSoo! I'm here to assist you with any bugs, questions, or contributions. Let's work together to solve this issue!

I found a similar unsolved discussion titled Unexpected non-whitespace character after JSON at position X, which might be related to the issue you're experiencing with parsing errors when using withStructuredOutput and nested triple backticks [1].

To address the problem with triple backticks in markdown content, you can preprocess the output before it is parsed by the structured output parser. LangChain.js provides a function called preprocessJsonInput that can handle escaped characters and remove markdown code block formatting. This function can be used to sanitize the output before parsing:

preprocessJsonInput("```\n{}\n```") // returns "{}"
preprocessJsonInput("```json\n{}\n```") // returns "{}"

You can implement a custom parser or preprocessing step to handle this before the structured output parser processes the data. This approach aligns with your suggestion of using a custom parser or runnable before the structured output parser.

Additionally, consider using the OutputFixingParser, which can help fix parsing errors by passing the error message and incorrect output to an LLM to attempt remediation [2][3].

