Implement AI-image-reader plugin #1863

johnlanni · 2025-03-08T02:27:38Z

OCR is one of the more widespread applications of vision models. It has a low hallucination rate and significant practical value, so it can be supported first.
Mistral's OCR model is very capable and can be used to extend models that lack OCR support, such as QwQ and DeepSeek-R1. The implementation can follow the approach of the AI-Search plugin: for requests that use the OpenAI protocol, extract the image URLs from the messages, send them to Mistral or another OCR API, and, once the image description is returned, rewrite the prompt by appending that description to the user's original prompt.

Mistral API doc: https://docs.mistral.ai/capabilities/document/#ocr-with-image
Qwen OCR doc: https://help.aliyun.com/zh/model-studio/user-guide/qwen-vl-ocr
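
To make the request flow described above concrete, here is a minimal Python sketch of the prompt-rewriting step only. It is not the plugin's implementation (the actual plugin would run inside the gateway); the names `split_content`, `append_ocr_to_prompt`, and the `ocr` callback are hypothetical, and the call to Mistral or another OCR API is deliberately left as a caller-supplied function rather than guessed at. The only thing assumed from the OpenAI protocol is the shape of multimodal message content: a list of parts with `type` set to `text` or `image_url`.

```python
import copy
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical callback: takes an image URL, returns the text recognized by
# whichever OCR backend is configured (Mistral, Qwen, ...).
OcrFn = Callable[[str], str]


def split_content(content: Any) -> Tuple[str, List[str]]:
    """Split OpenAI-style message content into (prompt text, image URLs)."""
    if isinstance(content, str):
        return content, []
    texts: List[str] = []
    urls: List[str] = []
    for part in content or []:
        if part.get("type") == "text":
            texts.append(part.get("text", ""))
        elif part.get("type") == "image_url":
            url = (part.get("image_url") or {}).get("url")
            if url:
                urls.append(url)
    return "\n".join(texts), urls


def append_ocr_to_prompt(body: Dict[str, Any], ocr: OcrFn) -> Dict[str, Any]:
    """Return a copy of the request body in which every user message that
    carries images is replaced by plain text: the original prompt followed
    by the OCR result for each image."""
    new_body = copy.deepcopy(body)  # leave the caller's request untouched
    for message in new_body.get("messages", []):
        if message.get("role") != "user":
            continue
        text, urls = split_content(message.get("content"))
        if not urls:
            continue
        sections = [text] if text else []
        for index, url in enumerate(urls, start=1):
            sections.append(f"[Image {index} content]\n{ocr(url)}")
        message["content"] = "\n\n".join(sections)
    return new_body
```

Replacing the multimodal content with a plain string is one possible design choice: it lets a text-only model such as QwQ or DeepSeek-R1 consume the request unchanged. Keeping the original image parts alongside the appended text would be an alternative for backends that accept images.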

kai2321 · 2025-03-09T13:40:53Z

I'd like to give this a try. Could you assign it to me? Thanks.

johnlanni added the area/ai and help wanted labels Mar 8, 2025

github-project-automation bot added this to Higress Mar 8, 2025

github-project-automation bot moved this to Todo in Higress Mar 8, 2025

johnlanni changed the title ~~Implement AI-Vision plugin~~ Implement AI-image-reader plugin Mar 9, 2025

johnlanni added the sig/wasm label Mar 9, 2025

johnlanni assigned kai2321 Mar 9, 2025

CH3CHO removed the help wanted label Mar 10, 2025
