Implement AI-image-reader plugin #1863

Open
johnlanni opened this issue Mar 8, 2025 · 1 comment

Among vision-model capabilities, OCR is widely used, has a low hallucination rate, and delivers significant value, so it can be supported first.
Mistral's OCR model is very capable and can be used to enhance models that lack OCR support, such as QwQ and DeepSeek-R1. The implementation can follow the approach of the AI-Search plugin: for requests that use the OpenAI protocol, extract the image URL from messages, send it to Mistral or another OCR API, and, once the description is returned, rewrite the prompt by appending the image description to the user's original prompt.

Mistral API doc: https://docs.mistral.ai/capabilities/document/#ocr-with-image
Qwen OCR: https://help.aliyun.com/zh/model-studio/user-guide/qwen-vl-ocr
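
The request-rewriting flow described above could look roughly like the sketch below. It is written in Python only for illustration and is not the plugin's actual implementation. It assumes the OpenAI chat format in which a user message's content is a list of parts with type "text" or "image_url". The names extract_image_urls, rewrite_request, and the ocr callable are hypothetical; in a real plugin, ocr would wrap a call to the Mistral or Qwen OCR API.

```python
import copy
from typing import Callable, List


def extract_image_urls(message: dict) -> List[str]:
    """Collect image URLs from one OpenAI-style chat message."""
    content = message.get("content")
    if not isinstance(content, list):
        return []  # plain-string content carries no image parts
    urls = []
    for part in content:
        if isinstance(part, dict) and part.get("type") == "image_url":
            url = (part.get("image_url") or {}).get("url")
            if url:
                urls.append(url)
    return urls


def rewrite_request(body: dict, ocr: Callable[[str], str]) -> dict:
    """Return a copy of the request with images replaced by OCR text.

    `ocr` is a hypothetical callable (image URL -> extracted text).
    """
    new_body = copy.deepcopy(body)  # leave the caller's request untouched
    for message in new_body.get("messages", []):
        if message.get("role") != "user":
            continue
        urls = extract_image_urls(message)
        if not urls:
            continue
        # Keep the user's original text parts, in order.
        texts = [
            part.get("text", "")
            for part in message["content"]
            if isinstance(part, dict) and part.get("type") == "text"
        ]
        # Append one labeled OCR result per image.
        for index, url in enumerate(urls, start=1):
            texts.append(f"[Image {index} content]\n{ocr(url)}")
        # Flatten to a plain string so text-only models can consume it.
        message["content"] = "\n\n".join(texts)
    return new_body
```

Passing the OCR call in as a parameter keeps the rewriting logic independent of any one provider, so the same code path could serve Mistral, Qwen, or a stub in tests. Messages without images, and non-user messages, pass through unchanged.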

johnlanni added the area/ai and help wanted labels Mar 8, 2025
github-project-automation bot moved this to Todo in Higress Mar 8, 2025
johnlanni changed the title from "Implement AI-Vision plugin" to "Implement AI-image-reader plugin" Mar 9, 2025

kai2321 commented Mar 9, 2025

I'd like to give this a try. Could you assign it to me? Thanks.

CH3CHO removed the help wanted label Mar 10, 2025
Projects
Status: Todo
Development

No branches or pull requests

3 participants