OCR is one of the most widely used applications of vision models. It has a low hallucination rate and significant practical value, so it could be supported first.
Mistral's OCR model is very capable and can be used to enhance models that do not support OCR themselves, such as QwQ and DeepSeek-R1. The implementation of the AI-Search plugin can serve as a reference. For a request under the OpenAI protocol, extract the image URLs from messages and call Mistral or another OCR API first. Once the description of the image is returned, rewrite the prompt by appending that description to the user's original prompt.
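A minimal sketch of this flow, assuming the request uses the OpenAI chat format in which a user message's content is a list of `text` and `image_url` parts. The names `extract_image_urls` and `augment_with_ocr` are hypothetical, and the OCR backend is passed in as a plain callable rather than tied to any specific Mistral or Qwen client API, so this is an illustration of the idea, not the plugin's actual implementation.

```python
from copy import deepcopy
from typing import Callable, List


def extract_image_urls(message: dict) -> List[str]:
    """Collect image URLs from one OpenAI-style chat message."""
    content = message.get("content")
    if not isinstance(content, list):
        return []  # plain-string content carries no image parts
    urls = []
    for part in content:
        if isinstance(part, dict) and part.get("type") == "image_url":
            image_url = part.get("image_url")
            url = image_url.get("url") if isinstance(image_url, dict) else image_url
            if url:
                urls.append(url)
    return urls


def augment_with_ocr(messages: List[dict], ocr: Callable[[str], str]) -> List[dict]:
    """Return a copy of `messages` in which each user message's images are
    replaced by OCR text appended to the user's original prompt.

    `ocr` is a hypothetical callable (image URL -> recognized text); in
    practice it would wrap a request to Mistral OCR or another OCR API.
    """
    result = deepcopy(messages)  # leave the caller's request untouched
    for message in result:
        if message.get("role") != "user":
            continue
        urls = extract_image_urls(message)
        if not urls:
            continue
        # Keep the user's original text parts, in order.
        text = "\n".join(
            part["text"]
            for part in message["content"]
            if isinstance(part, dict) and part.get("type") == "text"
        )
        notes = [
            f"[Text recognized in image {i}]\n{ocr(url)}"
            for i, url in enumerate(urls, start=1)
        ]
        # The message becomes text-only, so a model without image input can read it.
        message["content"] = "\n\n".join(([text] if text else []) + notes)
    return result
```

The rewritten messages can then be forwarded unchanged to a text-only model such as QwQ or DeepSeek-R1. Passing the OCR step in as a callable keeps the sketch independent of any one provider, so switching between Mistral and Qwen OCR would only change that callable.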
Mistral API doc: https://docs.mistral.ai/capabilities/document/#ocr-with-image
Qwen OCR: https://help.aliyun.com/zh/model-studio/user-guide/qwen-vl-ocr