Multi-Modal Support for Enhanced Retrieval #12

Open
Rajaniraiyn opened this issue May 27, 2024 · 5 comments

Comments

@Rajaniraiyn

Is there a plan to incorporate image embeddings along with OCR and metadata-based retrieval? Utilizing the CLIP model from Candle to generate image embeddings could provide clearer context and improve the accuracy of xrem’s results. If performance is a concern, downscaling images before embedding could be a viable solution.
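A minimal sketch of the retrieval side of this idea, written in Rust because Candle is a Rust framework. It assumes image embeddings have already been produced by a CLIP image encoder (for example via Candle; the encoder call is not shown). It then ranks stored frames by cosine similarity against a query embedding, which for CLIP would be the text encoder's embedding of the search string. `FrameEmbedding`, `top_k`, and `downscale_box` are hypothetical names and are not part of xrem. The downscaling helper only illustrates the performance idea with a simple grayscale box filter. A real pipeline would instead resize RGB screenshots to the encoder's fixed input size.

```rust
/// Cosine similarity between two embedding vectors of equal length.
/// Returns 0.0 if either vector has zero magnitude.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "embeddings must have the same dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot / (norm_a * norm_b)
}

/// Hypothetical record: a captured frame id plus its image embedding.
/// The embedding is assumed to come from a CLIP image encoder (not computed here).
struct FrameEmbedding {
    frame_id: u64,
    embedding: Vec<f32>,
}

/// Returns up to `k` (frame_id, score) pairs, highest cosine similarity first.
fn top_k(query: &[f32], frames: &[FrameEmbedding], k: usize) -> Vec<(u64, f32)> {
    let mut scored: Vec<(u64, f32)> = frames
        .iter()
        .map(|f| (f.frame_id, cosine_similarity(query, &f.embedding)))
        .collect();
    // Sort descending by score; total_cmp gives a total order even if a score is NaN.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}

/// Downscales a row-major grayscale image by an integer factor using box
/// averaging (each output pixel is the mean of a factor x factor block).
/// Trailing rows/columns that do not fill a whole block are dropped.
fn downscale_box(pixels: &[f32], width: usize, height: usize, factor: usize) -> (Vec<f32>, usize, usize) {
    assert!(factor > 0 && pixels.len() == width * height);
    let (out_w, out_h) = (width / factor, height / factor);
    let mut out = Vec::with_capacity(out_w * out_h);
    for oy in 0..out_h {
        for ox in 0..out_w {
            let mut sum = 0.0;
            for dy in 0..factor {
                for dx in 0..factor {
                    sum += pixels[(oy * factor + dy) * width + ox * factor + dx];
                }
            }
            out.push(sum / (factor * factor) as f32);
        }
    }
    (out, out_w, out_h)
}
```

Brute-force scoring like this is linear in the number of stored frames. For a long capture history, an approximate nearest-neighbour index would likely be a better fit, but that is outside the scope of this sketch.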

@jasonjmcghee
Owner

Sounds like a good enhancement, especially useful for indexing Blender, Photoshop, and other visually based tasks.

Highly encourage you to take a crack at implementing it!

@Rajaniraiyn
Author

Sounds great! I’ll try it out and reach out if I have any questions.

@Thawab8

Thawab8 commented Jun 22, 2024

Microsoft Florence-2 might be a good option:
https://huggingface.co/microsoft/Florence-2-large
Other tools are using https://moondream.ai/

@jasonjmcghee
Owner

Yeah, I've played with moondream, and (when I did) it performed quite poorly on screenshots. I had a short interaction with the creator, and it sounded like he was considering tackling screenshots, but at the time the project was focused on scenes (photographs, etc.).

I've been keeping an eye out... Closed models (OpenAI / Anthropic) can look at a screenshot and, to some degree, build an HTML page from it, which tells me they have a pretty good understanding of screenshots and would perform well.

Maybe fine-tuning moondream on screenshots using a larger model would be possible.

@Thawab8

Thawab8 commented Jun 24, 2024

A few hours ago, Hugging Face released an article on how to fine-tune Florence.
https://x.com/mervenoyann/status/1805265942487675139

Labels: None yet
Projects: None yet
Development: No branches or pull requests
3 participants