Multi-Modal Support for Enhanced Retrieval #12

Rajaniraiyn · 2024-05-27T21:20:16Z

Is there a plan to incorporate image embeddings along with OCR and metadata-based retrieval? Utilizing the CLIP model from Candle to generate image embeddings could provide clearer context and improve the accuracy of xrem’s results. If performance is a concern, downscaling images before embedding could be a viable solution.
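
As a rough sketch of the downscaling idea (not part of xrem): the helper below computes a target size that caps the longer side of a screenshot while keeping its aspect ratio. The function name `downscaled_size` and the 224-pixel default are assumptions for illustration. The actual resize and embedding calls would depend on whichever image and model libraries end up being used.

```python
def downscaled_size(width: int, height: int, max_side: int = 224) -> tuple[int, int]:
    """Return (width, height) scaled so the longer side is at most max_side.

    Hypothetical helper: max_side=224 is an assumed default, not a value
    taken from xrem or from any specific model configuration.
    """
    if width <= 0 or height <= 0 or max_side <= 0:
        raise ValueError("width, height and max_side must be positive")

    longest = max(width, height)
    if longest <= max_side:
        # Already small enough; never upscale.
        return width, height

    # Scale both sides by the same factor, keeping at least 1 pixel per side.
    new_width = max(1, round(width * max_side / longest))
    new_height = max(1, round(height * max_side / longest))
    return new_width, new_height


if __name__ == "__main__":
    print(downscaled_size(1920, 1080))
```

The trade-off is that small on-screen text becomes unreadable at this size, so an image embedding computed this way would complement OCR rather than replace it.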

jasonjmcghee · 2024-05-28T03:30:54Z

Sounds like a good enhancement, especially useful for indexing Blender/Photoshop/other visually based tasks.

Highly encourage you to take a crack at implementing it!

Rajaniraiyn · 2024-05-28T20:09:51Z

Sounds great! I’ll try it out and reach out if I have any questions.

Thawab8 · 2024-06-22T03:08:30Z

Microsoft's Florence-2 might be a good option:
https://huggingface.co/microsoft/Florence-2-large
Other tools are using https://moondream.ai/

jasonjmcghee · 2024-06-24T06:22:18Z

Yeah, I've played with moondream, and when I did, it performed quite poorly on screenshots. I had a short interaction with the creator, and it sounded like he was considering tackling screenshots, but the project was focused on scenes (photographs, etc.) at the time.

I've been keeping an eye out... Closed models (OpenAI / Anthropic) are able to look at a screenshot and build an HTML page from it to some degree, which tells me they understand screenshots pretty well and would perform well here.

Maybe fine-tuning moondream on screenshots, using a larger model, would be possible.

Thawab8 · 2024-06-24T21:34:29Z

A few hours ago, Hugging Face released an article on how to fine-tune Florence.
https://x.com/mervenoyann/status/1805265942487675139
