chroma vector store doesn't return similar questions (even the same question) #187
Comments
Hmm I think my bad when concluding too quickly that it returns But why when I query exactly the same question which exists in |
Hello @mrtunguyen, I want to express my gratitude for your valuable contribution to enhancing our work. I've conducted tests on the latest version of Dataherald, and using ChromaDB, I was able to successfully retrieve the same question that I had added as golden records. One potential reason for not obtaining results from the Chroma vector store could be related to the way ChromaDB stores vectors in memory. Whenever you create a new container with the --build flag, you lose the previously stored vectors in ChromaDB. However, please be aware that you will still see the golden records stored in the Mongo collection. |
For the tests I've done, it seems to be working fine. When I ask exactly the same question I get
It then checks that the tables, columns, etc. are those recognized by the system (i.e. those that have been processed by /api/v1/table-descriptions/sync-schemas, I guess). Provided that checks out, it should work. |
Thank you for all your replies. I will check today.
In that case, shouldn't we update ChromaDB automatically with the golden records stored in Mongo? |
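For reference, the re-sync idea raised here can be sketched roughly as follows. This is only an illustration under stated assumptions, not Dataherald's actual script: `InMemoryIndex`, `sync_golden_records`, and the `question` field name are hypothetical, and a plain list of dicts stands in for the Mongo collection.

```python
class InMemoryIndex:
    """Hypothetical stand-in for a vector store that loses its data on restart."""

    def __init__(self):
        self.items = {}  # record id -> question text

    def upsert(self, record_id, text):
        # Insert or overwrite, so repeated syncs never create duplicates.
        self.items[record_id] = text


def sync_golden_records(records, index):
    """Copy every persisted record into the index; return how many were new."""
    added = 0
    for record in records:
        record_id = str(record["_id"])
        if record_id not in index.items:
            added += 1
        index.upsert(record_id, record["question"])  # "question" is an assumed field name
    return added


# Records that survive a rebuild (stand-in for the Mongo collection).
persisted = [
    {"_id": 1, "question": "How many customers signed up last month?"},
    {"_id": 2, "question": "What is the average order value per region?"},
]

index = InMemoryIndex()  # fresh, empty index, as after a container rebuild
print(sync_golden_records(persisted, index))  # both records are new
print(sync_golden_records(persisted, index))  # nothing new on a second run
```

Keying the upsert on the record id keeps the sync idempotent, so it can be run at every startup or on demand without duplicating entries.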
@mrtunguyen 1. Script-Based Solution: 2. Dockerized Chroma Solution: If you do implement these, it would be great if you could raise a PR. Otherwise, we will get a fix in place early next week. |
I have built a script-based solution that takes config from JSON files and other database stores. Data that doesn't change much, like database-connections, table-descriptions, and instructions, can be stored as JSON in a file and loaded, but golden-records are updated over time to provide better coverage. I run the script on demand when some config data changes. I'm wondering if this is something I can contribute? It won't satisfy every user's requirements, but it could be used as a reference. |
I think it should be done if the vector store being used is Chroma; it is not necessary in other cases (for example, Pinecone). |
In my case it is just the default Chroma Context Store, but I think it doesn't matter because the scripts use the API to upload. As long as your flavor of Context Store implements the dataherald.context_store.ContextStore interface, it is supported. |
@mrtunguyen Hi, we added a script to upload the golden records to the vector stores (Chroma|Pinecone) from the MongoDB golden_records collection. Please check this in the documentation. Just run this command:
|
Hi,
I encountered a problem: the Chroma vector store doesn't return exactly the same question that was put into the golden records. When I dug into the code, I found that when you add a record to the Chroma collection, you don't add embeddings for that record, so they become None. I think that's why it doesn't work as expected.
dataherald/dataherald/vector_store/chroma.py
Line 44 in 9e39613
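To make the expected behaviour concrete, here is a minimal, library-independent sketch: if a record is stored with an embedding of its text, querying with the identical text should return that record as the top hit. Everything here is hypothetical and for illustration only (`embed`, `cosine`, `ToyVectorStore`); it is not the Chroma or Dataherald code, and the bag-of-words "embedding" is just a stand-in for a real model.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words embedding (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """Hypothetical in-memory store; not the Chroma API."""

    def __init__(self):
        self._rows = {}  # id -> (document, embedding)

    def add(self, doc_id, document, embedding=None):
        # If the caller passes no embedding, compute one from the document
        # so the record stays searchable.
        if embedding is None:
            embedding = embed(document)
        self._rows[doc_id] = (document, embedding)

    def query(self, text, n_results=1):
        """Return up to n_results (similarity, id, document) tuples, best first."""
        q = embed(text)
        scored = [
            (cosine(q, emb), doc_id, doc)
            for doc_id, (doc, emb) in self._rows.items()
        ]
        scored.sort(key=lambda row: row[0], reverse=True)
        return scored[:n_results]


store = ToyVectorStore()
store.add("1", "How many customers signed up last month?")
store.add("2", "What is the average order value per region?")
print(store.query("How many customers signed up last month?"))
```

With this setup the identical question comes back first with a similarity of about 1.0, which is the sanity check I would expect the real store to pass as well.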