[Feature Request]: A filter_include argument for Collection.get and Collection.query #1622

Open
thorwhalen opened this issue Jan 9, 2024 · 2 comments · May be fixed by #1626
Assignees
Labels
enhancement New feature or request

Comments

@thorwhalen

Describe the problem

The semantics of the output of Collection.get and Collection.query are not clear.
This, and other related matters, were already discussed in the issue Improve message on collection.get empty embeddings.

Consider this example. When calling Collection.get with include=["metadatas", "documents"], we get:

{
  'ids': ['foo', 'apple'],
  'embeddings': None,
  'metadatas': [None, None],
  'documents': ['bar', 'crumble'],
  'uris': None,
  'data': None
}

The None of "uris" means "you didn't ask for it".
The [None, None] of "metadatas" means "you asked for it, but there were none".
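
To make the two meanings concrete, here is a minimal sketch that classifies each field of a result dict shaped like the one above. It works on a plain dict and does not call the library; the helper name field_status is hypothetical, and the rule it applies (top-level None means "not requested", a list of None means "requested but empty") is taken only from the description in this issue.

```python
def field_status(result):
    """Classify each non-id field of a get/query-style result dict.

    Hypothetical helper: it encodes the semantics described in this issue,
    not an API that the library provides.
    """
    status = {}
    for key, value in result.items():
        if key == "ids":
            continue  # ids are always returned, so there is nothing to classify
        if value is None:
            status[key] = "not requested"
        elif all(item is None for item in value):
            status[key] = "requested, but empty for every record"
        else:
            status[key] = "requested, with values"
    return status


# The result dict from the example above, copied verbatim.
result = {
    "ids": ["foo", "apple"],
    "embeddings": None,
    "metadatas": [None, None],
    "documents": ["bar", "crumble"],
    "uris": None,
    "data": None,
}
status = field_status(result)
```

A caller has to know this convention to read the output correctly, which is the confusion this issue is about.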

I'd argue that:

{
  'ids': ['foo', 'apple'],
  'metadatas': [None, None],
  'documents': ['bar', 'crumble'],
}

might be less confusing.

Describe the proposed solution

Add a filter_include argument to Collection.get and Collection.query which, if True, returns only the fields that were requested (plus "ids").
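
As a rough illustration of the intended effect, the filtering step could look like the sketch below. It operates on a plain result dict rather than on the library itself; the function name filter_to_included is hypothetical and is not taken from the PR.

```python
def filter_to_included(result, include):
    """Keep only "ids" and the fields named in `include`.

    Hypothetical helper illustrating the proposed filter_include=True
    behavior on an already-returned result dict.
    """
    keep = {"ids", *include}
    return {key: value for key, value in result.items() if key in keep}


# The unfiltered result from the example in this issue.
result = {
    "ids": ["foo", "apple"],
    "embeddings": None,
    "metadatas": [None, None],
    "documents": ["bar", "crumble"],
    "uris": None,
    "data": None,
}
filtered = filter_to_included(result, include=["metadatas", "documents"])
```

With this, a None value can no longer be mistaken for "not requested": a field that was not requested is simply absent from the output.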

I propose to set the default of filter_include to be False for now, controlled via a DFLT_FILTER_INCLUDE variable.
This is so that the current behavior doesn't change, so we have minimal disruption.

Later, if worth it, we can switch to a DFLT_FILTER_INCLUDE=True default.
We can also consider letting users control this via some settings/configs mechanism.

Alternatives considered

Alternatives can be found in issue 300.

Importance

would make my life easier

Additional Information

I intend on doing a PR on this.

@thorwhalen thorwhalen added the enhancement New feature or request label Jan 9, 2024
@thorwhalen thorwhalen changed the title from "[Feature Request]:" to "[Feature Request]: A filter_include argument for Collection.get and Collection.query" Jan 9, 2024
@thorwhalen
Author

Personally, I'm not a fan of the filter_include name for the argument, but I haven't thought of a better one. Do propose one if you think of something.

@thorwhalen
Author

thorwhalen commented Jan 9, 2024

Implemented in PR: #1626

@thorwhalen thorwhalen linked a pull request Jan 12, 2024 that will close this issue
@atroyn atroyn self-assigned this May 15, 2024
2 participants