Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Forbid standalone usage of <-> operator using statement AST #56

Closed
var77 opened this issue Aug 17, 2023 · 3 comments · Fixed by #111
Closed

Forbid standalone usage of <-> operator using statement AST #56

var77 opened this issue Aug 17, 2023 · 3 comments · Fixed by #111
Labels
core Core Database

Comments

@var77
Copy link
Collaborator

var77 commented Aug 17, 2023

Currently we have only one operator <->

In src/hnsw/options.c file there's a function HnswGetMetricKind which will determine current operator class being used with and index and detect right metric kind for usearch index using the support function pointers.

This is great as we can have only one operator which will support various distance functions, but when used out of index scope for example in SELECT statement, the operator can not automatically detect which distance function should be used.

We are currently throwing an error when <-> is used out of index lookup. We are doing this using ExecutorStart_hook the hook implementation is defined in src/hnsw/options.c void executor_hook.
This function receives QueryDesc struct, and we are currently doing regexp matching on sourceText. This approach is not covering cases when the operator will be used with ORDER BY, but there won't be an index scan.

To fix all the cases we might use plannedstmt which contains the AST of planned statement, where we can find information about the indexes and much more.

After doing this changes theres hnsw_operators_todo.sql test file. The file should be renamed to hnsw_operators.sql and included in schedule.txt file

@Ngalstyan4 Ngalstyan4 added the core Core Database label Aug 20, 2023
@dqii
Copy link
Contributor

dqii commented Aug 22, 2023

Wouldn't this mean that if there is an ORDER BY <-> but the index is not triggered, e.g., because the cost estimator determined that it would be more cost effective to do a sequential search, then an error would be thrown?

@dqii
Copy link
Contributor

dqii commented Aug 23, 2023

It's a bit strange to me that I can't do <-> in SELECT. I did this a few times when testing pgvector to sanity check that I was getting the nearest vectors.

@var77
Copy link
Collaborator Author

var77 commented Aug 23, 2023

Yes that is true, it will throw an error in case of sequential scans, but here is some backstory why we decided to implement this feature.

We wanted to make the UX better by maintaining only one operator which will automatically determine which kind of distance function should be called. Currently in alternative solutions there is an separate operator for each distance function (e.g <->, <=>, <~>) so you should remember which operator to use for particular index.
Currently on index scans we can determine the right distance function, as we can look up the support function from the index Relation, but in case of sequential scan or SELECT statement we couldn't find an optimal way to determine which distance function to call. In case of SELECT I think it is not possible, but in case of seq scan there might be a way to look up the indexes defined on that field and try to determine the right distance function based on that information, but we might loose some performance there.

So at this moment we decided to forbid the usage of the operator, so it will not be confusing that let's say sometimes the operator returns l2 distance if used out of index and when used with index it returns cosine distance.

We can discuss this topic further with @Ngalstyan4 , maybe we can come up with a better solution.

@dqii dqii linked a pull request Sep 4, 2023 that will close this issue
@dqii dqii closed this as completed in #111 Sep 5, 2023
var77 added a commit that referenced this issue Oct 8, 2024
* Fix sample size to take from test table count

* Fix data type
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core Database
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants