Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Distance operator <-> wrongly used in CROSS-JOIN queries without a vector index #227

Closed
Ngalstyan4 opened this issue Nov 25, 2023 · 1 comment · Fixed by #228
Closed
Labels
bug Something isn't working

Comments

@Ngalstyan4
Copy link
Contributor

Ngalstyan4 commented Nov 25, 2023

Query:

-- 1)
CREATE TABLE imdb_reviews (
  id SERIAL PRIMARY KEY,
  imdb_id int NOT NULL UNIQUE,
  review text,
  positive_review bool
);

-- 2) populate table
 INSERT INTO imdb_reviews (imdb_id,review, positive_review) VALUES %s

-- 3) generate review_embedding column via the cloud

-- 4) Run the query
SELECT
  forall.imdb_id, 
  nearest_per_id.near_imdb_ids, nearest_per_id.imdb_dists
FROM
  (
    SELECT
      imdb_id, review_embedding
    FROM
      imdb_reviews
    LIMIT 100000
  ) AS forall
  JOIN LATERAL (
    SELECT
      ARRAY_AGG(imdb_id) AS near_imdb_ids, 
      ARRAY_AGG(imdb_dist) AS imdb_dists
    FROM
      (
        SELECT
          t2.imdb_id,
          cos_dist(forall.review_embedding, t2.review_embedding) AS imdb_dist
        FROM
          imdb_reviews t2
        ORDER BY
          forall.review_embedding <-> t2.review_embedding
        LIMIT
          5
      ) AS __unused_name
  ) nearest_per_id ON TRUE
ORDER BY
  forall.imdb_id;

Expected: the query in step 4 to fail with error Operator <-> can only be used inside of an index

Seeing: The query runs a sequential scan with the following query plan:

                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Sort  (cost=97888136.64..97888199.14 rows=25000 width=68)
   Sort Key: imdb_reviews_new111.imdb_id
   ->  Nested Loop  (cost=6117.34..97886310.43 rows=25000 width=68)
         ->  Limit  (cost=0.00..5577.00 rows=25000 width=471)
               ->  Seq Scan on imdb_reviews_new111  (cost=0.00..5577.00 rows=25000 width=471)
         ->  Memoize  (cost=6117.34..6117.36 rows=1 width=64)
               Cache Key: imdb_reviews_new111.review_embedding, imdb_reviews_new111.review_embedding
               Cache Mode: binary
               ->  Aggregate  (cost=6117.33..6117.34 rows=1 width=64)
                     ->  Limit  (cost=6117.24..6117.25 rows=5 width=12)
                           ->  Sort  (cost=6117.24..6179.74 rows=25000 width=12)
                                 Sort Key: ((imdb_reviews_new111.review_embedding <-> t2.review_embedding))
                                 ->  Seq Scan on imdb_reviews_new111 t2  (cost=0.00..5702.00 rows=25000 width=12)

Once the index is created, the plan looks like the following:

                                                                       QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=137718.49..137780.99 rows=25000 width=68)
   Sort Key: imdb_reviews_new111.imdb_id
   ->  Nested Loop  (cost=5.17..135892.29 rows=25000 width=68)
         ->  Limit  (cost=0.00..5577.00 rows=25000 width=492)
               ->  Seq Scan on imdb_reviews_new111  (cost=0.00..5577.00 rows=25000 width=492)
         ->  Aggregate  (cost=5.17..5.18 rows=1 width=64)
               ->  Limit  (cost=0.00..5.09 rows=5 width=12)
                     ->  Index Scan using imdb_reviews_new111_review_embedding_idx on imdb_reviews_new111 t2  (cost=0.00..25463.06 rows=25000 width=12)
                           Order By: (review_embedding <-> imdb_reviews_new111.review_embedding)
@Ngalstyan4 Ngalstyan4 added the bug Something isn't working label Nov 25, 2023
@dqii
Copy link
Contributor

dqii commented Nov 25, 2023

This is probably a missing node in plan_tree_walker.c

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants