Incorporate HNSW-specific information to cost estimate #103

Merged
merged 2 commits into from
Sep 3, 2023


@yoloVoe yoloVoe commented Sep 3, 2023

Closes #14.

We do so by:

  1. Computing the number of tuples we expect to access for a given search query, based on bounds from the paper.
  2. Computing the number of datablock, header, and blockmap accesses. The number of header accesses is assumed to be 1: every time we access a blockmap we also access the header, so we expect the header to stay in cache. For datablock and header, we follow the basic logic from genericcostestimate, with some HNSW-specific enhancements.
  3. Taking the total cost from genericcostestimate, which should incorporate the number of tuples we fed it, and scaling that result by our updated number of page accesses.
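
The three steps above can be sketched roughly as follows. This is a minimal, standalone illustration and not the code in hnsw.c: every struct, function, and parameter name is hypothetical, the tuple bound is a placeholder (about log_M(N) upper levels touching M neighbors each, plus `ef` bottom-layer expansions of up to 2M neighbors) rather than the bound taken from the paper in this PR, and the blockmap fan-out is an assumed input. The datablock estimate mirrors the proportional page estimate that genericcostestimate uses.

```c
#include <assert.h>

/* All names here are hypothetical; none are taken from hnsw.c. */
typedef struct {
    double index_tuples;        /* total tuples stored in the index */
    double index_pages;         /* total datablocks in the index */
    double m;                   /* assumed HNSW max neighbors per node */
    double ef;                  /* assumed HNSW search beam width */
    double blocks_per_blockmap; /* assumed datablocks covered per blockmap page */
} HnswCostInput;

typedef struct {
    double tuples;
    double datablocks;
    double blockmaps;
    double headers;
} HnswAccessEstimate;

/* Ceiling for non-negative values, written out to avoid linking libm. */
static double ceil_pos(double x)
{
    long long whole = (long long) x;
    return (double) whole < x ? (double) (whole + 1) : (double) whole;
}

static double min_d(double a, double b)
{
    return a < b ? a : b;
}

/* Step 1: placeholder bound on tuples visited by one search (an assumption). */
static double estimate_tuples_accessed(const HnswCostInput *in)
{
    double levels = 0.0;
    double remaining = in->index_tuples;

    assert(in->m > 1.0 && in->index_tuples >= 1.0);
    while (remaining > 1.0) {   /* counts roughly log_M(N) levels */
        remaining /= in->m;
        levels += 1.0;
    }
    return min_d(in->index_tuples, levels * in->m + in->ef * 2.0 * in->m);
}

/* Step 2: turn the tuple estimate into page accesses. */
static HnswAccessEstimate estimate_accesses(const HnswCostInput *in)
{
    HnswAccessEstimate est;

    assert(in->blocks_per_blockmap > 0.0);
    est.tuples = estimate_tuples_accessed(in);
    /* Pages touched scale with the fraction of tuples touched. */
    if (in->index_pages > 1.0 && in->index_tuples > 1.0)
        est.datablocks = ceil_pos(est.tuples * in->index_pages / in->index_tuples);
    else
        est.datablocks = 1.0;
    est.blockmaps = ceil_pos(est.datablocks / in->blocks_per_blockmap);
    est.headers = 1.0;          /* header is assumed to stay in cache */
    return est;
}

/* Step 3: scale the generic total cost by the ratio of page accesses. */
static double scale_total_cost(double generic_total_cost, double generic_pages,
                               const HnswAccessEstimate *est)
{
    double hnsw_pages = est->datablocks + est->blockmaps + est->headers;

    assert(generic_pages > 0.0);
    return generic_total_cost * hnsw_pages / generic_pages;
}
```

With an index of 10000 tuples on 500 datablocks, `m = 16`, `ef = 10`, and 100 datablocks per blockmap, this sketch estimates 384 tuples, 20 datablocks, 1 blockmap, and 1 header access. The real estimator would take these inputs from the planner and index metadata rather than from a hand-filled struct.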

Commit message has more details too.

@Ngalstyan4 Ngalstyan4 left a comment

Looks good!

Add log statements to hnsw.c so we can see a more detailed
breakdown of current generic cost estimate.

Also add hnsw_cost_estimate test so we can start
capturing detailed cost estimate.

test_runner.sh was also updated to ignore
values that are non-deterministic.

First, estimate the number of index tuples accessed.
Then, based on that, estimate the number of datablocks
and blockmaps accessed.
Successfully merging this pull request may close these issues.

Improve Hnsw cost estimate