Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Throw error when index operator is used without index #124

Merged
merged 41 commits into from
Sep 10, 2023
Merged

Throw error when index operator is used without index #124

merged 41 commits into from
Sep 10, 2023

Conversation

dqii
Copy link
Contributor

@dqii dqii commented Sep 8, 2023

This implementation uses the ExecutorStart_hook to detect that the usage of the operator <-> is only in the context of an index scan.

@codecov
Copy link

codecov bot commented Sep 8, 2023

Codecov Report

Merging #124 (e3e3e10) into main (4646afc) will decrease coverage by 0.71%.
Report is 10 commits behind head on main.
The diff coverage is 82.98%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #124      +/-   ##
==========================================
- Coverage   84.28%   83.57%   -0.71%     
==========================================
  Files          13       16       +3     
  Lines        1069     1163      +94     
  Branches      238      258      +20     
==========================================
+ Hits          901      972      +71     
- Misses         74       81       +7     
- Partials       94      110      +16     
Files Changed Coverage Δ
src/hnsw/options.c 86.66% <66.66%> (+1.81%) ⬆️
src/hooks/plan_tree_walker.c 68.75% <68.75%> (ø)
src/hooks/executor_start.c 88.88% <88.88%> (ø)
src/hooks/post_parse.c 95.31% <95.31%> (ø)
src/hooks/utils.c 100.00% <100.00%> (ø)

... and 3 files with indirect coverage changes

@dqii dqii marked this pull request as ready for review September 9, 2023 00:25
@dqii
Copy link
Contributor Author

dqii commented Sep 9, 2023

I considered passing data from the post_parse_analyze_hook to the ExecutorStart_hook indicating that the operator is used, so that we didn't do this check in ExecutorStart_hook unnecessarily. The approaches didn't seem clean. I didn't want global variables. I didn't want a hash table. There's no built in way. I think it's better to just accept the duplication.

I considered removing the parse analyzer. This doesn't work because the planner is missing some information that the parser has.

I considered reducing the number of tree iterations by the post-parse hook by merging the operator check and the sort function. I didn't think this was worth it given that most queries probably won't use the index.

Lastly, more could be done to flesh out the plan tree walker. I opted against it for this diff to keep it simpler and because the other cases were lower priority.

Copy link
Contributor

@Ngalstyan4 Ngalstyan4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

{
if(node == NULL) return false;
if(IsA(node, IndexScan)) {
context->isIndexScan = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain what's happening here?
why do we set 'isIndexScan' to true, and then back to false on line 24

Copy link
Contributor Author

@dqii dqii Sep 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's traversing a tree. When it hits an IndexScan node, it's going to "go inside it". I'm noting that we're inside an index scan now. If we hit the Lantern operator inside an index scan, that's an acceptable way to use the operator. We return false. If we hit the Lantern operator and we are not inside an index scan, that is not an acceptable way to use the operator. The if(list_member_oid(context->oidList, opExpr->opno) && !context->isIndexScan) { will catch this, and return true.

Once we have finished traversing an IndexScan, we note that we are no longer inside an IndexScan.

#endif
}

List *oidList = get_operator_oids();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oidList can be NULL here right?
If we account for that in function calls below, maybe leave a comment here to make it clear we did not forget it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed offline. Will add ldb assert after that diff is pushed.

plan_row RECORD;
found boolean := false;
BEGIN
FOR plan_row IN EXECUTE 'EXPLAIN ' || query_text LOOP
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the function would be clearer if EXPLAIN was passed with the query at callsite.
Then, it would be clear this function just inspects the output of the EXPLAIN query.

@@ -36,11 +46,28 @@ WITH neighbors AS (
RESET client_min_messages;

-- Verify where condition works properly and still uses index
SELECT * FROM small_world WHERE b IS TRUE ORDER BY v <-> '{0,0,0}';
EXPLAIN (COSTS FALSE) SELECT * FROM small_world WHERE b IS TRUE ORDER BY v <-> '{0,0,0}';
SELECT has_index_scan('SELECT * FROM small_world WHERE b IS TRUE ORDER BY v <-> ''{0,0,0}''');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are multiple single quotes in ''{0,0,0}'' intentional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Otherwise, it would match with the single quote that starts the string.


bool plan_tree_walker(Plan *plan, bool (*walker_func)(Plan *plan, void *context), void *context)
{
check_stack_depth();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does this do?

Copy link
Contributor Author

@dqii dqii Sep 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is in Postgres's expression_tree_walker function, so I copied it. It checks that we aren't going into too deep a recursion. So would be triggered for an unreasonable complex query, or if there's somehow an infinite loop.

@github-actions
Copy link

github-actions bot commented Sep 10, 2023

Benchmarks

metric old new pct change
recall (after create) - 0.740 -
recall (after insert) - 0.784 -
select tps - 3292.904 -
select latency (ms) - 1.666 -
create latency (ms) - 1731.984 -
insert latency (ms) - 2.014 -
insert tps - 496.256 -
disk usage (bytes) - 6348800.000 -

@dqii dqii merged commit fe37327 into main Sep 10, 2023
@dqii dqii deleted the @di/ast-2 branch September 10, 2023 20:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants