src/backend/access/nbtree/README (+44 lines)
@@ -1054,3 +1054,47 @@ item is irrelevant, and need not be stored at all. This arrangement
corresponds to the fact that an L&Y non-leaf page has one more pointer
than key.  Suffix truncation's negative infinity attributes behave in
the same way.

Notes About Index Scan Prefetch
-------------------------------

Prefetch can significantly improve the speed of OLAP queries.
To prefetch, we need to know which pages the scan will access.
This is trivial for heap and bitmap scans, but requires more effort
for index scans: we need to find out which leaf pages come next.

Postgres links all pages at the same level of the B-Tree in a doubly
linked list and uses this list for forward and backward iteration.
This list, however, cannot trivially be used for prefetching, because
to locate the next page we first need to load the current page.  To
prefetch more than only the next page, we can use the parent page's
downlinks instead, as the parent page contains references to most of
the target page's sibling pages.

Because Postgres' nbtree pages have no reference to their parent page,
we need to remember the parent page when descending the btree and use
it to prefetch subsequent pages.  We also use the parent level's
linked list to keep prefetching past the key range of the parent page.

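As an illustration of using remembered downlinks, here is a minimal
sketch in C.  It is not the code from this patch: ParentDownlinks and
next_leaf_blocks() are hypothetical names, and block numbers are plain
integers.  It only shows how the next few leaf blocks of a forward scan
can be read off a copy of the parent page's downlinks without loading
any leaf page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical copy of the downlinks found on the remembered parent page. */
typedef struct ParentDownlinks
{
    const uint32_t *blocks;     /* child block numbers, in key order */
    size_t          nblocks;    /* number of downlinks on the parent page */
} ParentDownlinks;

/*
 * Collect up to 'distance' leaf blocks that follow child 'cur' in a
 * forward scan.  Returns how many block numbers were written to 'out'.
 * A real implementation would pass each block to the buffer manager's
 * prefetch call (e.g. PrefetchBuffer()) instead of returning an array.
 */
static size_t
next_leaf_blocks(const ParentDownlinks *parent, size_t cur,
                 size_t distance, uint32_t *out)
{
    size_t      n = 0;

    assert(parent != NULL && out != NULL);
    assert(cur < parent->nblocks);

    /* Stop at the end of the parent page; the next parent covers the rest. */
    for (size_t i = cur + 1; i < parent->nblocks && n < distance; i++)
        out[n++] = parent->blocks[i];

    return n;
}
```

When fewer than 'distance' blocks are returned, the scan has reached the
end of the parent's key range and must continue from the next parent
page.
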
We should prefetch not only leaf pages, but also the next parent page.
The trick is to correctly calculate the moment when it will be needed:
we should issue the prefetch request for the next parent page not
after prefetch requests for all children of the current parent page
have been issued, but as soon as only effective_io_concurrency line
pointers are left to prefetch from the page.

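The timing rule can be written as a single comparison.  The sketch
below assumes the scan tracks how many downlinks the current parent
page has and how many of them have already been prefetched; the
function name and parameters are hypothetical and not taken from the
patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Decide whether the next parent page should be requested now.
 *
 * n_downlinks     downlinks on the current parent page
 * n_issued        children of this parent already prefetched
 * io_concurrency  stand-in for effective_io_concurrency
 */
static bool
should_prefetch_next_parent(size_t n_downlinks, size_t n_issued,
                            size_t io_concurrency)
{
    assert(n_issued <= n_downlinks);

    /* Request early: while io_concurrency children are still pending. */
    return n_downlinks - n_issued <= io_concurrency;
}
```

A caller would also need to remember that the request was already made,
since the condition stays true for the rest of the parent page.
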
Currently there are two different prefetch implementations, one for
index-only scans and one for index scans.  An index-only scan doesn't
need to access heap tuples, so it prefetches only B-Tree leaf pages
(and their parents).  Prefetch for index-only scans is performed only
if a parallel plan is not used.  A parallel index scan uses a critical
section in which a parallel worker obtains the next page, and the leaf
page is loaded inside this critical section.  If most of the time is
spent loading the page, this effectively eliminates any concurrency
and makes prefetch useless.  For relatively small tables Postgres will
not choose a parallel plan in any case.  For large tables a
non-parallel plan can be enforced by setting
max_parallel_workers_per_gather=0.

Prefetch for a normal (not index-only) index scan tries to prefetch
the heap tuples referenced from the leaf page.  The average number of
items per page is about 100, which is comparable to the default value
of effective_io_concurrency, so there is little point in also
prefetching the next leaf page.

Because it is difficult to estimate the number of entries an index
scan will traverse, we prefer not to prefetch a large number of pages
from the very beginning.  Such useless prefetch can reduce the
performance of point lookups.  Instead, we start with the smallest
prefetch distance and increase it by INCREASE_PREFETCH_DISTANCE_STEP
after processing each item until it reaches effective_io_concurrency.
For an index-only scan the item is a leaf page, so the prefetch
distance is increased after processing each leaf page; for an index
scan it is increased after processing each tuple.
The only exception is the case when no key bounds are specified.
In this case we traverse the whole relation, and it makes sense
to start with the largest possible prefetch distance from the very
beginning.
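
The ramp-up can be sketched in C as follows.  This is an illustration
under stated assumptions, not the patch's code: the value given to
INCREASE_PREFETCH_DISTANCE_STEP is made up (the text above names the
constant but not its value), the smallest distance is assumed to be 0,
and both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed value, for illustration only. */
#define INCREASE_PREFETCH_DISTANCE_STEP 1

/*
 * Starting distance: the smallest one (assumed to be 0) for bounded
 * scans, the maximum for scans without key bounds.
 */
static int
initial_prefetch_distance(bool has_key_bounds, int io_concurrency)
{
    assert(io_concurrency >= 0);
    return has_key_bounds ? 0 : io_concurrency;
}

/*
 * Called after each processed item (leaf page or tuple, depending on
 * the scan type).  Grows the distance by one step, capped at
 * io_concurrency, the stand-in for effective_io_concurrency.
 */
static int
advance_prefetch_distance(int distance, int io_concurrency)
{
    distance += INCREASE_PREFETCH_DISTANCE_STEP;
    return distance > io_concurrency ? io_concurrency : distance;
}
```

With these assumptions a point lookup that stops after the first item
issues almost no prefetch requests, while a long range scan reaches the
full distance after io_concurrency items.
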