This is a proof-of-concept binary search implementation that entirely avoids branching,
i.e. if-then-else
. It accomplishes this with a mix of bit twiddling,
jump tables, and access to the x86 BSR
instruction through a compiler builtin.
Branch mispredictions can be expensive, especially on architectures with long pipelines. This approach avoids branching entirely, at the expense of code size and more overall instructions.
Additionally, because the same code is run regardless of input, this approach can be extended to have SIMD/vector support, parallelizing searches across the same (or different) data.
At a high level, the search works as follows:
- compute
log2(length)
switch
on the result- Taking advantage of case fall-through, run a modified binary search update
log2(length)
times. - Interpret the values of the search window to get the result
This is an optimization. The can work correctly by running log2(sizeof(length))
(e.g. 32, 64) times.
A standard binary search implicitly computes this empirically, essentially
dividing length
by 2 until it's 0. Because this implementation uses a jump
table to avoid a loop condition, it needs to be known before the search.
unsigned int bits = 64 - __builtin_clzl(size);
This is functionally the same as the loop in an iterative binary search
implementation—it runs update_search_window
log2(length)
times.
Without bits % 64
, the compiler adds a check to make sure there's a case
for bits
.
switch (bits % 64) {
case 63: update_search_window(array, &start, &end, key);
case 62: update_search_window(array, &start, &end, key);
case 61: update_search_window(array, &start, &end, key);
...
For the most part, the update function looks similar to a standard binary search, but instead of returning when the result is found, the window remains the same allowing the function to be run again.
The second difference is more in implementation than function. Rather than
using an if-then-else
construct to updated start
or end
, both are
updated regardless, with bit masks used to select the next value.
__attribute__((always_inline)) static inline void update_search_window(
const int32_t *array, ptrdiff_t *start, ptrdiff_t *end, const int32_t key) {
ptrdiff_t median = (*start + *end) / 2;
int32_t value = array[median];
uintptr_t start_mask = -(key < value);
uintptr_t end_mask = -(key > value);
*start = (median & ~start_mask) | (*start & start_mask);
*end = ((median & ~end_mask) | (*end & end_mask)) +
(~start_mask & ~end_mask & 1);
}
Like a recursive binary search's base case, the result is determined by
comparing the key
being looked up the current window. Like in the window
update function, bitmasks are used to avoid branching.
Unlike bsearch()
, but like Java's Arrays.binarySearch()
,
this returns -insertion_point - 1
when key
isn't found.
int is_match_mask = -(array[start] == key);
int is_after_start = -(array[start] < key);
int is_before_end = -(array[end] > key);
return (is_empty_mask & -1) | (is_non_empty_mask & (
(is_match_mask & start) | (~is_match_mask & (
(is_after_start & (-end - 1)) |
(~is_after_start & (is_before_end & (-start - 1)))))));