-
Notifications
You must be signed in to change notification settings - Fork 84
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HashedRunFinder
run-finding improvements
#4974
Conversation
Here is the test code to generate those results:
|
engine/table/src/main/java/io/deephaven/engine/table/impl/by/HashedRunFinder.java
Outdated
Show resolved
Hide resolved
I think you should run with all variations of these configuration settings and see where we're at:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Numbers indicate it makes things pretty much universally better.
Ryan's suggestion of an improved comment makes sense.
Did you consider a version that mixes the hash a bit more? It might reduce locality and get worse in the general case.
hashSlot = (hashSlot + 1) & context.tableMask; | ||
if (probe == 0) { | ||
// double hashing | ||
probe = 1 + (outputPosition % (context.tableSize - 2)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we sure this probe sequence can never result in an infinite loop?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, similar to Trove we force tableSize
to be a prime number and our h2()
function returns a number in the range 1 <= n < tableSize
which guarantees the second hash has no common factors other than 1 (i.e. relatively prime). With no common factors, every probe sequence will reach every cell in the probe sequence. Also, since we size the table to have a load factor of 0.75
we can be certain that there will always be an empty cell to find.
I was initially worried about small desired capacities (0,1,2) but have verified that PrimeFinder#nextPrime()
returns 3 as the smallest possible prime (i.e. when provided 0 <= x <= 3
). so the small cases are covered properly as well.
I compared a few strategies for run-finding:
Here are the results:
