-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf: switch to a faster xxhash package #7362
Conversation
// Initialize seed for term hashing. This is intentionally placed before the | ||
// root document sets are constructed to ensure they use the same hash seed as | ||
// subsequent lookups. If the hash seeds are out of sync, lookups will fail. | ||
var hashSeed = rand.New(rand.NewSource(time.Now().UnixNano())) | ||
var hashSeed0 = (uint64(hashSeed.Uint32()) << 32) | uint64(hashSeed.Uint32()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know that, but I'd rather have it added back for number than removed from here. Or why would we not include a seed in the hashes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or why would we not include a seed in the hashes?
My guess is that the initial implementation used the SipHash package, which requires two seeds: https://pkg.go.dev/github.com/dchest/siphash#Hash
func Hash(k0, k1 uint64, b []byte) uint64
Since the xxHash package we use now doesn’t enforce a seed, I think we don’t need them anymore. Also,
var hashSeed = rand.New(rand.NewSource(time.Now().UnixNano()))
relies on time.Now().UnixNano()
, so it’s not a constant seed either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I haven't really dug into the details here, so somebody else will be a better judge here.
Thanks a lot for submitting this though! Seems like a good and relatively simple improvement to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So in the used xxhash (cespare) it's defaulting to hashseed = 0 if unset. https://github.com/cespare/xxhash/blob/main/xxhash.go#L39-L42
What are our hashes for? I don't think we'll need to worry about them being predictable, other than that some iteration order might become more stable than advertised... 🤔 @johanfylling do you see a need for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
go test -v -benchmem -bench '^BenchmarkTermHashing$' -run='^$' -count=10 github.com/open-policy-agent/opa/v1/ast goos: linux goarch: amd64 pkg: github.com/open-policy-agent/opa/v1/ast cpu: AMD Ryzen 7 PRO 4750U with Radeon Graphics │ old.txt │ new.txt │ │ sec/op │ sec/op vs base │ TermHashing/10-16 18.68n ± 2% 11.30n ± 0% -39.49% (p=0.000 n=10) TermHashing/100-16 42.94n ± 2% 33.71n ± 1% -21.48% (p=0.000 n=10) TermHashing/1000-16 179.4n ± 0% 165.1n ± 1% -7.97% (p=0.000 n=10) geomean 52.39n 39.77n -24.10% Signed-off-by: Eng Zer Jun <[email protected]>
/cc @srenatus and @johanfylling 😊 |
Why the changes in this PR are needed?
What are the changes in this PR?
Replace
github.com/OneOfOne/xxhash
withgithub.ghproxy.top/cespare/xxhash/v2
, a faster implementation of xxHash.Notes to assist PR review:
Further comments: