perf: switch to a faster xxhash package #7362

Merged (1 commit) on Feb 19, 2025

Juneezee
Contributor

Why the changes in this PR are needed?

go test -v -benchmem -bench '^BenchmarkTermHashing$' -run='^$' -count=10 github.com/open-policy-agent/opa/v1/ast

goos: linux
goarch: amd64
pkg: github.com/open-policy-agent/opa/v1/ast
cpu: AMD Ryzen 7 PRO 4750U with Radeon Graphics
                    │   old.txt   │               new.txt               │
                    │   sec/op    │   sec/op     vs base                │
TermHashing/10-16     18.68n ± 2%   11.30n ± 0%  -39.49% (p=0.000 n=10)
TermHashing/100-16    42.94n ± 2%   33.71n ± 1%  -21.48% (p=0.000 n=10)
TermHashing/1000-16   179.4n ± 0%   165.1n ± 1%   -7.97% (p=0.000 n=10)
geomean               52.39n        39.77n       -24.10%

What are the changes in this PR?

Replace github.com/OneOfOne/xxhash with github.com/cespare/xxhash/v2, a faster implementation of xxHash.

Notes to assist PR review:

Further comments:

Comment on lines -20 to -25
// Initialize seed for term hashing. This is intentionally placed before the
// root document sets are constructed to ensure they use the same hash seed as
// subsequent lookups. If the hash seeds are out of sync, lookups will fail.
var hashSeed = rand.New(rand.NewSource(time.Now().UnixNano()))
var hashSeed0 = (uint64(hashSeed.Uint32()) << 32) | uint64(hashSeed.Uint32())
Contributor Author

After running git blame, this looks like leftover code to me.

Seven years ago, we switched from SipHash to xxHash in PR #970. I'm not sure why the author of that PR removed the seed from (Number).Hash but kept it for (String).Hash and (Var).Hash in commit 5cb4d8f.

Member

I don't know either, but I'd rather have the seed added back for Number than removed here. Or why would we not include a seed in the hashes?

Contributor Author

> Or why would we not include a seed in the hashes?

My guess is that the initial implementation used the SipHash package, which requires two seeds: https://pkg.go.dev/github.com/dchest/siphash#Hash

func Hash(k0, k1 uint64, b []byte) uint64

Since the xxHash package we use now doesn’t enforce a seed, I think we don’t need them anymore. Also,

var hashSeed = rand.New(rand.NewSource(time.Now().UnixNano()))

relies on time.Now().UnixNano(), so it’s not a constant seed either.

Member

Thanks! I haven't really dug into the details, so somebody else will be a better judge here.

Thanks a lot for submitting this though! Seems like a good and relatively simple improvement to me.

Contributor

So the xxhash package used now (cespare) defaults to a seed of 0 if none is set. https://github.com/cespare/xxhash/blob/main/xxhash.go#L39-L42

What are our hashes for? I don't think we'll need to worry about them being predictable, other than that some iteration order might become more stable than advertised... 🤔 @johanfylling do you see a need for this?
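The invariant in the removed comment (structures built with one seed must be queried with the same seed) can be illustrated with a toy example. This is only a sketch under stated assumptions: `seededHash` is a hypothetical stand-in based on 64-bit FNV-1a with the seed mixed into the initial state, not xxHash and not OPA's code, and `index` is a made-up hash-keyed set.

```go
package main

import "fmt"

// seededHash is a hypothetical stand-in for a seeded hash function:
// 64-bit FNV-1a with the seed XORed into the initial state.
// It is NOT xxHash; it only shows how a seed changes the resulting hash.
func seededHash(seed uint64, s string) uint64 {
	const (
		offsetBasis = 14695981039346656037
		prime       = 1099511628211
	)
	h := uint64(offsetBasis) ^ seed
	for i := 0; i < len(s); i++ {
		h ^= uint64(s[i])
		h *= prime
	}
	return h
}

// index is a toy set that stores only the hashes of its members
// (hypothetical type for illustration).
type index struct {
	hashes map[uint64]struct{}
}

// newIndex hashes every key with the given seed and records the result.
func newIndex(seed uint64, keys ...string) *index {
	idx := &index{hashes: make(map[uint64]struct{})}
	for _, k := range keys {
		idx.hashes[seededHash(seed, k)] = struct{}{}
	}
	return idx
}

// containsHash reports whether a precomputed hash is present.
func (idx *index) containsHash(h uint64) bool {
	_, ok := idx.hashes[h]
	return ok
}

func main() {
	// Built with a fixed seed of 0, mirroring an unseeded hash.
	idx := newIndex(0, "data")

	// Lookup hashed with the same seed: found.
	fmt.Println(idx.containsHash(seededHash(0, "data")))

	// Lookup hashed with a different seed: missed, although the key is the same.
	fmt.Println(idx.containsHash(seededHash(42, "data")))
}
```

With a fixed seed, construction and lookup cannot drift apart, which is why dropping the random seed removes the ordering constraint the old comment warned about. The trade-off raised in the thread still applies: the hashes become predictable across runs.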

Member

Thanks!

Signed-off-by: Eng Zer Jun <[email protected]>
@Juneezee
Contributor Author

/cc @srenatus and @johanfylling 😊

@srenatus srenatus merged commit 022ca9a into open-policy-agent:main Feb 19, 2025
28 checks passed