Backport sha256 optimizations (sse41, avx2, shani) #3133
…ble-asm and enable by default

538cc0c build: Mention use of asm in summary (Wladimir J. van der Laan)
ce5381e build: Rename --enable-experimental-asm to --enable-asm and enable by default (Wladimir J. van der Laan)

Pull request description:

Now that 0.15 is branched off, enable assembler SHA256 optimizations by default, but still allow disabling them, for example if something goes wrong with auto-detection on a platform. Also add mention of the use of asm in the configure summary.

Tree-SHA512: cd20c497f65edd6b1e8b2cc3dfe82be11fcf4777543c830ccdec6c10f25eab4576b0f2953f3957736d7e04deaa4efca777aa84b12bb1cecb40c258e86c120ec8
…th SSE4.1 and AVX2

4defdfa [MOVEONLY] Move unused Merkle branch code to tests (Pieter Wuille)
4437d6e 8-way AVX2 implementation for double SHA256 on 64-byte inputs (Pieter Wuille)
230294b 4-way SSE4.1 implementation for double SHA256 on 64-byte inputs (Pieter Wuille)
1f0e7ca Use SHA256D64 in Merkle root computation (Pieter Wuille)
d0c9632 Specialized double sha256 for 64 byte inputs (Pieter Wuille)
57f3463 Refactor SHA256 code (Pieter Wuille)
0df0178 Benchmark Merkle root computation (Pieter Wuille)

Pull request description:

This introduces a framework for specialized double-SHA256 with 64-byte inputs. 4 different implementations are provided:
* Generic C++ (reusing the normal SHA256 code)
* Specialized C++ for 64-byte inputs, but no special instructions
* 4-way using SSE4.1 intrinsics
* 8-way using AVX2 intrinsics

On my own system (AVX2 capable), I get these benchmarks for computing the Merkle root of 9001 leaves (supported lengths / special instructions / parallelism):
* 7.2 ms with varsize/naive/1way (master, non-SSE4 hardware)
* 5.8 ms with size64/naive/1way (this PR, non-SSE4 capable systems)
* 4.8 ms with varsize/SSE4/1way (master, SSE4 hardware)
* 2.9 ms with size64/SSE4/4way (this PR, SSE4 hardware)
* 1.1 ms with size64/AVX2/8way (this PR, AVX2 hardware)

Tree-SHA512: efa32d48b32820d9ce788ead4eb583949265be8c2e5f538c94bc914e92d131a57f8c1ee26c6f998e81fb0e30675d4e2eddc3360bcf632676249036018cff343e
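The core idea of the 64-byte specialization can be shown in portable C++. The sketch below is not Bitcoin Core's code: it assumes a plain FIPS 180-4 SHA-256, a batch routine modeled loosely on the `SHA256D64` function named in the commit list (the `(out, in, blocks)` signature here is an assumption), and a Bitcoin-style Merkle root that hashes each tree level with one batched call. `Hash32`, `Sha256`, `Sha256D64`, and `MerkleRoot` are illustrative names. The SSE4.1 and AVX2 variants would run 4 or 8 of these independent hashes in SIMD lanes; this sketch stays 1-way.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

using Hash32 = std::array<unsigned char, 32>;
// SHA-256 round constants and initial state (FIPS 180-4).
static const uint32_t K[64] = {
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2};
static const uint32_t kInit[8] = {0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
                                  0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19};
static uint32_t Rotr(uint32_t x, int n) { return (x >> n) | (x << (32 - n)); }

// Compress one 64-byte block into the 8-word state s (generic C++, one lane).
static void Transform(uint32_t s[8], const unsigned char* block) {
    uint32_t w[64];
    for (int i = 0; i < 16; ++i)
        w[i] = uint32_t(block[4 * i]) << 24 | uint32_t(block[4 * i + 1]) << 16 |
               uint32_t(block[4 * i + 2]) << 8 | uint32_t(block[4 * i + 3]);
    for (int i = 16; i < 64; ++i) {
        uint32_t s0 = Rotr(w[i - 15], 7) ^ Rotr(w[i - 15], 18) ^ (w[i - 15] >> 3);
        uint32_t s1 = Rotr(w[i - 2], 17) ^ Rotr(w[i - 2], 19) ^ (w[i - 2] >> 10);
        w[i] = w[i - 16] + s0 + w[i - 7] + s1;
    }
    uint32_t a = s[0], b = s[1], c = s[2], d = s[3], e = s[4], f = s[5], g = s[6], h = s[7];
    for (int i = 0; i < 64; ++i) {
        uint32_t t1 = h + (Rotr(e, 6) ^ Rotr(e, 11) ^ Rotr(e, 25)) + ((e & f) ^ (~e & g)) + K[i] + w[i];
        uint32_t t2 = (Rotr(a, 2) ^ Rotr(a, 13) ^ Rotr(a, 22)) + ((a & b) ^ (a & c) ^ (b & c));
        h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
    }
    s[0] += a; s[1] += b; s[2] += c; s[3] += d; s[4] += e; s[5] += f; s[6] += g; s[7] += h;
}
// Serialize the 8-word state big-endian into 32 bytes.
static void Store(unsigned char* out, const uint32_t s[8]) {
    for (int i = 0; i < 32; ++i) out[i] = static_cast<unsigned char>(s[i / 4] >> (24 - 8 * (i % 4)));
}

// Generic variable-length SHA-256: pad the message, then compress block by block.
Hash32 Sha256(const unsigned char* data, size_t len) {
    std::vector<unsigned char> msg(data, data + len);
    msg.push_back(0x80);
    while (msg.size() % 64 != 56) msg.push_back(0);
    for (int i = 7; i >= 0; --i) msg.push_back(static_cast<unsigned char>((uint64_t(len) * 8) >> (8 * i)));
    uint32_t s[8];
    std::memcpy(s, kInit, sizeof(s));
    for (size_t off = 0; off < msg.size(); off += 64) Transform(s, msg.data() + off);
    Hash32 out;
    Store(out.data(), s);
    return out;
}

// Double SHA-256 of `blocks` consecutive 64-byte inputs. With a fixed 64-byte input,
// both padding blocks are known in advance, so no generic buffering/padding is needed.
void Sha256D64(unsigned char* out, const unsigned char* in, size_t blocks) {
    assert(blocks == 0 || (out != nullptr && in != nullptr));
    unsigned char pad[64] = {0x80};  // 0x80, zeros, then the 64-bit big-endian length
    pad[62] = 0x02;                  // length = 512 bits
    for (size_t i = 0; i < blocks; ++i, in += 64, out += 32) {
        uint32_t s[8];
        std::memcpy(s, kInit, sizeof(s));
        Transform(s, in);   // the 64 input bytes
        Transform(s, pad);  // constant padding block
        unsigned char second[64] = {0};
        Store(second, s);   // the 32-byte first hash is the second message
        second[32] = 0x80;
        second[62] = 0x01;  // length = 256 bits
        std::memcpy(s, kInit, sizeof(s));
        Transform(s, second);
        Store(out, s);
    }
}

// Bitcoin-style Merkle root: duplicate the last hash on odd-sized levels, then
// hash every adjacent pair (exactly 64 bytes) with one batched call per level.
Hash32 MerkleRoot(const std::vector<Hash32>& leaves) {
    Hash32 root{};
    if (leaves.empty()) return root;
    std::vector<unsigned char> level;
    for (const Hash32& h : leaves) level.insert(level.end(), h.begin(), h.end());
    while (level.size() > 32) {
        if (level.size() % 64 != 0) {  // odd number of hashes on this level
            std::vector<unsigned char> last(level.end() - 32, level.end());
            level.insert(level.end(), last.begin(), last.end());
        }
        std::vector<unsigned char> next(level.size() / 2);
        Sha256D64(next.data(), level.data(), level.size() / 64);
        level.swap(next);
    }
    std::memcpy(root.data(), level.data(), 32);
    return root;
}
```

Because every inner Merkle node is exactly two 32-byte hashes, the padding blocks never change, and each tree level is a batch of independent hashes; that fixed shape is what the 4-way and 8-way SIMD versions exploit.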
f68049d crypto: cleanup sha256 build (Cory Fields)

Pull request description:

Requested by @sipa in bitcoin#13386. Rather than appending all possible cpu variants to all targets, create a convenience variable that encompasses all.

Tree-SHA512: 8e9ab2185515672b79bb7925afa4f3fbfe921bfcbe61456833d15457de4feba95290de17514344ce42ee81cc38b252476cd0c29432ac48c737c2225ed515a4bd
1e1eb63 Improve coverage of SHA256 SelfTest code (Pieter Wuille)

Pull request description:

The existing SelfTest code does not cover the specialized double-SHA256-for-64-byte-inputs transforms added in bitcoin#13191. Fix this.

Tree-SHA512: 593c7ee5dc9e77fc4c89e0a7753a63529b0d3d32ddbc015ae3895b52be77bee8a80bf16b754b30a22c01625a68db83fb77fa945a543143542bebb5b0f017ec5b
… support

32d153f For AVX2 code, also check for AVX, XSAVE, and OS support (Pieter Wuille)

Pull request description:

Fixes bitcoin#12903.

Tree-SHA512: 01e71efb5d3a43c49a145a5b1dc4fe7d0a491e1e78479e7df830a2aaac57c3dcfc316e28984c695206c76f93b68e4350fc037ca36756ca579b7070e39c835da2
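The AVX2 fix above amounts to checking more than the AVX2 CPUID bit: the CPU must also report AVX and OSXSAVE, and the OS must have enabled saving of the wider registers. A minimal GCC/Clang sketch of such detection follows, assuming x86 and `<cpuid.h>` (which provides the `__cpuid_count` macro mentioned in the gitian fix). The `CpuFeatures` struct and `DetectCpuFeatures` are hypothetical names, not Bitcoin Core's API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical summary of what a SHA256 dispatcher might care about.
struct CpuFeatures {
    bool sse41 = false;
    bool avx = false;     // CPU advertises AVX
    bool os_ymm = false;  // OS saves XMM+YMM state (XCR0 bits 1 and 2)
    bool avx2 = false;    // AVX2 usable: AVX2 bit AND avx AND os_ymm
    bool shani = false;   // SHA extensions advertised
};

#if (defined(__x86_64__) || defined(__amd64__) || defined(__i386__)) && defined(__GNUC__)
#include <cpuid.h>

// XGETBV with ECX=0 reads XCR0; only legal when CPUID.1:ECX.OSXSAVE is set.
static uint64_t ReadXcr0() {
    uint32_t lo, hi;
    __asm__("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    return (uint64_t(hi) << 32) | lo;
}

CpuFeatures DetectCpuFeatures() {
    CpuFeatures f;
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    const unsigned int max_leaf = __get_cpuid_max(0, nullptr);  // 0 if CPUID is unavailable
    if (max_leaf < 1) return f;
    __cpuid(1, eax, ebx, ecx, edx);
    f.sse41 = ((ecx >> 19) & 1) != 0;
    const bool osxsave = ((ecx >> 27) & 1) != 0;
    f.avx = ((ecx >> 28) & 1) != 0;
    f.os_ymm = osxsave && (ReadXcr0() & 0x6) == 0x6;
    if (max_leaf >= 7) {
        __cpuid_count(7, 0, eax, ebx, ecx, edx);
        f.avx2 = f.avx && f.os_ymm && ((ebx >> 5) & 1) != 0;
        f.shani = ((ebx >> 29) & 1) != 0;
    }
    assert(!f.avx2 || (f.avx && f.os_ymm));  // AVX2 is only reported with AVX + OS support
    return f;
}
#else
// Not x86 (or not GCC/Clang): report nothing, so only portable C++ code would be used.
CpuFeatures DetectCpuFeatures() { return CpuFeatures{}; }
#endif
```

The `__i386__` branch of the guard is what allows the same detection on 32-bit x86 builds; on any other target the sketch reports no features, leaving only the generic code path.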
57ba401 Enable double-SHA256-for-64-byte code on 32-bit x86 (Pieter Wuille)

Pull request description:

The SSE4 and AVX2 double-SHA256-for-64-byte input code from bitcoin#13191 compiles fine on 32-bit x86 systems, but the autodetection logic in sha256.cpp doesn't enable it. Fix this.

Note that these instruction sets are only available on CPUs that support 64-bit mode as well, so it is only beneficial in the (perhaps unlikely) scenario where a 64-bit CPU is running a 32-bit Bitcoin Core binary.

Tree-SHA512: 39d5963c1ba8c33932549d5fe98bd184932689a40aeba95043eca31dd6824f566197c546b60905555eccaf407408a5f0f200247bb0907450d309b0a70b245102
…ions

66b2cf1 Use immintrin.h everywhere for intrinsics (Pieter Wuille)
4c935e2 Add SHA256 implementation using Intel SHA intrinsics (Pieter Wuille)
268400d [Refactor] CPU feature detection logic for SHA256 (Pieter Wuille)

Pull request description:

Based on bitcoin#13191.

This adds SHA256 implementations that use Intel's SHA Extension instructions (using intrinsics). This needs GCC 4.9 or Clang 3.4.

In addition to bitcoin#13191, two extra implementations are provided:
* (a) A variable-length SHA256 implementation using SHA extensions.
* (b) A 2-way 64-byte input double-SHA256 implementation using SHA extensions.

Benchmarks for 9001-element Merkle tree root computation on an AMD Ryzen 1800X system:
* Using generic C++ code (pre-bitcoin#10821): 6.1ms
* Using SSE4 (master, bitcoin#10821): 4.6ms
* Using 4-way SSE4 specialized for 64-byte inputs (bitcoin#13191): 2.8ms
* Using 8-way AVX2 specialized for 64-byte inputs (bitcoin#13191): 2.1ms
* Using 2-way SHA-NI specialized for 64-byte inputs (this PR): 0.56ms

Benchmarks for 32-byte SHA256 on the same system:
* Using SSE4 (master, bitcoin#10821): 190ns
* Using SHA-NI (this PR): 53ns

Benchmarks for 1000000-byte SHA256 on the same system:
* Using SSE4 (master, bitcoin#10821): 2.5ms
* Using SHA-NI (this PR): 0.51ms

Tree-SHA512: 2b319e33b22579f815d91f9daf7994a5e1e799c4f73c13e15070dd54ba71f3f6438ccf77ae9cbd1ce76f972d9cbeb5f0edfea3d86f101bbc1055db70e42743b7
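With 1-way, 2-way (SHA-NI), 4-way (SSE4.1) and 8-way (AVX2) routines available, a batch entry point has to decide which one handles which blocks. One plausible scheme, sketched here under stated assumptions: hand the widest available routine as many blocks as it can take, step down to narrower routines, and finish the remainder 1-way. `D64Fn`, `D64Dispatch`, `Sha256D64Dispatch`, and `PlaceholderOneWay` are hypothetical; the placeholder only copies bytes so the control flow can be exercised without real hashing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical signature: an N-way routine hashes N consecutive 64-byte inputs
// into N consecutive 32-byte outputs in a single call.
using D64Fn = void (*)(unsigned char* out, const unsigned char* in);

struct D64Dispatch {
    D64Fn one_way = nullptr;    // required: portable C++ fallback
    D64Fn two_way = nullptr;    // e.g. SHA-NI, if detected
    D64Fn four_way = nullptr;   // e.g. SSE4.1, if detected
    D64Fn eight_way = nullptr;  // e.g. AVX2, if detected
};

// Feed the widest available routine while enough blocks remain, then step down;
// the 1-way routine finishes whatever is left.
void Sha256D64Dispatch(const D64Dispatch& d, unsigned char* out, const unsigned char* in,
                       size_t blocks) {
    assert(d.one_way != nullptr);
    struct Lane { D64Fn fn; size_t width; };
    const Lane lanes[] = {{d.eight_way, 8}, {d.four_way, 4}, {d.two_way, 2}};
    for (const Lane& lane : lanes) {
        if (lane.fn == nullptr) continue;
        for (; blocks >= lane.width; blocks -= lane.width) {
            lane.fn(out, in);
            out += 32 * lane.width;
            in += 64 * lane.width;
        }
    }
    for (; blocks > 0; --blocks, out += 32, in += 64) d.one_way(out, in);
}

// Placeholder 1-way routine for illustration only: copies the first 32 input
// bytes instead of hashing them. Swap in a real double-SHA256 in practice.
void PlaceholderOneWay(unsigned char* out, const unsigned char* in) { std::memcpy(out, in, 32); }
```

For example, 13 blocks with 8-way and 4-way routines present become one 8-way call, one 4-way call and one 1-way call. Which routines get installed (for instance, whether SHA-NI replaces or complements the SSE4.1/AVX2 paths) is a policy left to the detection code; this sketch does not model Bitcoin Core's actual choice.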
4207c1b configure: Initialise assembly enable_* variables (Luke Dashjr)
afe0875 configure: Skip assembly support checks, when assembly is disabled (Luke Dashjr)
d8ab8dc configure: Invert --enable-asm help string since default is now enabled (Luke Dashjr)

Pull request description:

Fixes bitcoin#13759

Also inverts the help (so it shows `--disable-asm` like other enabled-by-default options), and initialises the flag variables.

ACKs for commit 4207c1:
laanwj: makes sense, utACK 4207c1b
achow101: utACK 4207c1b
ken2812221: ACK 4207c1b
practicalswift: tACK 4207c1b

Tree-SHA512: a30be1008fd8f019db34073f78e90a3c4ad3767d88d7c20ebb83e99c7abc23552f7da3ac8bd20f727405799aff1ecb6044cf869653f8db70478a074d0b877e0a
…tian build fail.

63c16ed Use __cpuid_count for gnu C to avoid gitian build fail. (Chun Kuan Lee)

Pull request description:

Fixes bitcoin#13538

Tree-SHA512: 161ae4db022288ae8631a166eaea2d08cf2c90bcd27218a094a754276de30b92ca9cfb5a79aa899c5a9d0534c5d7261037e7e915e1b92bc7067ab1539dc2b51e
Branch updated from 364cb07 to a1bd147.
Tests got a little bit faster on Travis. Looking forward to seeing the numbers in combination with #3132 :)
utACK
utACK
This PR backports all optimizations, improvements and fixes for SHA256-related code.
It also makes the assembly implementations the default (`--enable-experimental-asm` is now `--disable-asm`, so ASM is enabled by default).

To verify that I didn't miss any backports, you can run `git checkout bitcoin/master -- src/crypto/sha256*` (assuming you have `bitcoin` as a remote for the bitcoin repo) and then run a diff.

I will benchmark initial sync and tests later.