Backport sha256 optimizations (sse41, avx2, shani) #3133

Merged
merged 10 commits into from
Oct 3, 2019

codablock
This PR backports all optimizations, improvements and fixes for sha256 related code.

It also makes the assembly implementations the default: `--enable-experimental-asm` is replaced by `--enable-asm`, which is now on by default and can be turned off with `--disable-asm`.

To verify that I didn't miss any backports, you can run `git checkout bitcoin/master -- src/crypto/sha256*` (assuming you have `bitcoin` as a remote for the bitcoin repo) and then run a diff.

I will benchmark initial sync and tests later.

@codablock codablock added this to the 14.1 milestone Oct 1, 2019
laanwj and others added 10 commits October 1, 2019 23:20
…ble-asm and enable by default

538cc0c build: Mention use of asm in summary (Wladimir J. van der Laan)
ce5381e build: Rename --enable-experimental-asm to --enable-asm and enable by default (Wladimir J. van der Laan)

Pull request description:

  Now that 0.15 is branched off, enable assembler SHA256 optimizations by default, but still allow disabling them, for example if something goes wrong with auto-detection on a platform.

  Also add mention of the use of asm in the configure summary.

Tree-SHA512: cd20c497f65edd6b1e8b2cc3dfe82be11fcf4777543c830ccdec6c10f25eab4576b0f2953f3957736d7e04deaa4efca777aa84b12bb1cecb40c258e86c120ec8
…th SSE4.1 and AVX2

4defdfa [MOVEONLY] Move unused Merkle branch code to tests (Pieter Wuille)
4437d6e 8-way AVX2 implementation for double SHA256 on 64-byte inputs (Pieter Wuille)
230294b 4-way SSE4.1 implementation for double SHA256 on 64-byte inputs (Pieter Wuille)
1f0e7ca Use SHA256D64 in Merkle root computation (Pieter Wuille)
d0c9632 Specialized double sha256 for 64 byte inputs (Pieter Wuille)
57f3463 Refactor SHA256 code (Pieter Wuille)
0df0178 Benchmark Merkle root computation (Pieter Wuille)

Pull request description:

  This introduces a framework for specialized double-SHA256 with 64 byte inputs. 4 different implementations are provided:
  * Generic C++ (reusing the normal SHA256 code)
  * Specialized C++ for 64-byte inputs, but no special instructions
  * 4-way using SSE4.1 intrinsics
  * 8-way using AVX2 intrinsics

  On my own system (AVX2 capable), I get these benchmarks for computing the Merkle root of 9001 leaves (supported lengths / special instructions / parallelism):
  * 7.2 ms with varsize/naive/1way (master, non-SSE4 hardware)
  * 5.8 ms with size64/naive/1way (this PR, non-SSE4 capable systems)
  * 4.8 ms with varsize/SSE4/1way (master, SSE4 hardware)
  * 2.9 ms with size64/SSE4/4way (this PR, SSE4 hardware)
  * 1.1 ms with size64/AVX2/8way (this PR, AVX2 hardware)
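
A minimal sketch of why a 64-byte specialization fits Merkle root computation: every inner node hashes the concatenation of two 32-byte children, so each input is exactly 64 bytes, and the pairs within one level are independent of each other. `MerkleRootSketch`, `Hash`, and `Hash64Fn` are hypothetical names for illustration, and the hash is passed in as a stand-in rather than being a real double-SHA256.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

using Hash = std::array<uint8_t, 32>;
// Hypothetical stand-in for "double-SHA256 of one 64-byte input".
using Hash64Fn = std::function<Hash(const std::array<uint8_t, 64>&)>;

Hash MerkleRootSketch(std::vector<Hash> level, const Hash64Fn& hash64)
{
    if (level.empty()) return Hash{};
    while (level.size() > 1) {
        // Bitcoin-style rule: duplicate the last node when a level has odd size.
        if (level.size() % 2 != 0) level.push_back(level.back());
        std::vector<Hash> next(level.size() / 2);
        // Each pair is independent of the others, which is what lets a
        // 4-way or 8-way implementation process several pairs per call.
        for (size_t i = 0; i < next.size(); ++i) {
            std::array<uint8_t, 64> buf;
            std::copy(level[2 * i].begin(), level[2 * i].end(), buf.begin());
            std::copy(level[2 * i + 1].begin(), level[2 * i + 1].end(), buf.begin() + 32);
            next[i] = hash64(buf);
        }
        level = std::move(next);
    }
    return level[0];
}
```

A batched implementation would replace the inner loop with calls that consume several 64-byte inputs at once; the level-by-level structure stays the same.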

Tree-SHA512: efa32d48b32820d9ce788ead4eb583949265be8c2e5f538c94bc914e92d131a57f8c1ee26c6f998e81fb0e30675d4e2eddc3360bcf632676249036018cff343e
f68049d crypto: cleanup sha256 build (Cory Fields)

Pull request description:

  Requested by @sipa in bitcoin#13386.

  Rather than appending all possible cpu variants to all targets, create a convenience variable that encompasses all.

Tree-SHA512: 8e9ab2185515672b79bb7925afa4f3fbfe921bfcbe61456833d15457de4feba95290de17514344ce42ee81cc38b252476cd0c29432ac48c737c2225ed515a4bd
1e1eb63 Improve coverage of SHA256 SelfTest code (Pieter Wuille)

Pull request description:

  The existing SelfTest code does not cover the specialized double-SHA256-for-64-byte-inputs transforms added in bitcoin#13191. Fix this.
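
One way such a self-test could be structured, as a sketch only: run the multi-way transform once over several 64-byte inputs and compare every lane against a one-at-a-time reference. `TransformD64` and `SelfTestBatched` are hypothetical names, and the calling convention (32 output bytes and 64 input bytes per lane, lanes stored back to back) is an assumption made for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Assumed convention: `in` holds 64 bytes per lane, `out` receives 32 bytes per lane.
using TransformD64 = std::function<void(uint8_t* out, const uint8_t* in)>;

bool SelfTestBatched(const TransformD64& reference_1way,
                     const TransformD64& batched,
                     size_t ways)
{
    std::vector<uint8_t> in(64 * ways), expected(32 * ways), got(32 * ways);
    // Fill every lane with different, deterministic bytes.
    for (size_t i = 0; i < in.size(); ++i) in[i] = static_cast<uint8_t>(i * 7 + 1);
    // Reference result: one lane at a time.
    for (size_t w = 0; w < ways; ++w) {
        reference_1way(expected.data() + 32 * w, in.data() + 64 * w);
    }
    // Candidate result: all lanes in a single call.
    batched(got.data(), in.data());
    return expected == got;
}
```

Using distinct bytes per lane matters here: a batched implementation that mixes up lanes would still pass if all lanes carried identical input.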

Tree-SHA512: 593c7ee5dc9e77fc4c89e0a7753a63529b0d3d32ddbc015ae3895b52be77bee8a80bf16b754b30a22c01625a68db83fb77fa945a543143542bebb5b0f017ec5b
… support

32d153f For AVX2 code, also check for AVX, XSAVE, and OS support (Pieter Wuille)

Pull request description:

  Fixes bitcoin#12903.
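
A hedged sketch of the kind of check the commit title describes. The CPU advertising AVX2 is not sufficient on its own: the operating system must also have enabled saving of the YMM registers, which is reported through XCR0 and is only readable when the OSXSAVE bit is set. The function below is a pure decision function over register values that the caller has already read; `CanUseAvx2` and the constant names are hypothetical, and which XSAVE-related CPUID bit the real code tests is not stated in this PR, so the use of OSXSAVE here is an assumption.

```cpp
#include <cstdint>

// Bit positions from the x86 CPUID / XCR0 definitions.
constexpr uint32_t kOsXsaveBit = 1u << 27; // CPUID leaf 1, ECX: OSXSAVE
constexpr uint32_t kAvxBit     = 1u << 28; // CPUID leaf 1, ECX: AVX
constexpr uint32_t kAvx2Bit    = 1u << 5;  // CPUID leaf 7 subleaf 0, EBX: AVX2
constexpr uint64_t kXcr0SseAvx = 0x6;      // XCR0 bits 1 (SSE state) and 2 (AVX state)

// leaf1_ecx: ECX from CPUID leaf 1
// leaf7_ebx: EBX from CPUID leaf 7, subleaf 0
// xcr0:      value of XCR0 (ignored unless OSXSAVE is set)
bool CanUseAvx2(uint32_t leaf1_ecx, uint32_t leaf7_ebx, uint64_t xcr0)
{
    const bool osxsave = (leaf1_ecx & kOsXsaveBit) != 0;
    const bool avx     = (leaf1_ecx & kAvxBit) != 0;
    const bool avx2    = (leaf7_ebx & kAvx2Bit) != 0;
    // XCR0 is only meaningful when the OS has enabled XSAVE.
    const bool os_saves_ymm = osxsave && (xcr0 & kXcr0SseAvx) == kXcr0SseAvx;
    return avx && avx2 && os_saves_ymm;
}
```

Keeping the decision separate from the hardware query makes it testable on any platform; a caller on x86 would only read XCR0 after seeing OSXSAVE set.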

Tree-SHA512: 01e71efb5d3a43c49a145a5b1dc4fe7d0a491e1e78479e7df830a2aaac57c3dcfc316e28984c695206c76f93b68e4350fc037ca36756ca579b7070e39c835da2
57ba401 Enable double-SHA256-for-64-byte code on 32-bit x86 (Pieter Wuille)

Pull request description:

  The SSE4 and AVX2 double-SHA256-for-64-byte input code from bitcoin#13191 compiles fine on 32-bit x86 systems, but the autodetection logic in sha256.cpp doesn't enable it. Fix this.

  Note that these instruction sets are only available on CPUs that support 64-bit mode as well, so it is only beneficial in the (perhaps unlikely) scenario where a 64-bit CPU is running a 32-bit Bitcoin Core binary.

Tree-SHA512: 39d5963c1ba8c33932549d5fe98bd184932689a40aeba95043eca31dd6824f566197c546b60905555eccaf407408a5f0f200247bb0907450d309b0a70b245102
…ions

66b2cf1 Use immintrin.h everywhere for intrinsics (Pieter Wuille)
4c935e2 Add SHA256 implementation using using Intel SHA intrinsics (Pieter Wuille)
268400d [Refactor] CPU feature detection logic for SHA256 (Pieter Wuille)

Pull request description:

  Based on bitcoin#13191.

  This adds SHA256 implementations that use Intel's SHA Extension instructions (using intrinsics). This needs GCC 4.9 or Clang 3.4.

  In addition to bitcoin#13191, two extra implementations are provided:
  * (a) A variable-length SHA256 implementation using SHA extensions.
  * (b) A 2-way 64-byte input double-SHA256 implementation using SHA extensions.

  Benchmarks for 9001-element Merkle tree root computation on an AMD Ryzen 1800X system:
  * Using generic C++ code (pre-bitcoin#10821): 6.1ms
  * Using SSE4 (master, bitcoin#10821): 4.6ms
  * Using 4-way SSE4 specialized for 64-byte inputs (bitcoin#13191): 2.8ms
  * Using 8-way AVX2 specialized for 64-byte inputs (bitcoin#13191): 2.1ms
  * Using 2-way SHA-NI specialized for 64-byte inputs (this PR): 0.56ms

  Benchmarks for 32-byte SHA256 on the same system:
  * Using SSE4 (master, bitcoin#10821): 190ns
  * Using SHA-NI (this PR): 53ns

  Benchmarks for 1000000-byte SHA256 on the same system:
  * Using SSE4 (master, bitcoin#10821): 2.5ms
  * Using SHA-NI (this PR): 0.51ms
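
With several implementations available, something has to pick one at startup. The sketch below shows one plausible selection order for the 64-byte double-SHA256 path, assuming the fastest option in the Ryzen benchmarks above is preferred whenever the CPU supports it. `CpuFeatures`, `Sha256d64Impl`, `ChooseSha256d64`, and `Ways` are hypothetical names, not the identifiers used in sha256.cpp.

```cpp
// Feature flags as a caller might fill them in after CPU detection (assumed layout).
struct CpuFeatures {
    bool sse41 = false;
    bool avx2 = false;  // CPU support and OS support both confirmed
    bool shani = false;
};

enum class Sha256d64Impl { Generic, Sse41_4way, Avx2_8way, Shani_2way };

// Prefer SHA-NI, then AVX2, then SSE4.1, then plain C++.
Sha256d64Impl ChooseSha256d64(const CpuFeatures& f)
{
    if (f.shani) return Sha256d64Impl::Shani_2way;
    if (f.avx2) return Sha256d64Impl::Avx2_8way;
    if (f.sse41) return Sha256d64Impl::Sse41_4way;
    return Sha256d64Impl::Generic;
}

// Number of 64-byte inputs each implementation consumes per call.
int Ways(Sha256d64Impl impl)
{
    switch (impl) {
    case Sha256d64Impl::Shani_2way: return 2;
    case Sha256d64Impl::Avx2_8way:  return 8;
    case Sha256d64Impl::Sse41_4way: return 4;
    case Sha256d64Impl::Generic:    return 1;
    }
    return 1;
}
```

Note that fewer lanes does not mean slower: in the numbers above the 2-way SHA-NI code beats the 8-way AVX2 code on the same machine.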

Tree-SHA512: 2b319e33b22579f815d91f9daf7994a5e1e799c4f73c13e15070dd54ba71f3f6438ccf77ae9cbd1ce76f972d9cbeb5f0edfea3d86f101bbc1055db70e42743b7
4207c1b configure: Initialise assembly enable_* variables (Luke Dashjr)
afe0875 configure: Skip assembly support checks, when assembly is disabled (Luke Dashjr)
d8ab8dc configure: Invert --enable-asm help string since default is now enabled (Luke Dashjr)

Pull request description:

  Fixes bitcoin#13759

  Also inverts the help (so it shows `--disable-asm` like other enabled-by-default options) and initialises the flag variables.

ACKs for commit 4207c1:
  laanwj:
    makes sense, utACK 4207c1b
  achow101:
    utACK 4207c1b
  ken2812221:
    ACK 4207c1b
  practicalswift:
    tACK 4207c1b

Tree-SHA512: a30be1008fd8f019db34073f78e90a3c4ad3767d88d7c20ebb83e99c7abc23552f7da3ac8bd20f727405799aff1ecb6044cf869653f8db70478a074d0b877e0a
…tian build fail.

63c16ed Use __cpuid_count for gnu C to avoid gitian build fail. (Chun Kuan Lee)

Pull request description:

  Fixes bitcoin#13538
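
For illustration, a small wrapper built on the `__cpuid_count` macro from `<cpuid.h>`, which GCC and Clang provide on x86. This is a sketch under assumptions: `QueryCpuid` and `SKETCH_HAVE_CPUID` are hypothetical names, the fallback branch simply reports "unsupported" instead of using hand-written inline assembly, and the PR text does not say what exactly failed in the gitian build.

```cpp
#include <cstdint>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#define SKETCH_HAVE_CPUID 1
#endif

// Runs CPUID for the given leaf/subleaf and stores EAX..EDX in a..d.
// Returns false (and zeroes the outputs) where CPUID is not available.
bool QueryCpuid(uint32_t leaf, uint32_t subleaf,
                uint32_t& a, uint32_t& b, uint32_t& c, uint32_t& d)
{
#ifdef SKETCH_HAVE_CPUID
    // The compiler-provided macro handles register constraints for us.
    __cpuid_count(leaf, subleaf, a, b, c, d);
    return true;
#else
    (void)leaf;
    (void)subleaf;
    a = b = c = d = 0;
    return false;
#endif
}
```

A caller would typically query leaf 0 first to learn the highest supported leaf before asking for leaf 7.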

Tree-SHA512: 161ae4db022288ae8631a166eaea2d08cf2c90bcd27218a094a754276de30b92ca9cfb5a79aa899c5a9d0534c5d7261037e7e915e1b92bc7067ab1539dc2b51e
@codablock codablock force-pushed the pr_backport_sha256_stuff branch from 364cb07 to a1bd147 Compare October 1, 2019 21:20
@codablock
Author

Tests got a little bit faster on Travis (about 9%):
develop branch: Runtime: 827 s
This branch: Runtime: 754 s

Looking forward to seeing the numbers in combination with #3132 :)

@UdjinM6 UdjinM6 left a comment

utACK

@nmarley nmarley left a comment

utACK

@UdjinM6 UdjinM6 merged commit 33a9f46 into dashpay:develop Oct 3, 2019
@codablock codablock deleted the pr_backport_sha256_stuff branch October 3, 2019 15:34