Backport prevector optimizations #3132

codablock · 2019-10-01T12:30:24Z

While debugging and profiling tests, I've seen that in some tests (e.g. p2p-fullblocktest.py) 50% of the time can be spent in prevector initialization and copying. These backported PRs should reduce that to nearly zero.

UdjinM6

utACK

Seeing roughly 15-20x speedups in prevector benchmarks 👍

codablock · 2019-10-02T09:27:49Z

Tests also got a little bit faster on Travis:
develop branch: Runtime: 827 s
This branch: Runtime: 766 s

@AkioNak

…rations much faster

5aad635 Use memset() to optimize prevector::resize() (Evan Klitzke)
e46be25 Reduce redundant code of prevector and speed it up (Akio Nakamura)
f0e7aa7 Add new prevector benchmarks. (Evan Klitzke)

Pull request description:

This branch optimizes various `prevector` operations, especially resizing vectors. While profiling the `loadblk` thread I noticed that a lot of time was being spent in `prevector::resize()`, which led to this work. I have some data here indicating that it takes up **37%** of the time in `ReadBlockFromDisk()`: https://monad.io/readblockfromdisk.svg

This branch improves things significantly. For trivial types, the new results for the prevector benchmark are:

* `PrevectorClearTrivial`, which tests `prevector::clear()`, becomes 24.6x faster
* `PrevectorDestructorTrivial`, which tests `prevector::~prevector()`, becomes 20.5x faster
* `PrevectorResizeTrivial`, which tests `prevector::resize()`, becomes 20.3x faster

Note that in practice it looks like the prevector is only used to contain `unsigned char` types, which is a trivial type. The benchmarks are testing a bit of an extreme case, but the changes here are motivated by the profiling data for `ReadBlockFromDisk()` I linked to above.

The pull request here consists of a series of three commits:

* The first adds new benchmarks but does not change the prevector code.
* The second is from @AkioNak, and merges some prevector optimizations he submitted in bitcoin#11988
* The third optimizes `prevector::resize()` to use `memset()` when the prevector contains trivially constructible types

Tree-SHA512: 28f7cbb91a19f9f43b6a5942781d7eb2e3197389186b666f086b69df12bee37773140f765426d715bfb8ebff79cb27a5f1206d0325b54b4aa65598b50fb18368
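
A minimal sketch of the two fast paths described above, assuming a heavily simplified container: `resize()` zero-fills new elements with a single `memset()` when `T` is trivially constructible, and `clear()`/the destructor skip the per-element destructor loop when `T` is trivially destructible. The second point is an assumption about where the `clear()`/destructor speedups come from; the description does not spell out what e46be25 changed. `FixedVec` is a hypothetical fixed-capacity class, not the real `prevector` (which also has an inline/heap storage split and a growth policy).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <new>
#include <type_traits>

// Hypothetical fixed-capacity container, NOT the real prevector:
// no inline/heap split and no growth, just enough to show the fast paths.
template <typename T, std::size_t Capacity>
class FixedVec {
public:
    FixedVec() = default;
    FixedVec(const FixedVec&) = delete;             // a byte-wise copy would be wrong for non-trivial T
    FixedVec& operator=(const FixedVec&) = delete;
    ~FixedVec() { clear(); }

    std::size_t size() const { return size_; }
    T& operator[](std::size_t i) { return *item_ptr(i); }

    void resize(std::size_t new_size) {
        assert(new_size <= Capacity);
        if (new_size <= size_) {
            destroy_range(new_size, size_);  // shrink: destroy only the tail
            size_ = new_size;
            return;
        }
        const std::size_t added = new_size - size_;
        if (std::is_trivially_constructible<T>::value) {
            // Fast path: one bulk zero-fill instead of `added` constructor calls
            // (all-bits-zero matches value-initialisation for types like unsigned char).
            std::memset(static_cast<void*>(item_ptr(size_)), 0, added * sizeof(T));
        } else {
            // Slow path: value-initialise each new element in place.
            for (std::size_t i = size_; i < new_size; ++i) new (static_cast<void*>(item_ptr(i))) T();
        }
        size_ = new_size;
    }

    void clear() {
        destroy_range(0, size_);
        size_ = 0;
    }

private:
    T* item_ptr(std::size_t i) { return reinterpret_cast<T*>(buf_) + i; }

    void destroy_range(std::size_t from, std::size_t to) {
        // Trivially destructible types need no per-element work, so for them
        // clear() and the destructor do not loop over the elements at all.
        if (!std::is_trivially_destructible<T>::value) {
            for (std::size_t i = from; i < to; ++i) item_ptr(i)->~T();
        }
    }

    alignas(T) unsigned char buf_[Capacity * sizeof(T)];
    std::size_t size_ = 0;
};
```

With `T = unsigned char`, which per the description is the only type prevector holds in practice, both fast paths are taken, so neither growing nor clearing touches elements one at a time.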

86b47fa speed up Unserialize_impl for prevector (Akio Nakamura)

Pull request description:

The unserializer for prevector uses `resize()` to reserve the area, but `reserve()` is preferable because `resize()` has the overhead of calling the constructor many times. However, `reserve()` does not change the value of `_size` (a private member of prevector). This PR moves the logic of reading from the stream into a callback function; prevector initializes the newly added values with that callback and adjusts the value of `_size`.

The changes are as follows:

1. prevector.h
   Add a public member function named `append`. This function has 2 params: the number of elements to append, and a callback function that initializes the newly appended values.
2. serialize.h
   In the following two functions:
   - `Unserialize_impl(Stream& is, prevector<N, T>& v, const unsigned char&)`
   - `Unserialize_impl(Stream& is, prevector<N, T>& v, const V&)`
   Turn each function's original logic for reading values from the stream into a callback function, and call prevector's `append()`.
3. test/prevector_tests.cpp
   Add a test for `append()`.

## Benchmark results

[Machine]
MacBook Pro (macOS 10.13.3/i7 2.2GHz/mem 16GB/SSD)

[Result]
DeserializeAndCheckBlockTest => 22% faster
DeserializeBlockTest => 29% faster

[Before PR]

| Benchmark | evals | iterations | total | min | max | median |
|---|---|---|---|---|---|---|
| DeserializeAndCheckBlockTest | 60 | 160 | 94.4901 | 0.0094644 | 0.0104715 | 0.0098339 |
| DeserializeBlockTest | 60 | 130 | 65.0964 | 0.00800362 | 0.00895134 | 0.00824187 |

[After PR]

| Benchmark | evals | iterations | total | min | max | median |
|---|---|---|---|---|---|---|
| DeserializeAndCheckBlockTest | 60 | 160 | 77.1597 | 0.00767013 | 0.00858959 | 0.00805757 |
| DeserializeBlockTest | 60 | 130 | 49.9443 | 0.00613926 | 0.00691187 | 0.00635527 |

ACKs for top commit:
laanwj: utACK 86b47fa

Tree-SHA512: 62ea121ccd45a306fefc67485a1b03a853435af762607dae2426a87b15a3033d802c8556e1923727ddd1023a1837d0e5f6720c2c77b38196907e750e15fbb902
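
A minimal sketch of the reserve-then-callback pattern described in 86b47fa, assuming a simplified byte-only container. `ByteVec`, `MemStream`, and the exact `append(n, fill)` signature are hypothetical illustrations, not the actual prevector or serialize.h API; the point is only that capacity is reserved, the callback writes the new bytes in place, and the size is bumped afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <utility>

// Hypothetical byte container, NOT the real prevector.
class ByteVec {
public:
    std::size_t size() const { return size_; }
    const unsigned char* data() const { return buf_.get(); }

    // Grows capacity only: no elements are initialised and size_ is unchanged.
    void reserve(std::size_t new_cap) {
        if (new_cap <= cap_) return;
        std::unique_ptr<unsigned char[]> grown(new unsigned char[new_cap]);
        if (size_ > 0) std::memcpy(grown.get(), buf_.get(), size_);
        buf_ = std::move(grown);
        cap_ = new_cap;
    }

    // Hypothetical append(n, fill): `fill(dst, n)` must write exactly n bytes
    // into the reserved tail. Nothing is default-initialised first, so each
    // byte is written once (by the callback) instead of twice (zeroed by
    // resize(), then overwritten by the stream read).
    template <typename Fill>
    void append(std::size_t n, Fill fill) {
        if (n == 0) return;
        reserve(size_ + n);
        fill(buf_.get() + size_, n);
        size_ += n;  // adjust the size only after the data is in place
    }

private:
    std::unique_ptr<unsigned char[]> buf_;
    std::size_t size_ = 0;
    std::size_t cap_ = 0;
};

// Hypothetical stand-in for a serialization stream reading from memory.
struct MemStream {
    const unsigned char* pos;
    void read(unsigned char* dst, std::size_t n) {
        std::memcpy(dst, pos, n);
        pos += n;
    }
};
```

A callback keeps the serializer in charge of how bytes are read while letting the container control when its size changes. The sketch only covers the `unsigned char` case; per the description, the real change also converts the generic `const V&` overload of `Unserialize_impl`.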


codablock · 2019-10-02T13:25:53Z

Rebased on develop to get Travis green

UdjinM6

re-utACK

nmarley

utACK

codablock added this to the 14.1 milestone Oct 1, 2019

codablock force-pushed the pr_backport_prevector_stuff branch from 8abb590 to 5798eb6 October 1, 2019 21:20

UdjinM6 previously approved these changes Oct 2, 2019


codablock mentioned this pull request Oct 2, 2019

Backport sha256 optimizations (sse41, avx2, shani) #3133

Merged

laanwj and others added 3 commits October 2, 2019 15:25

Temporarily remove arguments to BENCHMARK

bdfc303

Until the necessary backports for the benchmark system are backported.

codablock dismissed UdjinM6’s stale review via bdfc303 October 2, 2019 13:25

codablock force-pushed the pr_backport_prevector_stuff branch from 5798eb6 to bdfc303 October 2, 2019 13:25

UdjinM6 approved these changes Oct 2, 2019


nmarley approved these changes Oct 2, 2019


UdjinM6 merged commit 52ded45 into dashpay:develop Oct 3, 2019
