Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Backport prevector optimizations #3132

Merged
merged 3 commits into from
Oct 3, 2019

Conversation

codablock
Copy link

While debugging and profiling tests I've seen that in some tests (e.g. p2p-fullblocktest.py), 50% of the time can be spent in prevector initialization and copying. These backported PRs should reduce this to near zero time spent.

@codablock codablock added this to the 14.1 milestone Oct 1, 2019
@codablock codablock force-pushed the pr_backport_prevector_stuff branch from 8abb590 to 5798eb6 Compare October 1, 2019 21:20
UdjinM6
UdjinM6 previously approved these changes Oct 2, 2019
Copy link

@UdjinM6 UdjinM6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK

Seeing like 15-20x in prevector benchmarks 👍

@codablock
Copy link
Author

Tests also got a little bit faster on Travis:
develop branch: Runtime: 827 s
This branch: Runtime: 766 s

laanwj and others added 3 commits October 2, 2019 15:25
…rations much faster

5aad635 Use memset() to optimize prevector::resize() (Evan Klitzke)
e46be25 Reduce redundant code of prevector and speed it up (Akio Nakamura)
f0e7aa7 Add new prevector benchmarks. (Evan Klitzke)

Pull request description:

  This branch optimizes various `prevector` operations, especially resizing vectors. While profiling the `loadblk` thread I noticed that a lot of time was being spent in `prevector::resize()` which led to this work. I have some data here indicating that it takes up **37%** of the time in `ReadBlockFromDisk()`: https://monad.io/readblockfromdisk.svg

  This branch improves things significantly. For trivial types, the new results for the prevector benchmark are:

   * `PrevectorClearTrivial` which tests `prevector::clear()` becomes 24.6x faster
   * `PrevectorDestructorTrivial` which tests `prevector::~prevector()` becomes 20.5x faster
   * `PrevectorResizeTrivial` which tests `prevector::resize()` becomes 20.3x faster

  Note that in practice it looks like the prevector is only used to contain `unsigned char` types, which is a trivial type. The benchmarks are testing a bit of an extreme case, but the changes here are motivated by the profiling data for `ReadBlockFromDisk()` I linked to above.

  The pull request here consists of a series of three commits:
   * The first adds new benchmarks but does not change the prevector code.
   * The second is from @AkioNak , and merges some prevector optimizations he submitted in bitcoin#11988
   * The third optimizes `prevector::resize()` to use `memset()` when the prevector contains trivially constructible types

Tree-SHA512: 28f7cbb91a19f9f43b6a5942781d7eb2e3197389186b666f086b69df12bee37773140f765426d715bfb8ebff79cb27a5f1206d0325b54b4aa65598b50fb18368
86b47fa speed up Unserialize_impl for prevector (Akio Nakamura)

Pull request description:

  The unserializer for prevector uses `resize()` for reserve the area, but it's prefer to use `reserve()` because `resize()` have overhead to call its constructor many times.

  However, `reserve()` does not change the value of `_size` (a private member of prevector).

  This PR make the logic of read from stream to callback function, and prevector handles initilizing new values with that call-back and ajust the value of `_size`.

  The changes are as follows:
  1. prevector.h
  Add a public member function named 'append'.
  This function has 2 params, number of elemenst to append and call-back function that initilizing new appended values.

  2. serialize.h
  In the following two function:
  - `Unserialize_impl(Stream& is, prevector<N, T>& v, const unsigned char&)`
  - `Unserialize_impl(Stream& is, prevector<N, T>& v, const V&)`
  Make a callback function from each original logic of reading values from stream, and call prevector's `append()`.

  3. test/prevector_tests.cpp
  Add a test for `append()`.

  ## A benchmark result is following:
  [Machine]
  MacBook Pro (macOS 10.13.3/i7 2.2GHz/mem 16GB/SSD)

  [result]
  DeserializeAndCheckBlockTest  => 22% faster
  DeserializeBlockTest => 29% faster

  [before PR]
      # Benchmark, evals, iterations, total, min, max, median
      DeserializeAndCheckBlockTest, 60, 160, 94.4901, 0.0094644, 0.0104715, 0.0098339
      DeserializeBlockTest, 60, 130, 65.0964, 0.00800362, 0.00895134, 0.00824187

  [After PR]
      # Benchmark, evals, iterations, total, min, max, median
      DeserializeAndCheckBlockTest, 60, 160, 77.1597, 0.00767013, 0.00858959, 0.00805757
      DeserializeBlockTest, 60, 130, 49.9443, 0.00613926, 0.00691187, 0.00635527

ACKs for top commit:
  laanwj:
    utACK 86b47fa

Tree-SHA512: 62ea121ccd45a306fefc67485a1b03a853435af762607dae2426a87b15a3033d802c8556e1923727ddd1023a1837d0e5f6720c2c77b38196907e750e15fbb902
Until the necessary backports for the benchmark system are backported.
@codablock
Copy link
Author

Rebased on develop to get Travis green

Copy link

@UdjinM6 UdjinM6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re-utACK

Copy link

@nmarley nmarley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK

@UdjinM6 UdjinM6 merged commit 52ded45 into dashpay:develop Oct 3, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants