Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use multi-threaded runtime instead of single-threaded ones #6743

Merged

Conversation

calinanca99
Copy link
Contributor

Problem

Addresses #6648.

Summary of changes

  • Use a multi-threaded runtime in add_multithreaded_walredo_requesters that sets the number of workers based on the number of threads passed by Criterion
  • Spawn Tokio tasks to the runtime instead of spawning threads
  • Replace the Drop implementation for JoinOnDrop with a join_all method. This is needed to avoid having async code in the Drop trait implementation
  • Make execute_all and Request::execute async

Results

Result compared to main:

short/short/1           time:   [14.172 µs 14.249 µs 14.341 µs]
                        change: [-47.358% -47.056% -46.671%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe
short/short/2           time:   [32.583 µs 32.908 µs 33.243 µs]
                        change: [-51.421% -50.729% -50.059%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
short/short/4           time:   [55.863 µs 56.366 µs 56.904 µs]
                        change: [-47.335% -46.505% -45.649%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe
short/short/8           time:   [119.06 µs 120.04 µs 121.14 µs]
                        change: [-38.849% -38.203% -37.459%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  3 (3.00%) high mild
  6 (6.00%) high severe
short/short/16          time:   [255.74 µs 256.95 µs 258.30 µs]
                        change: [-33.718% -33.307% -32.862%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high severe

medium/medium/1         time:   [145.99 µs 146.89 µs 147.87 µs]
                        change: [-10.211% -9.5396% -8.8795%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe
medium/medium/2         time:   [314.33 µs 316.52 µs 318.98 µs]
                        change: [-4.7189% -3.9892% -3.2384%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe
medium/medium/4         time:   [594.20 µs 596.92 µs 599.64 µs]
                        change: [-3.4526% -2.8332% -2.2815%] (p = 0.00 < 0.05)
                        Performance has improved.
medium/medium/8         time:   [1.1355 ms 1.1409 ms 1.1469 ms]
                        change: [-5.5254% -4.8121% -4.0649%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe
medium/medium/16        time:   [2.3329 ms 2.3450 ms 2.3580 ms]
                        change: [-1.1655% -0.4665% +0.2273%] (p = 0.20 > 0.05)
                        No change in performance detected.
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe

Checklist before requesting a review

  • I have performed a self-review of my code.
  • If it is a core feature, I have added thorough tests.
  • Do we need to implement analytics? if so did you add the relevant metrics to the dashboard?
  • If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.

Checklist before merging

  • Do not forget to reformat commit message to not include the above checklist

@calinanca99 calinanca99 marked this pull request as ready for review February 13, 2024 16:29
@calinanca99 calinanca99 requested a review from a team as a code owner February 13, 2024 16:29
@calinanca99 calinanca99 requested review from arpad-m and removed request for a team February 13, 2024 16:29
@jcsp
Copy link
Contributor

jcsp commented Feb 13, 2024

CC @problame as #6648 was one of yours

problame added a commit that referenced this pull request Feb 14, 2024
- fix deadlocks caused by remaining sync primitive
- increase thread count
Copy link
Contributor

@problame problame left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for contributing to Neon!

I tried this locally but found that it deadlocks.

Also, the intent was that there be a fixed number of worker threads on which a variable number of requesting thread run on.

Perhaps that wasn't clear in the issue description.

While trying to understand the deadlock, I went ahead and fixed these issues.

PR stacked atop this one: #6761

We'll review it and either close this PR or merge it in here.

@problame problame self-assigned this Feb 14, 2024
@calinanca99
Copy link
Contributor Author

Thanks for reviewing!

Also, the intent was that there be a fixed number of worker threads on which a variable number of requesting thread run on.

Ah, I see! Indeed, I was initially a bit confused about this part. 😅 The comment in #6761 about stress-testing clarified it for me.

I tried this locally but found that it deadlocks.

I haven't ran into deadlocks when running it, but taking another look at the code it makes sense. :D I supposed that they were caused by the sync barrier.wait() inside the async block?

@problame
Copy link
Contributor

supposed that they were caused by the sync barrier.wait() inside the async block?

Yes

problame added a commit that referenced this pull request Feb 15, 2024
This PR finishes the work started by https://github.com/calinanca99 in
#6743

- always use default worker thread count & explain rationale behind that
- fix deadlocks caused by remaining sync primitive
- increase thread counts
…eondatabase#6761)

This PR finishes the work started by https://github.com/calinanca99 in
neondatabase#6743

- always use default worker thread count & explain rationale behind that
- fix deadlocks caused by remaining sync primitive
- increase thread counts
@problame problame added the approved-for-ci-run Changes are safe to trigger CI for the PR label Feb 15, 2024
@github-actions github-actions bot removed the approved-for-ci-run Changes are safe to trigger CI for the PR label Feb 15, 2024
@vipvap vipvap mentioned this pull request Feb 15, 2024
Copy link

github-actions bot commented Feb 15, 2024

2436 tests run: 2318 passed, 0 failed, 118 skipped (full report)


Flaky tests (6)

Postgres 16

  • test_create_snapshot: debug
  • test_vm_bit_clear_on_heap_lock: debug

Postgres 15

Postgres 14

  • test_pageserver_init_node_id: debug

Code coverage (full report)

  • functions: 55.9% (12893 of 23056 functions)
  • lines: 82.5% (69744 of 84539 lines)

The comment gets automatically updated with the latest test results
c0847de at 2024-02-16T15:28:31.162Z :recycle:

@bayandin bayandin added the approved-for-ci-run Changes are safe to trigger CI for the PR label Feb 16, 2024
@github-actions github-actions bot removed the approved-for-ci-run Changes are safe to trigger CI for the PR label Feb 16, 2024
- re-introduce warm-up
- joinset
- capture reference data
@problame problame added the approved-for-ci-run Changes are safe to trigger CI for the PR label Feb 16, 2024
@github-actions github-actions bot removed the approved-for-ci-run Changes are safe to trigger CI for the PR label Feb 16, 2024
@problame problame dismissed their stale review February 16, 2024 11:45

authored myself

@problame problame added the approved-for-ci-run Changes are safe to trigger CI for the PR label Feb 16, 2024
@github-actions github-actions bot removed the approved-for-ci-run Changes are safe to trigger CI for the PR label Feb 16, 2024
@bayandin bayandin added the approved-for-ci-run Changes are safe to trigger CI for the PR label Feb 16, 2024
@github-actions github-actions bot removed the approved-for-ci-run Changes are safe to trigger CI for the PR label Feb 16, 2024
@problame problame merged commit 36e1100 into neondatabase:main Feb 16, 2024
131 of 133 checks passed
@calinanca99 calinanca99 deleted the make-walredo-bench-multithreaded branch February 18, 2024 13:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants