-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Misson Control Store: Improve performance #8549
Misson Control Store: Improve performance #8549
Conversation
Important Review skippedAuto reviews are limited to specific labels. Labels to auto review (1)
Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (invoked as PR comments)
Additionally, you can add CodeRabbit Configration File (
|
6bf04bf
to
38ba47d
Compare
Going through the lint/CI failures now, might need some refactoring to deal with the type checks required from using non-generic *list.List |
ab6590f
to
3ee9bfa
Compare
Lint issues are fixed, I believe the other CI failures are just flakes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR. Did a first round, have a couple of questions.
go.mod
Outdated
@@ -4,6 +4,7 @@ require ( | |||
github.com/NebulousLabs/go-upnp v0.0.0-20180202185039-29b680b06c82 | |||
github.com/Yawning/aez v0.0.0-20211027044916-e49e68abd344 | |||
github.com/andybalholm/brotli v1.0.3 | |||
github.com/bahlo/generic-list-go v0.2.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we'll want to pull in an external library for this but instead add the same functionality to the new fn
package.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough, though I should note that this specific dependency is very small (the generic list impl is the only thing in it), it's a drop-in replacement, it's based on stdlib code, it's complete, fully tested has no open issues and hasn't had a need for any new code in over 2 years, so it's about as good a dependency as anyone could hope for :p
To bring this into the fn package, I'd just copy the subset of the code that was used here over (code is BSD licensed). Would that be ok?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would make sense to inline this library (just list.go
and list_test.go
and perhaps LICENSE
?) into the fn
package. But happy to hear other opinions, cc @Roasbeef.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would agree, perhaps the change is also not necessary, the improvement is only noticeable for 250 results:
│ with_type_assertions │ after │
│ sec/op │ sec/op vs base │
MissionControlStoreFlushing/no_additional_results-4 216.3n ± 2% 218.7n ± 8% ~ (p=0.190 n=10)
MissionControlStoreFlushing/one_additional_result-4 29.32µ ± 0% 29.19µ ± 12% ~ (p=0.645 n=10)
MissionControlStoreFlushing/ten_additional_results-4 64.87µ ± 0% 63.99µ ± 0% -1.37% (p=0.000 n=10)
MissionControlStoreFlushing/100_additional_results-4 385.4µ ± 6% 370.6µ ± 0% -3.84% (p=0.000 n=10)
MissionControlStoreFlushing/250_additional_results-4 1016.6µ ± 2% 908.8µ ± 1% -10.60% (p=0.000 n=10)
MissionControlStoreFlushing/500_additional_results-4 1.945m ± 3% 1.780m ± 1% -8.49% (p=0.000 n=10)
geomean 82.42µ 79.10µ -4.03%
I'd hope that this package would make it into the standard library at some point :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What were the changes you did on this benchmark? Because the no_additional_results
case is significantly different from the original benchmark in the PR:
MissionControlStoreFlushing/no_additional_results-7 221µs ±15% 0µs ± 2% -99.86% (p=0.000 n=10+10)
3ee9bfa
to
169f934
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice improvements in relative terms. In absolute terms, on my machine this reduces execution time from 30 ms to 30 µs for adding a single result to 100000 preexisting results (a value some people use if they want a more complete picture of the graph). I'm still thinking about the added de-duplication code complexity, perhaps you could give more motivation for the change (maybe benchmarks on weaker hardware)?
go.mod
Outdated
@@ -4,6 +4,7 @@ require ( | |||
github.com/NebulousLabs/go-upnp v0.0.0-20180202185039-29b680b06c82 | |||
github.com/Yawning/aez v0.0.0-20211027044916-e49e68abd344 | |||
github.com/andybalholm/brotli v1.0.3 | |||
github.com/bahlo/generic-list-go v0.2.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would agree, perhaps the change is also not necessary, the improvement is only noticeable for 250 results:
│ with_type_assertions │ after │
│ sec/op │ sec/op vs base │
MissionControlStoreFlushing/no_additional_results-4 216.3n ± 2% 218.7n ± 8% ~ (p=0.190 n=10)
MissionControlStoreFlushing/one_additional_result-4 29.32µ ± 0% 29.19µ ± 12% ~ (p=0.645 n=10)
MissionControlStoreFlushing/ten_additional_results-4 64.87µ ± 0% 63.99µ ± 0% -1.37% (p=0.000 n=10)
MissionControlStoreFlushing/100_additional_results-4 385.4µ ± 6% 370.6µ ± 0% -3.84% (p=0.000 n=10)
MissionControlStoreFlushing/250_additional_results-4 1016.6µ ± 2% 908.8µ ± 1% -10.60% (p=0.000 n=10)
MissionControlStoreFlushing/500_additional_results-4 1.945m ± 3% 1.780m ± 1% -8.49% (p=0.000 n=10)
geomean 82.42µ 79.10µ -4.03%
I'd hope that this package would make it into the standard library at some point :)
fb43678
to
413b871
Compare
@bitromortac sent commit fb43678 to clean up the mission control tests and address your comments. Also rebased to address the conflict. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice improvements! Thanks for this 🙏
Main suggestion is about how to how to use sync.Cond
instead of a channel for signalling & responding to new items.
Then more generally, some commit structure updates are needed: looks like some review comments were addressed in an additional commit rather than squashing those into the commit in question.
96efd14
to
39f51e1
Compare
@ellemouton Thank you for the thorough review! I believe I addressed all of your points. Also, rebased against current master. Not sure what to do about the release notes, given the v0.18 RCs are already out (move to v0.19 or v0.18.1 or something else). |
thanks for the quick turn around! I'll check it out soon. im also not sure re release notes.... cc @saubyk for that :) |
tagged for 18.1 |
On a second thought, 18.1 is already getting bloated and this doesn't seem urgent so moving to 0.19 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good!
From my tests I also dont see a huge difference between the with-type-assertion vs no-type-assertion diffs so I guess I'd also vote for not pulling in the external dependency for this
39f51e1
to
151522a
Compare
@matheusd - note the failing linter check |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 🎉 (some nits and the linter fix left)
141ea9a
to
a100051
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
almost there - main thing is to please remove the use of panic
🙏
a100051
to
457fcb0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM once final nits have been addressed (note: I think you may have missed 2 from the last review)
457fcb0
to
93f6edd
Compare
Sorry for the misses. Updated. |
@matheusd, remember to re-request review from reviewers when ready |
@saubyk - happy for this to be merged? I think it is g2g |
Tagged for 18.1. Release notes need update |
@matheusd - thanks for your patience 🙏 I think we wanna move this to 0.18.1 instead of 0.19 so do you mind doing one last update and moving the release notes to the 0.18.1 doc pls? |
e33d572
to
8f3dbdc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack
oh no @matheusd - looks like there is a merge conflict 🙈 one more rebase pls 🙏 will then merge this asap so that this can be the last one! |
These will be useful in the next commits.
This modifies the mission control store to avoid doing any work when no new payment result entries are in the queue to be processed. The mission control store maintains keeps the latest N (in production: 1000) entries in its DB, evicting older entries when new ones are added. Currently, its implementation is somewhat less performant than it could be. This commit adds an early return to the storeResults function to avoid doing any DB or memory operations when its outstanding queue is empty, improving the performance during quiescent periods of the LN node's execution.
This modifies the mission control store to avoid running the one second ticker for flushing data when there is no work to be done. This improves performance of a quiscent LN node by avoiding a one second interval busy loop that does nothing when there are no payments flowing through the node.
This removes duplication of in-memory data during the periodic flushing stage of the mission control store. The existing code entirely duplicates the in-memory cache of the store, which is very wasteful when only a few additional results are being rotated into the store. This has a significant performance penalty specially for wallets that remain online for a long time with a low volume of payments. The worst case scenario are wallets that see at most 1 new payment a second, where the entire in-memory cache is recreated every second. This commit improves the situation by determining what will be the actual changes that need to be committed before initiating the db transaction and only keeping track of these to update the in-memory cache if the db tx is successful.
8f3dbdc
to
f39edaa
Compare
The current implementation of the Misson Control Store has a one second ticker that trims the DB, ensuring only the latest 1000 entries are kept.
That ticker was added in #5515 as a way to improve performance by batching and de-duplicating the updates to the db.
However, its current implementation can be significantly improved by doing a few pointed changes. In particular:
This PR improves the situation iteratively on each commit. A set of benchmarks is added to verify the performance improvement.
This is the final result (using a bbolt MC store):
As can be seen, there is a significant improvement in cpu and memory usage when up to 100 additional results are added (i.e. up to 100 txs/second flowing through the node).
The improvement is specially significant when no new results are added to a full MC store, which is relevant for nodes that have a low or only episodic volument of payments (e.g. a mostly end user node or mobile node).