-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
watchtower: replace in-mem task queue with a disk over-flow queue #7380
watchtower: replace in-mem task queue with a disk over-flow queue #7380
Conversation
5075714
to
46bae5d
Compare
1885f79
to
2efadc1
Compare
2efadc1
to
562118e
Compare
562118e
to
6349dbe
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work on this PR 💯
The commit structure and commit messages are really beautiful and easy to process!
My main concern is about the complexity of the disk queue. Maybe we can simplify it somewhat, left a few suggestions around that.
6349dbe
to
deb06b2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review @guggero!
I updated as per your nits & suggestions around the non-overflow queue code. I have mostly just responded to your comments around the overflow queue. Wasn't yet able to figure out how to make it simpler. Let me know if I am missing something with any of the comments :)
0f34b6c
to
82fce00
Compare
@guggero - im gonna re-request just to hear your thoughts on my latest comments (ie, no need for another full review yet) |
fc886e2
to
51379fd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really important work 💪 Thank you for the detailed PR description and commit structure! I'm still trying to understand the exact buffering logic for the overflow queue, but left some comments and questions.
51379fd
to
8d55d81
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @bitromortac ! updated accordingly :)
I also created this test case to make sure the start/stop works correctly under load, feel free to add:
|
49b1875
to
d2de0a8
Compare
Thanks for the reviews @bitromortac & @guggero 🚀 and for the awesome test addition @guggero 🙏 took most of your suggestions, there are two regarding channel closing & buffering that I have not though - will discuss with you offline tomorrow regarding these 👍 |
@ellemouton, remember to re-request review from reviewers when ready |
8f9f8cf
to
52efe08
Compare
Thanks for the very thorough initial review @guggero & @bitromortac 🎉 Also - I really appreciate the extra test added I think this is now ready for the next pass. Here are a few changes to be aware of:
I have performed the following test locally to ensure things work as expected:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice, LGTM 🎉
52efe08
to
9b0bce4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Many nice async patterns in this PR and great comments. Awesome work
@ellemouton and @guggero, was cool to review this! LGTM 🎉
This commit adds a test to the wtclient. The test demonstrates that if a client tries to back up states while it has no active sessions with a server then those updates are accumlated in memory and lost on restart. This will be fixed in upcoming commits.
This commit adds a new generic DiskQueueDB type which is an bbolt implementation of the Queue interface.
This commit adds an in-memory implementation of the Queue interface. This can be used for tests.
In this commit, a new generic DiskOverflowQueue implementation is added. This allows a user to specify a maximum number of items that the queue can hold in-memory. Any new items will then overflow to disk. The producer and consumer of the queue items will interact with the queue just like a normal in-memory queue.
Add a `MaxTasksInMemQueue` field to the `WtClient` config so that users can change the default if they please.
9b0bce4
to
250d764
Compare
@ellemouton I am finalizing #7612 atm. I am wondering if the new option lnd/watchtower/wtclient/client.go Line 57 in cf1f44a
|
good catch @feelancer21! opening a PR soon |
The Problem:
At the moment, updates are passed to the tower client using an unbounded
in-memory list. This is a problem for a few reasons:
indefinitely be queued in memory.
seconds while it waits for the updates to be sent to a tower. This results in
the shutdown of the whole LND binary to be delayed. Not ideal.
those 10 seconds then they are lost forever!
This PR aims to fix all of these issues.
The Solution at a high level:
The main thing that this PR introduces is a disk overflow queue which replaces
the current in-memory queue. This queue ensures that there is a cap on the
number of updates that are held in memory and ensures reliable restarts so
that no updates are lost on restart. With this, the 10 second delay on shutdown
can also be removed.
The flow of the PR:
By the end of the PR, this test will be changed to show that the updates are
no longer lost.
Fixes #5983