-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
discovery: implement banning for invalid channel anns #9009
discovery: implement banning for invalid channel anns #9009
Conversation
Important Review skippedAuto reviews are limited to specific labels. Labels to auto review (1)
Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (invoked as PR comments)
Additionally, you can add CodeRabbit Configuration File (
|
29b032d
to
695f00f
Compare
Concept ACK |
It just occurred to me that we can get rid of the new banning code and just use slices of rate limiters in the gossiper instead. Tradeoff would be losing customizable banning code for less code in the discovery package. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work 👌, had some questions but it looks very close.
Missing Release-Notes for 18.3
0af94be
to
0d4e31e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work 🔥! I think it's important that we keep the flow of channel announcements going if it's our channel, which I think is ensured, but also have a question in a comment. Perhaps we could also make the closed channel index more exhaustive by filling it with detected closes (in the future we may be able to separate zombies from closed channels as well).
900d2c7
to
ee220b6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks almost good to go 🙏, only a few nits.
|
||
// Ban a peer by repeatedly incrementing its ban score. | ||
peer1 := [33]byte{0x00} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps test that the peer is not banned beforehand, which also tests the cache.ErrElementNotFound
case
// Assert that purgeBanEntries does nothing. | ||
b.purgeBanEntries() | ||
banInfo, err = b.peerBanIndex.Get(peer1) | ||
require.Nil(t, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
semanitcally, a require.NoError
may be better
discovery/gossiper_test.go
Outdated
|
||
select { | ||
case err = <-ctx.gossiper.ProcessRemoteAnnouncement(ca, nodePeer1): | ||
require.NotNil(t, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could add require.ErrorContains(t, err, "peer is banned")
, to be a bit more explicit
|
||
select { | ||
case err = <-ctx.gossiper.ProcessRemoteAnnouncement(ca, nodePeer2): | ||
require.NotNil(t, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and then here maybe add require.ErrorContains(t, err, "ignoring closed channel")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, great work 👏
Maybe add in the release notes a statement that this new ban protection excludes Neutrino nodes, tho they are not doing any expensive checks in the first place. Tho the bandwidth requirements would still be high for them.
20f70f3
to
a97bc4f
Compare
discovery/gossiper.go
Outdated
} | ||
|
||
if !chanPeer { | ||
nMsg.peer.Disconnect(ErrPeerBanned) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will we prevent incoming connections from being fully accepted (either at the brontide connection handshake layer, or in the server before we finalize the peer)? Otherwise, we could have a situation where they: connect, send something bad, and we disconnect (in a loop).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made the change in the server, but still need to properly test in a few different scenarios
a97bc4f
to
7e391e2
Compare
@@ -4037,3 +4037,26 @@ func TestGraphLoading(t *testing.T) { | |||
graphReloaded.graphCache.nodeFeatures, | |||
) | |||
} | |||
|
|||
func TestClosedScid(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: missing doc string and it would be better to use require.NoError
instead of require.Nil
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NoError
checks the same thing: https://github.com/stretchr/testify/blob/master/assert/assertions.go#L1579
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, it tests the same. No strong opinion here, but NoError
has an error type as a parameter, will display a better debug message and it may be a bit more readable and nice for code uniformity reasons. This linter for example would complain https://github.com/Antonboom/testifylint?tab=readme-ov-file#error-nil. (non-blocking)
Some lint failures in the latest run:
|
4774dae
to
39d7deb
Compare
test errors look to be unrelated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 3 files at r1, 19 of 19 files at r2, 4 of 4 files at r4, all commit messages.
Reviewable status: all files reviewed, 37 unresolved discussions (waiting on @bitromortac, @Crypt-iQ, and @ziggie1984)
Yeah the native sql failures will be fixed with: #9022 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, cool idea rejecting the peer at the peer connection level ⚡️
|
||
// Reset the AddEdge error and pass the same announcement again. An | ||
// error should be returned even though AddEdge won't fail. | ||
ctx.router.resetAddEdgeErrCode() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Non-Blocking but I think we could improve the graph.Error print method, which your make the debug log easier to read:
// Error satisfies the error interface and prints human-readable errors.
func (e *Error) Error() string {
if e.err != nil {
return fmt.Sprintf("ErrCode: %v, error: %s", e.code, e.err)
}
return fmt.Sprintf("ErrCode: %v", e.code)
}
// String returns the string representation of the error code.
func (e ErrorCode) String() string {
switch e {
case ErrOutdated:
return "ErrOutdated"
case ErrIgnored:
return "ErrIgnored"
case ErrChannelSpent:
return "ErrChannelSpent"
case ErrNoFundingTransaction:
return "ErrNoFundingTransaction"
case ErrInvalidFundingOutput:
return "ErrInvalidFundingOutput"
case ErrVBarrierShuttingDown:
return "ErrInvalidFundingOutput"
case ErrParentValidationFailed:
return "ErrInvalidFundingOutput"
default:
return "<unknown>"
}
}
which makes the output of the test way clearer:
2024-08-27 09:48:05.971 [DBG] DISC: Adding edge for short_chan_id: 111050674405376
2024-08-27 09:48:05.971 [DBG] DISC: Graph rejected edge for short_chan_id(111050674405376): ErrCode: ErrChannelSpent, error: received error
instead of:
2024-08-27 09:11:08.573 [DBG] DISC: Adding edge for short_chan_id: 108851651149824
2024-08-27 09:11:08.573 [DBG] DISC: Graph rejected edge for short_chan_id(108851651149824): received error
@@ -4037,3 +4037,26 @@ func TestGraphLoading(t *testing.T) { | |||
graphReloaded.graphCache.nodeFeatures, | |||
) | |||
} | |||
|
|||
func TestClosedScid(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, it tests the same. No strong opinion here, but NoError
has an error type as a parameter, will display a better debug message and it may be a bit more readable and nice for code uniformity reasons. This linter for example would complain https://github.com/Antonboom/testifylint?tab=readme-ov-file#error-nil. (non-blocking)
39d7deb
to
a0e6598
Compare
This commit adds the ability to store closed channels by scid in the database. This will allow the gossiper to ignore channel announcements for closed channels without having to do any expensive validation.
This commit introduces a ban manager that marks peers as banned if they send too many invalid channel announcements to us. Expired entries are purged after a certain period of time (currently 48 hours).
This will be used in the gossiper to disconnect from peers if their ban score passes the ban threshold.
This commit hooks up the banman to the gossiper: - peers that are banned and don't have a channel with us will get disconnected until they are unbanned. - peers that are banned and have a channel with us won't get disconnected, but we will ignore their channel announcements until they are no longer banned. Note that this only disables gossip of announcements to us and still allows us to open channels to them.
a0e6598
to
95acc78
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So one thing we probably need to add in the follow-up PR is a way for nodes being infected with the old channel data set, to get cured. Is there currently a way to wipe the whole graph history other than chantools? Because for nodes infected, they might after this PR get banned by a lot of peers. Pretending they are not doing it deliberately we should probably also fix the problem for them (in case they are running LND) ?
Reviewable status: 20 of 23 files reviewed, 43 unresolved discussions (waiting on @bitromortac, @Crypt-iQ, and @Roasbeef)
So we know that one vector of these old channels was actually a version of CLN that had a bug causing it not to detect channels as actually being closed. We also know that some lnd nodes that are running neutrino nodes (assumechanvalid) may have stored those channels on disk momentarily. Once the zombie tick interval passes, neutrino nodes will prune these from disk, and now be able to use the spend index to avoid re-downloading all the channels. |
So I analysed it for neutrino nodes, and the problem is that we are not deleting those channels with only a announcement: |
Partially addresses #8889, specifically parts of #4 & #5 here: #8889 (comment)
This PR implements banning for invalid channel announcements:
This PR also keeps track of closed channels such that we won't attempt to validate channel announcements for closed channels.
Future improvements:
This change is