-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
itest: make sure to not hit the natural ChannelUpdate rate limit #9567
itest: make sure to not hit the natural ChannelUpdate rate limit #9567
Conversation
Important Review skippedAuto reviews are limited to specific labels. 🏷️ Labels to auto review (1)
Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
Channel Updates have a natural rate limit of 1 update per second due to the fact that the timestamp carried in the update is only accurate to the second. So we need to ensure that the next update we send in the burst is at least 1 second after the last one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch! ⚔️
// per second due to the fact that the timestamp carried in the update | ||
// is only accurate to the second. So we need to ensure that the next | ||
// update we send in the burst is at least 1 second after the last one. | ||
time.Sleep(time.Second) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: might make sense to rename assertAliceAndBob to updateAndAssertChanPolicy for clarity too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good idea - will add a commit 🫡
b7c5895
into
lightningnetwork:elle-graphCacheBase
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there's a weird behavior, analyzed the logs from this build,
Channel open, not counted as rate limit.
2025-02-28 08:10:16.250 [DBG] PEER brontide.go:2412: Peer([Alice]): Received ChannelUpdate(chain_hash=0f9188f13cb7b2c71f2a335e3a4fc328bf5beb436012afca590b1a11466e2206, short_chan_id=[SID: Alice=>Carol], mflags=00000001, cflags=00000001, update_time=2025-02-28 08:10:16 +0000 UTC) from [Alice]@127.0.0.1:39298
2025-02-28 08:10:16.257 [DBG] DISC gossiper.go:3145: Validating ChannelUpdate: channel=[SID: Alice=>Carol], for node=[Alice], has edge policy=false
Ignored channel updates, not counted as rate limit.
2025-02-28 08:10:21.426 [DBG] PEER brontide.go:2412: Peer([Alice]): Received ChannelUpdate(chain_hash=0f9188f13cb7b2c71f2a335e3a4fc328bf5beb436012afca590b1a11466e2206, short_chan_id=[SID: Alice=>Carol], mflags=00000001, cflags=00000000, update_time=2025-02-28 08:10:16 +0000 UTC) from [Alice]@127.0.0.1:39298
2025-02-28 08:10:21.427 [DBG] PEER brontide.go:2412: Peer([Alice]): Received ChannelUpdate(chain_hash=0f9188f13cb7b2c71f2a335e3a4fc328bf5beb436012afca590b1a11466e2206, short_chan_id=[SID: Alice=>Carol], mflags=00000001, cflags=00000001, update_time=2025-02-28 08:10:16 +0000 UTC) from [Alice]@127.0.0.1:39298
2025-02-28 08:10:21.433 [DBG] DISC gossiper.go:3004: Ignored stale edge policy for short_chan_id([SID: Alice=>Carol]): peer=[Alice]@127.0.0.1:39298, msg=ChannelUpdate, is_remote=true
2025-02-28 08:10:21.437 [DBG] DISC gossiper.go:3004: Ignored stale edge policy for short_chan_id([SID: Alice=>Carol]): peer=[Alice]@127.0.0.1:39298, msg=ChannelUpdate, is_remote=true
Rate limit once for each channel.
2025-02-28 08:10:22.580 [DBG] PEER brontide.go:2412: Peer([Alice]): Received ChannelUpdate(chain_hash=0f9188f13cb7b2c71f2a335e3a4fc328bf5beb436012afca590b1a11466e2206, short_chan_id=[SID: Alice=>Bob], mflags=00000001, cflags=00000000, update_time=2025-02-28 08:10:22 +0000 UTC) from [Alice]@127.0.0.1:39298
2025-02-28 08:10:22.580 [DBG] PEER brontide.go:2412: Peer([Alice]): Received ChannelUpdate(chain_hash=0f9188f13cb7b2c71f2a335e3a4fc328bf5beb436012afca590b1a11466e2206, short_chan_id=[SID: Alice=>Carol], mflags=00000001, cflags=00000001, update_time=2025-02-28 08:10:22 +0000 UTC) from [Alice]@127.0.0.1:39298
2025-02-28 08:10:22.586 [DBG] DISC gossiper.go:3145: Validating ChannelUpdate: channel=[SID: Alice=>Bob], for node=[Alice], has edge policy=true
2025-02-28 08:10:22.589 [DBG] DISC gossiper.go:3145: Validating ChannelUpdate: channel=[SID: Alice=>Carol], for node=[Alice], has edge policy=true
Rate limit twice for each channel, should pass, but the channel Alice=>Carol
is limited. It's weird as ValidateChannelUpdateAnn
has been called twice for the ChannelUpdate
from the channel Alice=>Carol
, which I think is the root cause.
2025-02-28 08:10:23.938 [DBG] PEER brontide.go:2412: Peer([Alice]): Received ChannelUpdate(chain_hash=0f9188f13cb7b2c71f2a335e3a4fc328bf5beb436012afca590b1a11466e2206, short_chan_id=[SID: Alice=>Bob], mflags=00000001, cflags=00000000, update_time=2025-02-28 08:10:23 +0000 UTC) from [Alice]@127.0.0.1:39298
2025-02-28 08:10:23.938 [DBG] PEER brontide.go:2412: Peer([Alice]): Received ChannelUpdate(chain_hash=0f9188f13cb7b2c71f2a335e3a4fc328bf5beb436012afca590b1a11466e2206, short_chan_id=[SID: Alice=>Carol], mflags=00000001, cflags=00000001, update_time=2025-02-28 08:10:23 +0000 UTC) from [Alice]@127.0.0.1:39298
2025-02-28 08:10:23.950 [DBG] DISC gossiper.go:3145: Validating ChannelUpdate: channel=[SID: Alice=>Bob], for node=[Alice], has edge policy=true
2025-02-28 08:10:23.952 [DBG] DISC gossiper.go:3145: Validating ChannelUpdate: channel=[SID: Alice=>Carol], for node=[Alice], has edge policy=true
2025-02-28 08:10:23.952 [DBG] DISC gossiper.go:3206: Rate limiting update for channel [SID: Alice=>Carol] from direction [Alice]
2025-02-28 08:10:23.976 [DBG] DISC gossiper.go:3334: Processed ChannelUpdate: peer=[Alice]@127.0.0.1:39298, short_chan_id=[SID: Alice=>Bob], timestamp=2025-02-28 08:10:23 +0000 UTC
# Why is the validation called again for Alice=>Carol?
2025-02-28 08:10:24.022 [DBG] DISC gossiper.go:3145: Validating ChannelUpdate: channel=[SID: Alice=>Carol], for node=[Alice], has edge policy=true
2025-02-28 08:10:24.022 [DBG] DISC gossiper.go:3206: Rate limiting update for channel [SID: Alice=>Carol] from direction [Alice]
// per second due to the fact that the timestamp carried in the update | ||
// is only accurate to the second. So we need to ensure that the next | ||
// update we send in the burst is at least 1 second after the last one. | ||
time.Sleep(time.Second) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the evidence showing this is the cause? Could you provide the logs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see PR description of the log line
builder.go:1331: process network updates got: Ignoring outdated update (flags=00000001|00000000) for known chan_id=8015439768453121
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gimme a sec to write up the exact order of ops. But pretty sure this reasoning was correct. basically, the global updates are applied very quickly after eachother and so the timestamp is the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the validation called again for Alice=>Carol?
because 2 updates are getting sent very close together.
note that the issue is happening before the actual rate limit logic kicks in
We are sending a channel update on channel open, we then make a global update to base fee 800, then we make another one to 1600. All of these should be received just fine by Carol since the channel open one doesnt count towards rate limit and so the next 2 should be received just fine. But the 1600 is sent too quickly after the 800 one and so it has the same timestamp and so gets ignored as "outdated"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so we definitely do expect the Validation line twice in the normal case. I dont think anything else weird is happening here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If that's the case, it should be caught here?
Line 3000 in f744a54
if d.cfg.Graph.IsStaleEdgePolicy( |
Also from the logs, the last two ChannelUpdate
have different timestamps.
Channel open, not counted as rate limit.
2025-02-28 08:10:16.250 [DBG] PEER brontide.go:2412: Peer([Alice]): Received ChannelUpdate(chain_hash=0f9188f13cb7b2c71f2a335e3a4fc328bf5beb436012afca590b1a11466e2206, short_chan_id=[SID: Alice=>Carol], mflags=00000001, cflags=00000001, update_time=2025-02-28 08:10:16 +0000 UTC) from [Alice]@127.0.0.1:39298
Ignored channel updates, not counted as rate limit.
2025-02-28 08:10:21.426 [DBG] PEER brontide.go:2412: Peer([Alice]): Received ChannelUpdate(chain_hash=0f9188f13cb7b2c71f2a335e3a4fc328bf5beb436012afca590b1a11466e2206, short_chan_id=[SID: Alice=>Carol], mflags=00000001, cflags=00000000, update_time=2025-02-28 08:10:16 +0000 UTC) from [Alice]@127.0.0.1:39298
2025-02-28 08:10:21.427 [DBG] PEER brontide.go:2412: Peer([Alice]): Received ChannelUpdate(chain_hash=0f9188f13cb7b2c71f2a335e3a4fc328bf5beb436012afca590b1a11466e2206, short_chan_id=[SID: Alice=>Carol], mflags=00000001, cflags=00000001, update_time=2025-02-28 08:10:16 +0000 UTC) from [Alice]@127.0.0.1:39298
Global update to base fee 800, rate limit once for each channel.
2025-02-28 08:10:22.580 [DBG] PEER brontide.go:2412: Peer([Alice]): Received ChannelUpdate(chain_hash=0f9188f13cb7b2c71f2a335e3a4fc328bf5beb436012afca590b1a11466e2206, short_chan_id=[SID: Alice=>Carol], mflags=00000001, cflags=00000001, update_time=2025-02-28 08:10:22 +0000 UTC) from [Alice]@127.0.0.1:39298
Global update to base fee 1600, rate limit once for each channel.
2025-02-28 08:10:23.938 [DBG] PEER brontide.go:2412: Peer([Alice]): Received ChannelUpdate(chain_hash=0f9188f13cb7b2c71f2a335e3a4fc328bf5beb436012afca590b1a11466e2206, short_chan_id=[SID: Alice=>Carol], mflags=00000001, cflags=00000001, update_time=2025-02-28 08:10:23 +0000 UTC) from [Alice]@127.0.0.1:39298
Looking closely, I think the Ignoring outdated...
actually comes from processing the ChannelUpdate
sent from Bob,
2025-02-28 08:10:22.580 [DBG] PEER brontide.go:2412: Peer([Alice]): Received ChannelUpdate(chain_hash=0f9188f13cb7b2c71f2a335e3a4fc328bf5beb436012afca590b1a11466e2206, short_chan_id=[SID: Alice=>Carol], mflags=00000001, cflags=00000001, update_time=2025-02-28 08:10:22 +0000 UTC) from [Alice]@127.0.0.1:39298
...
It says the `ChannelUpdate` from Alice has been processed without error.
2025-02-28 08:10:22.605 [DBG] DISC gossiper.go:3334: Processed ChannelUpdate: peer=[Alice]@127.0.0.1:39298, short_chan_id=[SID: Alice=>Bob], timestamp=2025-02-28 08:10:22 +0000 UTC
...
Then Carol received it from Bob.
2025-02-28 08:10:22.612 [DBG] PEER brontide.go:2412: Peer([Bob]): Received ChannelUpdate(chain_hash=0f9188f13cb7b2c71f2a335e3a4fc328bf5beb436012afca590b1a11466e2206, short_chan_id=[SID: Alice=>Carol], mflags=00000001, cflags=00000001, update_time=2025-02-28 08:10:22 +0000 UTC) from [Bob]@127.0.0.1:11599
2025-02-28 08:10:22.612 [DBG] DISC gossiper.go:2942: Processing ChannelUpdate: peer=[Bob]@127.0.0.1:11599, short_chan_id=[SID: Alice=>Carol],
This is now ignored because we already received it from Alice.
2025-02-28 08:10:22.627 [DBG] GRPH builder.go:1331: process network updates got: Ignoring outdated update (flags=00000001|00000001) for known chan_id=[SID: Alice=>Carol]
2025-02-28 08:10:22.627 [DBG] DISC gossiper.go:3242: Update edge for short_chan_id([SID: Alice=>Carol]) got: Ignoring outdated update (flags=00000001|00000001) for known chan_id=[SID: Alice=>Carol]
In addition if the error is from Alice's update I think the UpdateChannelPolicy
would fail? And why would this affect the rate limiting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok yeah you're right. Looking
Channel Updates have a natural rate limit of 1 update per second due to the fact that the timestamp carried in the update is only accurate to the second. So we need to ensure that the next update we send in the burst is at least 1 second after the last one.
This flake was appearing in the
update channel policy
itest in the part of the tests where we are testingthe burst rate limit logic of Carol. It's flaky at the moment though due to Carol sometimes rejecting a channel update
due to the natural 1-update-per-second rate limit. So we need to ensure that Alice waits at least a second between
sending updates for this logic to be tested properly. The log line that appears after the channel updates are sent too quickly after each there ie:
builder.go:1331: process network updates got: Ignoring outdated update (flags=00000001|00000000) for known chan_id=8015439768453121
As to why this is flake is appearing more often now, I think it may be due to this PR which removes the logic that would pass channel updates through the same channel before handling them in a goroutine - All that did was make it less likely for the flake to occur though cause it added a small artificial delay between the updates.