Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

localchans: recreate missing edge if not found #8805

Merged

Conversation

JssDWt
Copy link
Contributor

@JssDWt JssDWt commented Jun 3, 2024

Change Description

Description of change / link to associated issue.If a node contains a channel, but doesn't have a corresponding edge in the graph database, updating the channel policy would fail. In this commit the edge is recreated if the channel exists. This ensures a node can recover from a missing edge in the graph database by calling updatechanpolicy.

Alternative for #8768, namely option 2 in #8768 (comment)
Partially fixes #7261 by allowing to recreate the edge by calling updatechanpolicy.

Steps to Test

  • Create a node that has a channel with a missing edge
  • Calling getchaninfo on this channel will fail
  • Call updatechanpolicy on this channel
  • Calling getchaninfo on this channel should succeed

I'm not sure how to create an integration test where I can modify the graph database to delete an edge in order to test this. Please advise.

Pull Request Checklist

Testing

  • Your PR passes all CI checks.
  • Tests covering the positive and negative (error paths) are included.
  • Bug fixes contain tests triggering the bug to prevent regressions.

Code Style and Documentation

📝 Please see our Contribution Guidelines for further guidance.

Copy link
Contributor

coderabbitai bot commented Jun 3, 2024

Important

Review skipped

Auto reviews are limited to specific labels.

🏷️ Labels to auto review (1)
  • llm-review

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@JssDWt JssDWt force-pushed the jssdwt-insert-edge-when-not-found branch from 6a4c3af to 923c139 Compare June 6, 2024 09:55
@yyforyongyu yyforyongyu self-requested a review June 11, 2024 16:39
Copy link
Member

@yyforyongyu yyforyongyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept Ack - if we decide to do this, we should at least add a new flag in updatechanpolicy cli and RPC to let the user choose to re-create the missing edge policies or not.

@guggero
Copy link
Collaborator

guggero commented Jun 20, 2024

Release notes need to be moved to the v0.18.1 file.

@saubyk saubyk added this to the v0.18.1 milestone Jun 20, 2024
@saubyk saubyk added graph P2 should be fixed if one has time labels Jun 25, 2024
@yyforyongyu
Copy link
Member

Maybe it's related to this?

@ziggie1984
Copy link
Collaborator

ziggie1984 commented Jul 1, 2024

Maybe it's related to #8870 (comment)?

Hmm I don't think so, not having the edge available means somehow that our own ChanAnnouncment didn't got through? Not receiving the node announcment would just leave us with a NodeAnnouncement Shell, we add during adding the edge to the db.

@ziggie1984
Copy link
Collaborator

Concept Ack from my side as well, I think it's the best way to fix broken channels suffering from the Edge not Found problem. I think we still have not traced down the origin of the problem, this gives us at least the possibility to mitigate the problem.

@JssDWt JssDWt force-pushed the jssdwt-insert-edge-when-not-found branch from 5f0672d to 82374e5 Compare July 2, 2024 10:00
@JssDWt
Copy link
Contributor Author

JssDWt commented Jul 2, 2024

Sorry for letting this sit for so long. Updated with your feedback now @yyforyongyu

@JssDWt JssDWt force-pushed the jssdwt-insert-edge-when-not-found branch from 82374e5 to ef7a4e0 Compare July 2, 2024 10:30
@@ -180,6 +216,87 @@ func (r *Manager) UpdatePolicy(newSchema routing.ChannelPolicy,
return failedUpdates, nil
}

// createEdge recreates an edge and policy from an open channel in-memory.
func (r *Manager) createEdge(channel *channeldb.OpenChannel) (
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be great to have a unit test for this - meanwhile have you tested this cli out locally? Think it's not suitable for itest as it's hard to create this edge case.

Copy link
Contributor Author

@JssDWt JssDWt Jul 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll cook a test.

I haven't tested this out yet, I was really looking for your review here before we deploy this in production (which is where the issue occurs), but we'll do that once we reach a consensus about the solution here.

@ziggie1984
Copy link
Collaborator

ziggie1984 commented Jul 2, 2024

what about passing the graph db as a config to the localchan_manager, so that we can when recreating the edge just call AddChannelEdge and short-circuit all the other stuff which is done in the gossiper ? Later during the ChanUpdate (PropagateChanPolicyUpdate) all the necessary stuff is done (adding the update to the Topology etc.).

It's a mitigation to a unknown bug anyways which we will find as soon as we improved logging in the gossiper.

Copy link
Member

@yyforyongyu yyforyongyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah PropagateChanPolicyUpdate will give ErrEdgeNotFound since the policy is not there yet. I think it's easier to fix it this way,

diff --git a/routing/localchans/manager.go b/routing/localchans/manager.go
index 930b4c96b..ed5de6764 100644
--- a/routing/localchans/manager.go
+++ b/routing/localchans/manager.go
@@ -40,6 +40,8 @@ type Manager struct {
 	FetchChannel func(tx kvdb.RTx, chanPoint wire.OutPoint) (
 		*channeldb.OpenChannel, error)
 
+	AddEdge func(edge *models.ChannelEdgeInfo) error
+
 	// policyUpdateLock ensures that the database and the link do not fall
 	// out of sync if there are concurrent fee update calls. Without it,
 	// there is a chance that policy A updates the database, then policy B
diff --git a/server.go b/server.go
index 84a0ceaa7..215e424c3 100644
--- a/server.go
+++ b/server.go
@@ -1056,6 +1056,9 @@ func newServer(cfg *Config, listenAddrs []net.Addr,
 		PropagateChanPolicyUpdate: s.authGossiper.PropagateChanPolicyUpdate,
 		UpdateForwardingPolicies:  s.htlcSwitch.UpdateForwardingPolicies,
 		FetchChannel:              s.chanStateDB.FetchChannel,
+		AddEdge: func(edge *models.ChannelEdgeInfo) error {
+			return s.chanRouter.AddEdge(edge, nil)
+		},
 	}
 
 	utxnStore, err := contractcourt.NewNurseryStore(

@JssDWt JssDWt force-pushed the jssdwt-insert-edge-when-not-found branch from ef7a4e0 to 6c5e7e9 Compare July 2, 2024 18:56
@JssDWt
Copy link
Contributor Author

JssDWt commented Jul 2, 2024

Handled most comments. Still have to create a test.

@JssDWt JssDWt force-pushed the jssdwt-insert-edge-when-not-found branch 2 times, most recently from 443e287 to dd99a51 Compare July 4, 2024 12:06
@JssDWt
Copy link
Contributor Author

JssDWt commented Jul 4, 2024

Added a test for createEdge

@JssDWt JssDWt force-pushed the jssdwt-insert-edge-when-not-found branch from dd99a51 to bc883b4 Compare July 4, 2024 12:50
@JssDWt
Copy link
Contributor Author

JssDWt commented Jul 4, 2024

Now also added a test to the UpdatePolicy function

@JssDWt JssDWt requested review from yyforyongyu and ziggie1984 July 4, 2024 12:51
If a node contains a channel, but doesn't have a corresponding edge in
the graph database, updating the channel policy would fail. In this
commit the edge is recreated if the channel exists. This ensures a node
can recover from a missing edge in the graph database by calling
updatechanpolicy.
@JssDWt JssDWt force-pushed the jssdwt-insert-edge-when-not-found branch from a3e6c86 to 4e6554b Compare November 13, 2024 11:13
@JssDWt
Copy link
Contributor Author

JssDWt commented Nov 13, 2024

Added comments that the ForAllOutgoingChannels function may invoke the callback with a nil channel policy.
Also rebased.

This is a robustness option to ensure LND doesn't crash when this
function is accidentally called with `AddChannelEdge(edge, nil)`.
@JssDWt JssDWt force-pushed the jssdwt-insert-edge-when-not-found branch from 4e6554b to 20f7c73 Compare November 14, 2024 10:42
Copy link
Member

@yyforyongyu yyforyongyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One last comment, then think it's good to go!

@ziggie1984
Copy link
Collaborator

Ok let's not change too much right now let's do the following:

  1. We cannot just log a warning in the edge=nil case, with the combination of createMissingEdge we would attempt in the worst case scenario to readd an edge => so let's return an error here, because that should never be the case.

  2. In updateEdge => use a NON-Pointer for the *models.ChannelEdgePolicy

  3. remove this form updateEdge: if edge == nil { _, edge, err = r.createEdge(channel, time.Now()) if err != nil { return nil, err } }

We now check the policy so I don't think there is a need for the above.

Then we should be good to go

@JssDWt
Copy link
Contributor Author

JssDWt commented Nov 18, 2024

I undid the tryout commit.

  1. In updateEdge => use a NON-Pointer for the *models.ChannelEdgePolicy

This we can't do, because that wouldn't update the edge outside the updateEdge function.

  1. We cannot just log a warning in the edge=nil case, with the combination of createMissingEdge we would attempt in the worst case scenario to readd an edge => so let's return an error here, because that should never be the case.
  2. remove this form updateEdge: if edge == nil { _, edge, err = r.createEdge(channel, time.Now()) if err != nil { return nil, err } }

So this basically means revert commit 7d9d100 correct?

That would also address this comment:
#8805 (comment)

@ziggie1984
Copy link
Collaborator

This we can't do, because that wouldn't update the edge outside the updateEdge function.

I am not sure I understand, I am talking about the func (r *Manager) updateEdge(tx kvdb.RTx, chanPoint wire.OutPoint, function, which is only used 2 times and also returns the specified edge so we can use the return value here, as we currently do anyways ?

@JssDWt
Copy link
Contributor Author

JssDWt commented Nov 19, 2024

I am not sure I understand, I am talking about the func (r *Manager) updateEdge(tx kvdb.RTx, chanPoint wire.OutPoint, function, which is only used 2 times and also returns the specified edge so we can use the return value here, as we currently do anyways ?

Ahh I see, I didn't get that.

It's no longer needed to return the edge at all if processEdge returns an error on a nil edge. Because the edge will never be nil, there is never a reason to create a new one. So even the ForAllOutgoingChannels can go back to it's original form. I suggest to do that as it will be the least amount of change.

So revert 7d9d100
And add a line to call updateEdge before inserting the edge in the database.

If you agree, I'll do that.

@ziggie1984
Copy link
Collaborator

ahh ok yeah let's do that then and we should be good to go.

@ziggie1984
Copy link
Collaborator

ziggie1984 commented Nov 20, 2024

So revert 7d9d100
And add a line to call updateEdge before inserting the edge in the database.

yes but I think we don't want this in the reverted commit:

	// If due to some unforeseen circumstances the policy doesn't exist,
	// recreate it here.
	if edge == nil {
		_, edge, err = r.createEdge(channel, time.Now())
		if err != nil {
			return nil, err
		}
	}

@JssDWt
Copy link
Contributor Author

JssDWt commented Nov 21, 2024

I think I made a commit that puts everything together now.

@JssDWt JssDWt force-pushed the jssdwt-insert-edge-when-not-found branch from 6f2d8ca to 2665312 Compare November 21, 2024 09:57
Copy link
Collaborator

@ziggie1984 ziggie1984 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM ⚡️, thank you for your patience while working on this PR. Tested locally via a customized DB, works as expected.

@guggero guggero requested a review from yyforyongyu November 21, 2024 11:12
@ziggie1984
Copy link
Collaborator

ziggie1984 commented Nov 21, 2024

In case you wanna verify (that the whole regtest setup with the missing edge):
missing_edge_data.zip

how it looks like from the cli level:

rlncli updatechanpolicy --base_fee_msat 1000 --time_lock_delta 100  --fee_rate_ppm 100 --chan_point 86919ba50fb4c756a3374c35eb3d0e4b0759110a5ac4bd750b121a61ab72c1b4:0
{
    "failed_updates": [
        {
            "outpoint": "86919ba50fb4c756a3374c35eb3d0e4b0759110a5ac4bd750b121a61ab72c1b4:0",
            "reason": "UPDATE_FAILURE_UNKNOWN",
            "update_error": "could not update policies"
        }
    ]
}
❯ rlncli updatechanpolicy --base_fee_msat 1000 --time_lock_delta 100  --fee_rate_ppm 100 --chan_point 86919ba50fb4c756a3374c35eb3d0e4b0759110a5ac4bd750b121a61ab72c1b4:0 --create_missing_edge
{
    "failed_updates": [
        {
            "outpoint": "86919ba50fb4c756a3374c35eb3d0e4b0759110a5ac4bd750b121a61ab72c1b4:0",
            "reason": "UPDATE_FAILURE_INVALID_PARAMETER",
            "update_error": "min_htlc 1000 mSAT greater than max_htlc 0 mSAT"
        }
    ]
}
❯ rlncli updatechanpolicy --base_fee_msat 1000 --time_lock_delta 100  --fee_rate_ppm 100 --chan_point 86919ba50fb4c756a3374c35eb3d0e4b0759110a5ac4bd750b121a61ab72c1b4:0
{
    "failed_updates": [
        {
            "outpoint": "86919ba50fb4c756a3374c35eb3d0e4b0759110a5ac4bd750b121a61ab72c1b4:0",
            "reason": "UPDATE_FAILURE_UNKNOWN",
            "update_error": "could not update policies"
        }
    ]
}
❯ rlncli updatechanpolicy  --min_htlc_msat 1000  --base_fee_msat 1000 --time_lock_delta 100  --fee_rate_ppm 100 --chan_point 86919ba50fb4c756a3374c35eb3d0e4b0759110a5ac4bd750b121a61ab72c1b4:0 --create_missing_edge
{
    "failed_updates": [
        {
            "outpoint": "86919ba50fb4c756a3374c35eb3d0e4b0759110a5ac4bd750b121a61ab72c1b4:0",
            "reason": "UPDATE_FAILURE_INVALID_PARAMETER",
            "update_error": "min_htlc 1000 mSAT greater than max_htlc 0 mSAT"
        }
    ]
}
❯ rlncli updatechanpolicy  --min_htlc_msat 1000 --max_htlc_msat  20000    --base_fee_msat 1000 --time_lock_delta 100  --fee_rate_ppm 100 --chan_point 86919ba50fb4c756a3374c35eb3d0e4b0759110a5ac4bd750b121a61ab72c1b4:0 --create_missing_edge
{
    "failed_updates": []
}

Copy link
Member

@yyforyongyu yyforyongyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM🚢

@guggero guggero merged commit 38ec1a1 into lightningnetwork:master Nov 22, 2024
26 of 33 checks passed
@saubyk saubyk modified the milestones: v0.19.0, v0.18.5 Jan 29, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
graph P2 should be fixed if one has time
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[bug]: "edge not found" for local channel
6 participants