multi: check leader status with our health checker to correctly shut down LND if network partitions #8938
Conversation
Force-pushed from 1e767fd to 5ebb557
@@ -207,6 +207,12 @@ replace google.golang.org/protobuf => github.com/lightninglabs/protobuf-go-hex-d
 // Temporary replace until the next version of sqldb is tagged.
 replace github.com/lightningnetwork/lnd/sqldb => ./sqldb

+// Temporary replace until the next version of healthcheck is tagged.
+replace github.com/lightningnetwork/lnd/healthcheck => ./healthcheck
Marker commit to make sure we remove the pins.
Tested this on a 3 node regtest cluster.

lnd.conf:

[Application Options]
listen=0.0.0.0:9735
alias=lndetcd

[Bitcoin]
bitcoin.regtest=true
bitcoin.node=bitcoind

[Bitcoind]
bitcoind.rpcuser=${BITCOIND_RPCUSER}
bitcoind.rpcpass=${BITCOIND_RPCPASS}
bitcoind.rpchost=${IP_OF_BITCOIND}:8332
bitcoind.zmqpubrawtx=tcp://${IP_OF_BITCOIND}:29001
bitcoind.zmqpubrawblock=tcp://${IP_OF_BITCOIND}:29002
bitcoind.estimatemode=ECONOMICAL

[tor]
tor.active=true
tor.v3=true

[db]
db.backend=etcd

[etcd]
db.etcd.host=127.0.0.1:2379
db.etcd.disabletls=1

[cluster]
cluster.enable-leader-election=1
cluster.leader-elector=etcd
cluster.etcd-election-prefix=cluster-leader
cluster.id=${HOSTNAME}

Logs:
As can be seen from the logs I provided, both lndetcd1 and lndetcd2 were active at the same time between 14:44:38 and 14:44:46. Also, lndetcd1 took more than 3 minutes (14:44:46 to 14:48:14) to shut down while trying to interact with etcd.
Force-pushed from 5ebb557 to b98958a
You can try setting
This did not solve the problem of the two nodes occasionally running simultaneously for a few seconds. But setting Therefore, it might be a good idea to reduce the default value of
Reviewed 16 of 16 files at r1, all commit messages.
Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @bhandras)
LGTM besides nits; also needs release notes.
	return false, err
}

return string(resp.Kvs[0].Value) == e.id, nil
should this be length checked?
Good idea, done.
should it also be done for Leader() above?
Added to Leader() as well.
I don't think it is needed in either case, as we can assume that there is a leader session and so the key exists in the DB. But it's good to have this extra check just in case, to avoid crashing on some unexpected failure.
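The agreed-upon guard might look like the following self-contained sketch; `kv` and `isLeader` are hypothetical stand-ins for the elector's handling of the etcd range response (`resp.Kvs`), not the actual lnd code:

```go
package main

import "fmt"

// kv is a hypothetical stand-in for etcd's mvccpb.KeyValue; in the real
// elector the slice comes from a clientv3 range response (resp.Kvs).
type kv struct {
	Key, Value []byte
}

// isLeader reports whether the election key's value matches our own id.
// Checking len(kvs) first avoids an index-out-of-range panic if the
// leader key is unexpectedly absent, e.g. after some unwanted failure.
func isLeader(kvs []kv, id string) (bool, error) {
	if len(kvs) == 0 {
		return false, fmt.Errorf("leader election key not found")
	}

	return string(kvs[0].Value) == id, nil
}

func main() {
	ok, err := isLeader([]kv{{Value: []byte("node-1")}}, "node-1")
	fmt.Println(ok, err) // true <nil>

	_, err = isLeader(nil, "node-1")
	fmt.Println(err) // leader election key not found
}
```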
go func() {
	defer p.wg.Done()

	// Ignore the copy error due to the connection being closed.
	_, _ = io.Copy(targetConn, conn)
Very clever, I didn't know you could do this. It could be useful for simulating other things like network failure without calling the disconnect func.
Yeah agreed, it could be useful for other tests too in the future.
sample-lnd.conf (Outdated)

; check.
; healthcheck.leader.attempts=1

; The amount of time we should backoff between failed attempts of leader checks.
description wrong?
Fixed.
Force-pushed from b98958a to 182fad6
Increased the TTL to 90 seconds, as our health checks currently run at most once per minute.
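Since the lease TTL must comfortably exceed the check interval, the relevant sample-lnd.conf section might look like the sketch below. Only `healthcheck.leader.attempts` and the backoff description appear verbatim in this review; the other option names and default values are assumptions that mirror the existing `healthcheck.*` option families and should be verified against the tagged sample-lnd.conf:

```
[healthcheck]
; How often we check our leader status (assumed name and default).
; healthcheck.leader.interval=1m
; Timeout for a single leader status check (assumed name and default).
; healthcheck.leader.timeout=30s
; The amount of time we should backoff between failed attempts of leader
; checks (assumed name and default).
; healthcheck.leader.backoff=30s
; The number of attempts per leader check.
; healthcheck.leader.attempts=1
```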
Force-pushed from 36f9c18 to 4a579b8
Previously our RPC calls to etcd could hang during a network partition, even with properly set dial timeouts. To ensure liveness we need calls to fail promptly in case of system failure, so this change adds a default timeout of 30 seconds to each etcd RPC call.
This is to ensure that the added functionality works correctly and should be removed once these changes are merged and the packages are tagged.
This commit extends our healthcheck with an optional leader check. This is to ensure that, given a network partition or other cluster-wide failure, we act as soon as possible to avoid a split-brain situation where a new leader is elected while we still hold onto our etcd client.
Force-pushed from 4a579b8 to 037161e
Change Description
LND currently holds the leader lease until shutdown, at which point it will resign. In some scenarios, it may be desirable for LND to relinquish leadership and shut down if it becomes partitioned from the etcd cluster. This PR aims to implement this behavior by adding a leader status check to the existing health checks, which will verify the leader status every minute. To prevent hanging due to network issues, we also introduce reasonable timeouts for etcd calls. This allows for a clean shutdown upon a request from the health check module.
Fixes: #8913
Steps to Test
make itest backend=bitcoind dbbackend=etcd icase=leader_health_check