[bug]: postgres error: could not serialize access due to read/write dependencies among transactions (SQLSTATE 40001) #8049
Comments
Since #7927 we do retry on certain SQL failures. Perhaps we just need to catch this error (SQLSTATE 40001) there as well.
Reading https://github.com/lightningnetwork/lnd/blob/master/sqldb/sqlerrors.go#L70-L74 with
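For context, here is a minimal, self-contained sketch (not lnd's actual sqlerrors.go code; the helper name is made up for illustration) of how a Postgres serialization failure can be recognized with the jackc/pgx driver so it can be treated as retryable:

```go
package main

import (
	"errors"
	"fmt"

	"github.com/jackc/pgerrcode"
	"github.com/jackc/pgx/v5/pgconn"
)

// isSerializationError reports whether err (possibly wrapped) is a Postgres
// "could not serialize access" failure (SQLSTATE 40001), which is safe to
// retry in a fresh transaction.
func isSerializationError(err error) bool {
	var pgErr *pgconn.PgError
	return errors.As(err, &pgErr) &&
		pgErr.Code == pgerrcode.SerializationFailure // "40001"
}

func main() {
	// Hypothetical wrapped error, the way a query layer might return it.
	err := fmt.Errorf("payment update failed: %w", &pgconn.PgError{Code: "40001"})
	fmt.Println(isSerializationError(err)) // true
}
```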
Okay, I'll try 5 minutes and see how it goes. Usually compaction takes less than that.
Yeah, can't wait to try out #7992! Although a huge key-value table is still a restriction regarding (b)locking techniques, so I'm also looking forward to 0.18 and the first SQL schemas.
So #7992 also fixes issues with the retry logic: before, certain errors weren't properly wrapped, so they weren't detected as serialization errors. Interesting that you're running into it as is, though, since we have an in-process mutex that should limit us to a single writer. EDIT: ah, reading it again, I see you're running a background vacuum; that could trigger the retry logic there.
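To illustrate the wrapping point: as long as each layer wraps the driver error with %w, errors.As can still find the *pgconn.PgError underneath, and a retry loop can re-run the whole transaction. A rough sketch of that general pattern (hypothetical names, not lnd's actual retry code):

```go
package sqlretry // hypothetical package, for illustration only

import (
	"context"
	"database/sql"
	"errors"
	"fmt"

	"github.com/jackc/pgerrcode"
	"github.com/jackc/pgx/v5/pgconn"
)

// isSerializationError sees through %w-wrapped layers via errors.As.
func isSerializationError(err error) bool {
	var pgErr *pgconn.PgError
	return errors.As(err, &pgErr) && pgErr.Code == pgerrcode.SerializationFailure
}

// withSerializableTx runs fn in a serializable transaction and retries it
// whenever the previous attempt failed with SQLSTATE 40001.
func withSerializableTx(ctx context.Context, db *sql.DB, maxRetries int,
	fn func(*sql.Tx) error) error {

	for attempt := 0; ; attempt++ {
		tx, err := db.BeginTx(ctx, &sql.TxOptions{Isolation: sql.LevelSerializable})
		if err != nil {
			return fmt.Errorf("begin tx: %w", err) // %w keeps the PgError detectable
		}

		if err = fn(tx); err == nil {
			err = tx.Commit()
		}
		if err == nil {
			return nil
		}
		_ = tx.Rollback()

		// Only serialization failures are safe to retry; anything else,
		// or an exhausted retry budget, is returned to the caller.
		if !isSerializationError(err) || attempt >= maxRetries {
			return err
		}
	}
}
```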
Okay, this explains why vacuuming the database with the same database user that lnd is set up with throws the error. I can reproduce the error by background vacuuming regardless of the timeout setting. Then I'm not sure what caused the serialization error that led to the force close. Could it be heavy usage due to parallelization of rebalancing and forwarding? I'll keep an eye on it.
Today I compacted the database manually with the user "postgres" instead. This correctly pauses all db actions for lnd and resumes them after compaction has finished. No errors in lnd's log. I'm still not sure what caused the force close, but concurrent database access with the same lnd user is the closest explanation I can think of.
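For anyone reproducing this, here is a rough sketch of what such an out-of-band compaction amounts to (hypothetical DSN and database name; adjust to your setup). VACUUM FULL takes an ACCESS EXCLUSIVE lock, which is why lnd's queries simply pause until it finishes when it is run from a separate superuser session:

```go
package main

import (
	"context"
	"log"

	"github.com/jackc/pgx/v5"
)

func main() {
	ctx := context.Background()

	// Connect as the "postgres" superuser rather than lnd's own database
	// user; host and database name below are placeholders.
	conn, err := pgx.Connect(ctx, "postgres://postgres@localhost:5432/lnd")
	if err != nil {
		log.Fatalf("connect: %v", err)
	}
	defer conn.Close(ctx)

	// VACUUM FULL rewrites tables under an ACCESS EXCLUSIVE lock, blocking
	// lnd's reads and writes until compaction completes.
	if _, err := conn.Exec(ctx, "VACUUM FULL"); err != nil {
		log.Fatalf("vacuum full: %v", err)
	}
	log.Println("compaction finished")
}
```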
Tried another compaction run, which caused LND to shut down again.
Compacting duration:
Compaction Starting Time: 2023-10-08 16:41:50
This issue will be addressed with #7992.
Background
Been observing an issue with the PostgreSQL DB since version 0.17 regarding serialization of reads/writes (probably because lnd now exposes SQL errors in the logs?).
Before 0.17 I used to compact on the fly while running lnd in the background. With 0.17 I ran across this serialization error when compacting (VACUUM FULL). Today this error also led to a force close of a channel (no compaction happened at that time):
followed by multiple
resulting in
Your environment
- lnd: 0.17-rc6
- uname -a on *Nix: Linux 5.15.0-84-generic 93-Ubuntu SMP Tue Sep 5 17:16:10 UTC 2023 x86_64
- btcd, bitcoind, or other backend: bitcoind v25
- lnd.conf:
Steps to reproduce
Reproduction method unknown.
Expected behaviour
No SQL error.
Actual behaviour
Serialization error in database.
Logs:
FC Log: fc.log
SQLSTATE 40001 while compacting: compacting.log
tl;dr:
I think this is the equivalent of #7869 for postgres. I'll try to set db.postgres.timeout to 10m, although I'm not sure why lnd is not retrying transactions as specified in #7960 (running the latest 0.17-rc6).
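For reference, setting that timeout would look roughly like this in lnd.conf; this is a sketch only, and the option names and placeholder DSN should be checked against the sample Postgres config for the lnd version in use:

```
[db]
db.backend=postgres

[postgres]
db.postgres.dsn=postgresql://lnduser:lndpass@localhost:5432/lnd
db.postgres.timeout=10m
```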