Problem (Fix #1835): non-determinism in reward distribution #1859
What was the cause of the non-determinism?
The BeginBlock event for block n might be processed more than once before Commit.
Solution: add a block height assertion in BeginBlock, to guard against the consensus connection re-connecting and the same block being processed again.
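The assertion described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `ChainApp`, `AppState`, `in_progress_height`, `begin_block` and `commit` are hypothetical names, and heights are plain `i64` values rather than real ABCI request types. A second BeginBlock for the same height before Commit trips the first assertion; a skipped or stale height trips the second.

```rust
/// Hypothetical, simplified view of the app's committed state
/// (the real chain-abci types differ).
#[derive(Debug, Default)]
pub struct AppState {
    /// Height of the last block that went through Commit (0 = genesis).
    pub last_committed_height: i64,
}

#[derive(Debug, Default)]
pub struct ChainApp {
    pub state: AppState,
    /// Height of the block currently between BeginBlock and Commit, if any.
    pub in_progress_height: Option<i64>,
}

impl ChainApp {
    /// Hypothetical BeginBlock handler: no other block may be in flight, and the
    /// incoming height must be exactly one above the last committed height.
    pub fn begin_block(&mut self, height: i64) {
        // Catches BeginBlock for block n arriving again before Commit,
        // e.g. after the consensus connection re-connects.
        assert!(
            self.in_progress_height.is_none(),
            "BeginBlock for height {} received before Commit of height {:?}",
            height,
            self.in_progress_height
        );
        // Catches skipped or stale heights.
        let expected = self.state.last_committed_height + 1;
        assert_eq!(
            height, expected,
            "unexpected BeginBlock height (last committed: {})",
            self.state.last_committed_height
        );
        self.in_progress_height = Some(height);
        // ... reward distribution for this height would run here ...
    }

    /// Hypothetical Commit handler: finalizes the in-flight block.
    pub fn commit(&mut self) {
        let height = self
            .in_progress_height
            .take()
            .expect("Commit without BeginBlock");
        self.state.last_committed_height = height;
    }
}
```

Panicking here stops the app before rewards for the same height can be applied twice; the process can then be restarted from the last committed state.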
Codecov Report

    @@            Coverage Diff             @@
    ##           master    #1859      +/-   ##
    ==========================================
    + Coverage   66.15%   66.16%   +0.01%
    ==========================================
      Files         205      205
      Lines       26088    26090       +2
    ==========================================
    + Hits        17258    17263       +5
    + Misses       8830     8827       -3
Why did Tendermint's consensus connection get closed? Was there an error on the Tendermint side that caused it to terminate the connection?
There are logs in the Slack discussions.
Can we figure out from the logs why Tendermint's consensus connection was dropped?
The ABCI app logged an empty BeginBlock response for some reason:
A "funky" deployment, where processes are restarted individually.
I think this sort of assertion should be in rust-abci instead of in each application.
Doesn't upgrading to abci = "0.7.1" work?
I think the current abci doesn't assert the internals of messages, so adding an assertion on block height could solve the issue more thoroughly. Maybe abci can add more assertions on state machine transitions or message internals in the future.
There are also a few ignored tests (https://github.com/crypto-com/chain/blob/master/chain-abci/tests/abci_app.rs#L633) that the current abci doesn't assert. A newer version no longer throws away errors as before, so it would be good to check whether upgrading fixes the connection drop, or at least makes the app fail on the connection drop (so that one restarts from the committed state).
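As a rough illustration of doing this check once in the ABCI layer rather than in each application, the sketch below tracks the expected BeginBlock, DeliverTx*, EndBlock, Commit ordering on the consensus connection. It does not reflect rust-abci's actual API: `ConsensusRequest`, `ConsensusGuard` and `check` are hypothetical, and the request types are stripped down to the fields the check needs.

```rust
/// Hypothetical, stripped-down consensus-connection requests.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ConsensusRequest {
    BeginBlock { height: i64 },
    DeliverTx,
    EndBlock { height: i64 },
    Commit,
}

/// Where the connection is in the block lifecycle.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Phase {
    /// Waiting for the next BeginBlock.
    Idle,
    /// Between BeginBlock and EndBlock; DeliverTx is allowed.
    InBlock,
    /// After EndBlock, waiting for Commit.
    Ended,
}

#[derive(Debug)]
pub struct ConsensusGuard {
    phase: Phase,
    last_committed: i64,
    current: i64,
}

impl ConsensusGuard {
    pub fn new(last_committed: i64) -> Self {
        ConsensusGuard { phase: Phase::Idle, last_committed, current: last_committed }
    }

    /// Returns Err if `req` violates the expected ordering or height; the
    /// state is left unchanged in that case.
    pub fn check(&mut self, req: ConsensusRequest) -> Result<(), String> {
        match (self.phase, req) {
            (Phase::Idle, ConsensusRequest::BeginBlock { height })
                if height == self.last_committed + 1 =>
            {
                self.current = height;
                self.phase = Phase::InBlock;
                Ok(())
            }
            (Phase::InBlock, ConsensusRequest::DeliverTx) => Ok(()),
            (Phase::InBlock, ConsensusRequest::EndBlock { height }) if height == self.current => {
                self.phase = Phase::Ended;
                Ok(())
            }
            (Phase::Ended, ConsensusRequest::Commit) => {
                self.last_committed = self.current;
                self.phase = Phase::Idle;
                Ok(())
            }
            // A repeated BeginBlock before Commit ends up here.
            (phase, req) => Err(format!("unexpected {:?} in phase {:?}", req, phase)),
        }
    }
}
```

Returning an error instead of panicking leaves the policy (panic, drop the connection, or reset) to the caller.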
@tomtau What should ideally be the behaviour in such scenarios? Should the ABCI app panic, or should it discard the current uncommitted state and start processing anew?
@devashishdxt It's not specified anywhere at the moment.
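The two options raised in the question can be contrasted in a small sketch. It assumes a hypothetical `App` that keeps its last committed state separate from an uncommitted working copy, with a toy balance map standing in for real reward distribution; `OnReplay`, `begin_block`, `commit` and `balance` are illustrative names only, not chain-abci's API.

```rust
use std::collections::BTreeMap;

/// Hypothetical toy state: account -> balance.
type Balances = BTreeMap<String, u64>;

/// What to do when BeginBlock arrives again before Commit.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OnReplay {
    Panic,
    DiscardUncommitted,
}

pub struct App {
    committed: Balances,
    committed_height: i64,
    /// Uncommitted working copy for the block in progress, if any.
    working: Option<(i64, Balances)>,
    policy: OnReplay,
}

impl App {
    pub fn new(policy: OnReplay) -> Self {
        App { committed: Balances::new(), committed_height: 0, working: None, policy }
    }

    /// Hypothetical BeginBlock that credits block rewards to the working copy.
    pub fn begin_block(&mut self, height: i64, rewards: &[(&str, u64)]) {
        assert_eq!(height, self.committed_height + 1, "BeginBlock height must follow the last commit");
        if let Some((in_progress, _)) = &self.working {
            if self.policy == OnReplay::Panic {
                panic!("BeginBlock {} received before Commit of block {}", height, in_progress);
            }
            // DiscardUncommitted: the old working copy is replaced below.
        }
        // Always start from the last committed state, so a replayed
        // BeginBlock cannot credit the same rewards twice.
        let mut state = self.committed.clone();
        for (account, amount) in rewards {
            *state.entry(account.to_string()).or_insert(0) += amount;
        }
        self.working = Some((height, state));
    }

    pub fn commit(&mut self) {
        let (height, state) = self.working.take().expect("Commit without BeginBlock");
        self.committed = state;
        self.committed_height = height;
    }

    pub fn balance(&self, account: &str) -> u64 {
        self.committed.get(account).copied().unwrap_or(0)
    }
}
```

With `DiscardUncommitted`, a replayed BeginBlock rebuilds the working copy from committed state, so the reward is credited once; with `Panic`, the app stops and can be restarted from the committed state.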