Only update max size of undo_db and fork_db after successfully pushed a block #1333
Related: steemit/steem#2911 (different scenario).
Steem PR: steemit/steem#2914 (also contains other changes, which may or may not apply to BitShares).
Moving the checkpointing code to push_block means checkpoints are ignored during most of replay. I don't think we should do this.
Just FYI, my patch for Steem didn't move the checkpointing code; instead, I added one in
Please take a look over these ramblings and see if I am on the right track:
So, a test of this scenario would require … Am I close?
The critical part is that
I think this is close to impossible in practice. Fork B can only become valid if you replace 1/3 of the witnesses on that fork only, or when 1/3 of the witnesses are double-signing.
Thank you for clearing that up. I was thinking about subsequent blocks coming in, but was getting lost: how can one fork have a LIB and another not? That would mean the LIB is reversible, which should be impossible. Now I understand. A block comes in that moves the LIB forward on one fork, we tentatively accepted it, we adjusted the size of undo_db, and then had a problem with that same block. Now we don't have what we need to switch forks. I will attempt to make a test for that scenario. I'll put a lower priority on this, due to the low chance of this occurring.
The calculation for LIB comes after the calculation for the _undo_db and _fork_db size. I believe that means that if the current block moves the LIB forward, blocks will not be prematurely removed from _undo_db until the next block. So my latest hypothesis is that this problem cannot be caused by a block moving the LIB forward. If that hypothesis is correct, I ask the following 2 questions:
I believe the answer to question 2 is no, as it has been working that way.
The actual shrinking (i.e. deletion of entries) in undo_db happens when a new undo_session is created. After applying a block, this will typically happen when pending transactions are replayed. Suppose we're at block 10 and dgp.lib is 3. We receive block 11. This happens:
This looks safe, unless I've missed something. When blocks are popped, they stay in fork_db, while their corresponding undo_sessions are rolled back and removed from the undo stack. Note that popping blocks will also undo updates to … We only pop blocks when switching to a longer fork, so after popping some and then applying some more … I can see one edge case though:
E.g. if in the above example we receive block 11' instead of 11, then roll back 10, 9, 8, and 7, then apply 7', we might have … Thoughts?
A worse case is that LIB advances, e.g. to 6; then 3, 4, 5 will be dropped too early.
This is the scenario described in steemit/steem#2911
If LIB advances to 6 then 3, 4, 5 can be dropped. The scenario when switching forks can be fixed easily as in steemit/steem@653f0b5 .
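The idea behind that fix can be sketched as follows: compute the new max size while applying the block, but only commit it to undo_db/fork_db after the block has been applied successfully. `toy_db` and `push_block` are toy names for illustration, not the actual core code.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Sketch of the proposed ordering: if the block throws midway, the old
// (larger) max size is kept, so the older reversible blocks survive and the
// chain can still switch forks.
struct toy_db {
    size_t db_max_size = 10;    // current max size of undo_db / fork_db
    uint32_t head = 10, lib = 3;

    void apply_block(uint32_t block_num, bool fails, uint32_t new_lib) {
        // ... state changes happen here, inside an undo session ...
        if (fails)
            throw std::runtime_error("bad transaction");  // state is rolled back,
                                                          // and no trimming occurred
        head = block_num;
        lib  = new_lib;
    }

    void push_block(uint32_t block_num, bool fails, uint32_t new_lib) {
        apply_block(block_num, fails, new_lib);      // may throw
        db_max_size = head - lib + 1;                // resize only on success
    }
};
```

A failed block leaves the max size untouched; only a successfully pushed block shrinks it.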
... only when "the block is applied successfully".
This is needed as well: steemit/steem@c54fd43#diff-778ae4a84a14457ed12b22337048d2bfR3259
Oh, right.
Something like this, yes. Their codebase looks quite different at that point.
@jmjatlanta is investigating a test case to identify this bug.
Branch jmj_1333 presents a test case to attempt to cause the issue, but with no success yet. There are currently gaps in my knowledge in (at least) 3 areas. The notes below probably will only make sense to me, but I do not want to forget them:
Rogue witnesses are a separate problem; you can ignore them here. That LIB states a block number is fine, because by definition there cannot be competing LIBs. (With rogue witnesses, two competing forks, each with their own LIB, can emerge. This is a different problem; see BFT.) See my comment #1333 (comment) above, and @abitmore's remark that "Worst case is LIB advances to 6, then the block fails and is rolled back". Then the next block arrives and both fork_db and undo_db are pruned too far.
Hey all, trying to wrap my head around this issue too... Setting aside for a moment discussion of the max undo size, a scenario which jumps the LIB forward the maximum number should trigger the bug, right? So does this toy scenario trigger it?: 7 witnesses: ABCDEFGH
... except block 12 has a bad transaction, so it fails midway through and gets rewound, and then B shows up with a different block 5 and production continues from that chain... should that trigger the bug?
I suppose in my scenario above, H could sign 12, which would leave LIB at 4, and then A signs 13, which would cause LIB to jump to 9:
So that would be the maximum jump size of 5 blocks. Then 13 fails and rewinds, B continues from 5... |
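The LIB jump in this scenario can be approximated numerically. This sketch is modeled on Graphene's `update_last_irreversible_block()`, which picks the last-confirmed block number at a ~30% offset when witnesses are ordered by their last confirmed block; the exact threshold and offset rounding are assumptions here, not verified against the core code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Toy LIB calculation: with a 70% irreversibility threshold, the LIB is the
// (30% * witness_count)-th smallest last-confirmed block number. Simplified
// sketch, not the actual core implementation.
uint32_t calc_lib(std::vector<uint32_t> last_confirmed) {
    const size_t offset = last_confirmed.size() * 30 / 100;  // (100% - 70%)
    std::nth_element(last_confirmed.begin(), last_confirmed.begin() + offset,
                     last_confirmed.end());
    return last_confirmed[offset];
}
```

With 8 witnesses this shows a single confirmation jumping the LIB from 4 to 9, as in the scenario above: three laggards sit at block 4, and when one of them confirms block 13, the third-smallest confirmation becomes 9.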
Yes, I think that's accurate. Except that A-H are 8 witnesses, not 7 :-).
This issue is about defensive approaches for when something wrong HAS happened. To be clear, anything which can cause an uncaught exception after
@jmjatlanta to write a test case for this issue, we need to do something abnormal in the test case, specifically, need to trigger an exception after |
As @jmjatlanta noticed, we're actually resizing undo_db and fork_db based on the last block's LIB. That is, the resize is deferred to the next block. Thus there would be no issue if the new size is big enough, as long as the new size is based on the state of the last block. bitshares-core/libraries/chain/db_update.cpp, lines 80 to 81 in 5bad558
+2 , or |
Not a bug. We're safe. |
We still have issues here similar to steemit/steem#2911. This would be easier to reproduce. Perhaps better to create a new issue for it?
Wow I'm an idiot 🤦♂️ haha
Closing this since @jmjatlanta has created #1679 to follow up on the Steem fork_db issue (steemit/steem#2911).
Bug Description
There is a corner case that would cause chain reorganization to fail.
`update_global_dynamic_data()` is called in the middle of `_apply_block()`:
bitshares-core/libraries/chain/db_block.cpp, line 575 in 0508438
The max size of both fork_db and undo_db is updated inside `update_global_dynamic_data()`:
bitshares-core/libraries/chain/db_update.cpp, lines 80 to 81 in 0508438
If the max size is changed to a smaller value or kept unchanged, the original smallest reversible block will be removed from undo_db and fork_db. If an exception is thrown after `update_global_dynamic_data()` in `_apply_block()`, the changes made to the object database will be reverted, but the removed block won't be put back into fork_db or undo_db. In that case, the chain would be unable to reorganize to another fork if the removed block is not on that fork.
By the way, there is another scenario described in steemit/steem#2911.
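This corner case can be modeled in a few lines. The structures and names below (`toy_chain`, `reversible`, `can_reorganize_from`) are hypothetical, not the real undo_db/fork_db API; the point is only the ordering of resize, trim, exception, and rollback.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <stdexcept>

// Toy reproduction of the bug: the max size is applied (and old entries
// trimmed) in the middle of applying a block; if the block then throws, the
// object-database changes are undone, but the trimmed reversible blocks are
// gone for good, so a later fork switch below the old LIB must fail.
struct toy_chain {
    std::deque<uint32_t> reversible;  // blocks still held in undo_db/fork_db
    uint32_t lib = 3;

    void apply_block(uint32_t num, uint32_t new_lib, bool fails) {
        const uint32_t old_lib = lib;
        reversible.push_back(num);
        lib = new_lib;                              // update_global_dynamic_data()
        const size_t max_size = num - lib + 1;
        while (reversible.size() > max_size)
            reversible.pop_front();                 // blocks below new LIB trimmed
        if (fails) {
            lib = old_lib;                          // object database is reverted...
            reversible.pop_back();
            throw std::runtime_error("exception after update_global_dynamic_data");
        }                                           // ...but trimmed blocks stay gone
    }

    bool can_reorganize_from(uint32_t num) const {
        return !reversible.empty() && reversible.front() <= num;
    }
};
```

After a failed block that had advanced the LIB from 3 to 6, the state says the LIB is still 3, yet blocks 3-5 can no longer be popped.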
Expected Behavior
The chain should always be able to reorganize from the LIB.