Discussion: Questions on exponentially-increasing IBFT randomTimeout()
Description
While investigating a chain halt that occurred on our testnet earlier this week, we found that restarting all validators fixed the chain: more messages were sent in a shorter timespan, allowing nodes to transition out of each round faster (since the required messageQueue length thresholds were met sooner). This sparked some internal discussion, which led to a few questions about randomTimeout():
Why use exponentially-increasing timeouts in the first place? We realize the original geth fork the SDK is based on uses this calculation as well, but we couldn't find any indication of why a fixed or linearly-increasing timeout wasn't used instead (see https://github.com/getamis/go-ethereum/blob/c7547381b2ea8999e423970d619835c662176790/consensus/istanbul/core/core.go#L316-L329). Was this to prevent multiple nodes from changing state at the same time?
Could introducing a new flag to cap the timeout at a specified maximum value alleviate the problem? If the maximum randomTimeout() were 10 minutes, for example, a chain would be able to recover from a halt within hours instead of what could be days (or more). Shouldn't the goal be to recover the chain as quickly as possible if a halt were to occur?
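For illustration only, here is a minimal Go sketch of the pattern we are describing, assuming a 10-second base timeout and a hypothetical 10-minute cap (the constants and function names below are ours, not the SDK's actual configuration). It shows how a per-round timeout that grows as roughly base + 2^round seconds reaches hours and then days within a couple dozen failed rounds, and how a capped variant would bound the per-round wait:

package main

import (
	"fmt"
	"math"
	"time"
)

// Illustrative values only; not actual SDK config fields.
const (
	baseTimeout = 10 * time.Second
	maxTimeout  = 10 * time.Minute // the proposed cap
)

// exponentialTimeout mirrors the exponential back-off pattern used for IBFT
// round changes: the wait roughly doubles with every failed round.
func exponentialTimeout(round uint64) time.Duration {
	timeout := baseTimeout
	if round > 0 {
		timeout += time.Duration(math.Pow(2, float64(round))) * time.Second
	}
	return timeout
}

// cappedTimeout applies the proposed maximum so that a long-lived halt
// never waits more than maxTimeout before the next round change.
func cappedTimeout(round uint64) time.Duration {
	t := exponentialTimeout(round)
	if t > maxTimeout {
		return maxTimeout
	}
	return t
}

func main() {
	for _, r := range []uint64{0, 5, 10, 15, 20} {
		fmt.Printf("round %2d: exponential=%v capped=%v\n",
			r, exponentialTimeout(r), cappedTimeout(r))
	}
}

By round 20 the uncapped timeout is over 12 days, which is why a long halt takes so long to recover from without restarting validators to reset the round counter; the capped variant would keep every round-change attempt within a bounded, operator-chosen window.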
Leaving this open as a discussion. Thanks in advance, and as always, we appreciate the hard work.
Your environment
OS and version: Ubuntu 20
Version of the Polygon SDK:
Branch that causes this issue: develop