Joint Consensus #101
I don't think disabling removal of the leader can solve the problem at all, because the leader can go down at any time. In that case, the new peers may be unable to elect a leader if none of them belongs to the old configuration.
@BusyJay I thought about this more last night too, and I agree it doesn't solve the problem.
Signed-off-by: qupeng <[email protected]>
PTAL again pre-merge. Please pay particular attention to documentation.
LGTM
This PR enables Joint Consensus as described by the Raft paper (https://raft.github.io/raft.pdf).
As of this commit, it is possible to perform arbitrary peer membership changes safely.
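The core rule of joint consensus, as described in the Raft paper, is that while the cluster is in the joint configuration, an entry counts as committed only once it is replicated on a majority of the old configuration and on a majority of the new one. The sketch below expresses that rule as a commit-index calculation. `quorum_index` and `joint_committed_index` are hypothetical helpers, not raft-rs API, and returning `u64::MAX` for an empty configuration is an assumed convention so that a non-joint configuration reduces to a single majority.

```rust
use std::collections::{HashMap, HashSet};

/// Largest log index known to be replicated on a majority of `voters`,
/// given each peer's highest matching index (missing peers count as 0).
fn quorum_index(voters: &HashSet<u64>, matched: &HashMap<u64, u64>) -> u64 {
    if voters.is_empty() {
        // Assumed convention: an empty configuration places no constraint.
        return u64::MAX;
    }
    let mut idx: Vec<u64> = voters
        .iter()
        .map(|id| matched.get(id).copied().unwrap_or(0))
        .collect();
    // Sort descending: the entry at position `majority - 1` is held by
    // at least `majority` voters.
    idx.sort_unstable_by(|a, b| b.cmp(a));
    let majority = voters.len() / 2 + 1;
    idx[majority - 1]
}

/// In the joint configuration, commitment needs a majority of BOTH configs.
fn joint_committed_index(
    old: &HashSet<u64>,
    new: &HashSet<u64>,
    matched: &HashMap<u64, u64>,
) -> u64 {
    quorum_index(old, matched).min(quorum_index(new, matched))
}

fn main() {
    let old = HashSet::from([1, 2, 3]);
    let new = HashSet::from([3, 4, 5]);
    // Highest log index known to be replicated on each peer.
    let matched = HashMap::from([(1, 10), (2, 9), (3, 8), (4, 5), (5, 4)]);
    println!("old quorum: {}", quorum_index(&old, &matched)); // 9
    println!("new quorum: {}", quorum_index(&new, &matched)); // 5
    println!("joint commit: {}", joint_committed_index(&old, &new, &matched)); // 5
}
```

Taking the minimum means the lagging configuration holds back the commit index, which is exactly what prevents either configuration from making decisions on its own during the transition.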
Notably, this gives us a resilient "Replace Node" operation, which can still make progress even if both the old (removed) peer and the new (added) peer go down. This is not possible with our previous one-at-a-time strategy.
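To see why, consider replacing peer 3 with peer 4 in the cluster {1, 2, 3} while both 3 and 4 are down. The sketch below only counts votes. It assumes the one-at-a-time strategy adds the new peer before removing the old one, and `has_majority` and `joint_has_quorum` are hypothetical helpers, not raft-rs API.

```rust
use std::collections::HashSet;

/// True if more than half of `config` is currently reachable.
fn has_majority(config: &HashSet<u64>, alive: &HashSet<u64>) -> bool {
    let up = config.iter().filter(|&id| alive.contains(id)).count();
    up > config.len() / 2
}

/// Joint consensus needs a majority of the old AND the new configuration.
fn joint_has_quorum(old: &HashSet<u64>, new: &HashSet<u64>, alive: &HashSet<u64>) -> bool {
    has_majority(old, alive) && has_majority(new, alive)
}

fn main() {
    // Replace peer 3 with peer 4.
    let old = HashSet::from([1, 2, 3]);
    let new = HashSet::from([1, 2, 4]);
    // Both the removed peer (3) and the added peer (4) are down.
    let alive = HashSet::from([1, 2]);

    // {1, 2} is a majority of both {1, 2, 3} and {1, 2, 4}.
    println!("joint: {}", joint_has_quorum(&old, &new, &alive)); // true

    // Adding 4 first yields {1, 2, 3, 4}, which needs 3 of 4 peers.
    let intermediate = HashSet::from([1, 2, 3, 4]);
    println!("one-at-a-time: {}", has_majority(&intermediate, &alive)); // false
}
```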
Unfortunately, this feature is fairly large in scope. Thankfully it's mostly testing code!
There is some new API surface, notably the `Raft::begin_membership_change` function.

There is also some moved API surface. The `RaftLog::applied_to` function has been moved to `Raft::commit_apply`. The old function still works, but it carries a deprecation warning. In the future we should make it a function available only to `Raft::applied_to`.
Several APIs have changed to return errors instead of panicking or silently ignoring non-critical errors. In most cases you can call `old_call().ok()` to get the previous behavior.

Then, to support testing, some additional API was added to the `Network` harness, which can now dispatch messages without automatically sending the responses, leaving time to inspect the state of peers in between.

Finally, many tests were added to ensure Joint Consensus works.
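The idea behind the new `Network` harness API can be sketched with a toy network: one method delivers a batch of messages and hands back the responses instead of sending them, so a test can assert on peer state before releasing them. `ToyNetwork`, `Peer`, `Msg`, `dispatch`, and `send` here are hypothetical and are not the harness's actual types or signatures.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind {
    Request,
    Response,
}

#[derive(Debug, Clone, PartialEq)]
struct Msg {
    from: u64,
    to: u64,
    kind: Kind,
}

#[derive(Default)]
struct Peer {
    /// Number of messages this peer has processed.
    handled: usize,
}

struct ToyNetwork {
    peers: HashMap<u64, Peer>,
}

impl ToyNetwork {
    /// Deliver `msgs` one hop and RETURN the responses instead of sending them.
    fn dispatch(&mut self, msgs: Vec<Msg>) -> Vec<Msg> {
        let mut responses = Vec::new();
        for m in msgs {
            // Messages to unknown peers are dropped.
            if let Some(peer) = self.peers.get_mut(&m.to) {
                peer.handled += 1;
                if m.kind == Kind::Request {
                    responses.push(Msg { from: m.to, to: m.from, kind: Kind::Response });
                }
            }
        }
        responses
    }

    /// Deliver `msgs` and keep delivering responses until none remain.
    fn send(&mut self, mut msgs: Vec<Msg>) {
        while !msgs.is_empty() {
            msgs = self.dispatch(msgs);
        }
    }
}

fn main() {
    let mut net = ToyNetwork { peers: (1..=3).map(|id| (id, Peer::default())).collect() };
    let reqs = vec![
        Msg { from: 1, to: 2, kind: Kind::Request },
        Msg { from: 1, to: 3, kind: Kind::Request },
    ];
    // Deliver only the requests; peer 1 has not seen any response yet.
    let pending = net.dispatch(reqs);
    assert_eq!(pending.len(), 2);
    assert_eq!(net.peers[&1].handled, 0);
    // A test can inspect intermediate peer state here, then release the responses.
    net.send(pending);
    assert_eq!(net.peers[&1].handled, 2);
}
```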