Cluster rejoin behavior #918
Comments
Existing member [UniqueAddress: (akka.tcp://[email protected]:8100, 1641121334)] is trying to join, ignoring
I get this when I shut the node down and try again. It looks like the UID changed, but the node is still being ignored. |
Possibly related to #774 |
// check by address without uid to make sure that node with same host:port is not allowed
// to join until previous node with that host:port has been removed from the cluster
// ^ WAT? checking w/o UID
var alreadyMember = localMembers.Any(m => m.Address == node.Address);
var isUnreachable = !_latestGossip.Overview.Reachability.IsReachable(node);
if (alreadyMember) _log.Info("Existing member [{0}] is trying to join, ignoring", node); |
Hmm, Scala does this too:
// check by address without uid to make sure that node with same host:port is not allowed
// to join until previous node with that host:port has been removed from the cluster
val alreadyMember = localMembers.exists(_.address == node.address)
val isUnreachable = !latestGossip.overview.reachability.isReachable(node)
if (alreadyMember) |
I assume the comment above is the important one. |
Exactly. It looks like the node isn't being removed properly. |
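To make the behavior discussed above concrete, here is a minimal sketch of why a restarted node keeps getting ignored. It is not the Akka.NET implementation: `SketchAddress`, `SketchUniqueAddress`, and `JoinSketch` are hypothetical stand-ins invented for illustration, and it assumes C# 9 or later for records. It only mirrors the quoted check, which compares members by address and ignores the UID.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for illustration only; these are not the Akka.NET types.
public sealed record SketchAddress(string Host, int Port);
public sealed record SketchUniqueAddress(SketchAddress Address, int Uid);

public static class JoinSketch
{
    // Mirrors the quoted check: membership is decided by address alone,
    // so a different UID on the same host:port still counts as "already a member".
    public static bool IsIgnored(
        IEnumerable<SketchUniqueAddress> localMembers,
        SketchUniqueAddress node)
    {
        return localMembers.Any(m => m.Address == node.Address);
    }

    // Simulates the old incarnation being removed from the member list,
    // which is the step that apparently was not happening in this issue.
    public static List<SketchUniqueAddress> RemoveByAddress(
        IEnumerable<SketchUniqueAddress> localMembers,
        SketchAddress address)
    {
        return localMembers.Where(m => m.Address != address).ToList();
    }
}
```

With a member list holding the old incarnation (say host `node-a`, port 8100, UID 1641121334, where the host name is made up), `IsIgnored` returns true for a restarted node on the same host:port with UID 900595072. Only after `RemoveByAddress` drops the old entry does the same join attempt pass the check. That matches the symptom here: if the old member is never removed, the join is ignored indefinitely.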
The only place stuff is removed from
I'm not sure how the list of members gets updated, but we should check whether the downed member is really removed from the newMembers set. |
Say we restart a service and the node is only down for a few seconds. That is where I sometimes see this. |
I'm on this one. We are doing a cluster bug hunt atm |
Running the nightly build in our cluster. This is fixed. The node will now be downed and allowed to rejoin. |
I have a node trying to rejoin the cluster. It was part of the cluster before being reset. Now, when the node attempts to join, it logs:
Cluster Node [akka.tcp://[email protected]:8100] - Starting up...
It waits at this message. On the leader node:
Existing member [UniqueAddress: (akka.tcp://[email protected]:8100, 900595072)] is trying to join, ignoring
This message repeats over and over and the node is never added to the cluster.