Cluster rejoin behavior #918

Closed
annymsMthd opened this issue Apr 28, 2015 · 11 comments

Comments

@annymsMthd
Contributor

I have a node trying to rejoin the cluster. It was previously part of the cluster before being reset. Now when the node attempts to join:

Cluster Node [akka.tcp://[email protected]:8100] - Starting up...

It waits at this message. On the leader node:

Existing member [UniqueAddress: (akka.tcp://[email protected]:8100, 900595072)] is trying to join, ignoring

This message repeats over and over and the node is never added to the cluster.

@annymsMthd
Contributor Author

@Aaronontheweb @smalldave

@annymsMthd
Contributor Author

Existing member [UniqueAddress: (akka.tcp://[email protected]:8100, 1641121334)] is trying to join, ignoring

That is what I get when I shut the node down and try again. It looks like the uid changed, but the node is still being ignored.

@Aaronontheweb
Member

Possibly related to #774

@rogeralsing
Contributor

 // check by address without uid to make sure that node with same host:port is not allowed
 // to join until previous node with that host:port has been removed from the cluster

// ^ WAT? checking w/o UID

 var alreadyMember = localMembers.Any(m => m.Address == node.Address);
 var isUnreachable = !_latestGossip.Overview.Reachability.IsReachable(node);

 if (alreadyMember) _log.Info("Existing member [{0}] is trying to join, ignoring", node);

@rogeralsing
Contributor

Hmm, Scala does this too:

      // check by address without uid to make sure that node with same host:port is not allowed
      // to join until previous node with that host:port has been removed from the cluster
      val alreadyMember = localMembers.exists(_.address == node.address)
      val isUnreachable = !latestGossip.overview.reachability.isReachable(node)

      if (alreadyMember)

@rogeralsing
Contributor

// to join until previous node with that host:port has been removed from the cluster

I assume the comment above is the important one: the previous node needs to be removed from the cluster before a node with the same host:port can join.
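
To make the check concrete, here is a minimal Scala sketch of why a restarted node keeps being ignored. The types `Address`, `UniqueAddress`, and `Member` are simplified stand-ins, and the object `JoinCheckSketch` and the host name `host-a` are made up for illustration; none of this is the actual Akka or Akka.NET code. Only the `exists(_.address == node.address)` comparison and the two uids come from this thread.

```scala
// Hypothetical, simplified stand-ins for the cluster types discussed in this
// thread; these are NOT the real Akka or Akka.NET classes.
final case class Address(host: String, port: Int)
final case class UniqueAddress(address: Address, uid: Int)
final case class Member(uniqueAddress: UniqueAddress) {
  def address: Address = uniqueAddress.address
}

object JoinCheckSketch {
  // Mirrors the quoted check: compare by address only, ignoring the uid.
  def alreadyMember(localMembers: Set[Member], node: UniqueAddress): Boolean =
    localMembers.exists(_.address == node.address)

  // A uid-aware comparison, shown only for contrast with the check above.
  def sameIncarnation(localMembers: Set[Member], node: UniqueAddress): Boolean =
    localMembers.exists(_.uniqueAddress == node)

  def main(args: Array[String]): Unit = {
    val addr = Address("host-a", 8100) // host name is an assumption
    val oldIncarnation = Member(UniqueAddress(addr, 900595072))
    val restarted = UniqueAddress(addr, 1641121334)

    // The stale entry is still in the member set, so the address-only
    // check matches even though the uid is different.
    println(alreadyMember(Set(oldIncarnation), restarted))   // true
    println(sameIncarnation(Set(oldIncarnation), restarted)) // false

    // Once the stale entry has been removed, the same join would pass.
    println(alreadyMember(Set.empty[Member], restarted))     // false
  }
}
```

Under these assumptions the restarted node matches by address even though its uid differs, so the join is ignored for as long as the old entry stays in the member set.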

@annymsMthd
Contributor Author

Exactly. It looks like the node isn't being removed properly.

@rogeralsing
Contributor

The only place stuff is removed from localMembers is in Downing(Address address).
Either that code is b0rked, or the Member equality ops are.

I'm not sure how the list of members gets updated, but we should check whether the downed member is really removed from the newMembers set.
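
As an illustration of the second hypothesis only, here is a small Scala sketch of how removal from a set can silently fail when element equality is by reference. The thread does not establish that this was the actual cause. `RefMember`, `ValueMember`, `DowningSketch`, `downByAddress`, and the host name `host-a` are all hypothetical names, not the real cluster code.

```scala
// Hypothetical types for illustration; not the real Akka or Akka.NET classes.
final case class Address(host: String, port: Int)

// Plain class: no equals/hashCode override, so equality is by reference.
final class RefMember(val address: Address, val uid: Int)

// Case class: the compiler generates structural equals/hashCode.
final case class ValueMember(address: Address, uid: Int)

object DowningSketch {
  // Removing by predicate does not rely on the element's equals/hashCode.
  def downByAddress(members: Set[ValueMember], address: Address): Set[ValueMember] =
    members.filterNot(_.address == address)

  def main(args: Array[String]): Unit = {
    val addr = Address("host-a", 8100) // host name is an assumption

    // A separately constructed instance with identical fields is not "equal"
    // to the stored one, so set subtraction leaves the original in place.
    val refAfter = Set(new RefMember(addr, 1)) - new RefMember(addr, 1)
    println(refAfter.size) // 1

    // With structural equality the same subtraction removes the element.
    val valueAfter = Set(ValueMember(addr, 1)) - ValueMember(addr, 1)
    println(valueAfter.size) // 0

    println(downByAddress(Set(ValueMember(addr, 1)), addr).isEmpty) // true
  }
}
```

If the member set were built on equality that does not match how the downed member is looked up, the stale entry would survive downing and keep blocking the rejoin, which is the kind of thing the check suggested above would catch.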

@annymsMthd
Contributor Author

Say we restart a service and the node is only down for a few seconds. That is where I sometimes see this.

@annymsMthd
Contributor Author

I'm on this one. We are doing a cluster bug hunt atm

@annymsMthd
Contributor Author

Running the nightly build in our cluster. This is fixed. The node will now be downed and allowed to rejoin.


3 participants