geth not connecting to bootnodes on private test network #1685
We are running our private net on GCE and I have similar issues too. I am running geth 1.0.1. My private net was running fine, but when I just checked, my mining node was at a different block and my other, non-mining node had stopped syncing. It had run for a few days without issue. After killing and restarting the nodes, block synchronization resumed, and both nodes are now synced up.
@ckeenan Please explain what you mean by "setting bootnodes". It is important to note that
@fjl
Having this same problem until I add the peer via
I'm able to work around it with `admin.addPeer('enode://3ede10cd6a5e382db43804f3267c5a6eab6021e245b6eb28d1d9b11720638490e876ce5e61bd76992befa89935c5a34f139df7dda82b00d5ea09da6f96f77839@40.117.36.75:30303')`.
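The enode URL passed to `admin.addPeer` packs the node's public key, IP address and TCP port into one string, so it is worth double-checking each part when a manually added peer does not show up. A minimal Python sketch of splitting one apart, assuming the `enode://<128-hex-char node ID>@<ip>:<port>[?discport=<udp port>]` layout; `parse_enode` is a hypothetical helper, not part of geth.

```python
from urllib.parse import urlsplit, parse_qs

HEX = set("0123456789abcdefABCDEF")

def parse_enode(url: str) -> dict:
    """Split an enode URL into node ID, host, TCP port and UDP discovery port (hypothetical helper)."""
    parts = urlsplit(url)
    if parts.scheme != "enode":
        raise ValueError("not an enode URL")
    node_id = parts.username or ""
    # The node ID is the node's 64-byte public key, hex-encoded: 128 hex characters.
    if len(node_id) != 128 or not set(node_id) <= HEX:
        raise ValueError("node ID must be 128 hex characters")
    if parts.port is None:
        raise ValueError("missing TCP port")
    query = parse_qs(parts.query)
    # Assumption: without ?discport=, discovery (UDP) uses the same port as TCP.
    udp_port = int(query["discport"][0]) if "discport" in query else parts.port
    return {"id": node_id, "host": parts.hostname, "tcp": parts.port, "udp": udp_port}
```

For example, `parse_enode(url)["host"]` on the URL above gives the IP address the console will try to dial.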
Geth
@aakilfernandes The clock on one of the nodes could be out of sync. Try
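Why a skewed clock can break peering: geth's discovery protocol (discv4) stamps each packet with an absolute expiration time, and the receiver drops packets whose expiration is already in the past by its own clock. A toy Python model of that check; the 20-second window is an assumption used for illustration, and the function names are hypothetical.

```python
# Assumed expiry window: the sender stamps packets "valid until now + EXPIRY_WINDOW".
# 20 s is an illustrative assumption, not a value taken from this thread.
EXPIRY_WINDOW = 20  # seconds

def make_expiration(sender_clock: float) -> int:
    """Absolute Unix time after which the receiver should drop the packet."""
    return int(sender_clock) + EXPIRY_WINDOW

def is_expired(expiration: int, receiver_clock: float) -> bool:
    """The receiver judges expiry against its *own* clock."""
    return expiration < receiver_clock

def accepted(sender_clock: float, skew: float, transit: float = 0.1) -> bool:
    """Would a packet survive if the receiver's clock runs `skew` seconds ahead of the sender's?"""
    expiration = make_expiration(sender_clock)
    receiver_clock = sender_clock + transit + skew
    return not is_expired(expiration, receiver_clock)
```

In this model a receiver whose clock is a minute fast drops every packet from the slower node, so the ping/pong exchange never completes, while a few seconds of drift is harmless.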
@ckeenan That was it. Thanks!
Actually @ckeenan and all interested: it looks like the problem is back, unfortunately. I think the issue is that bootnodes aren't automatically used as peers. When the problem went away, there was a third node in the mix. So I think what's happening is that geth asks the bootnodes for peers, and if a bootnode doesn't know of any other peers, geth doesn't connect to anything. However, geth should use the bootnode itself as a peer if it is available.
Just a random thought: does it do this to avoid overloading the bootnode (which may be an important, well-known public node)?
Setting verbosity=6, I noticed that a bootnode doesn't serve a private network specifically; instead, it serves all networks, including the testnet and the main chain. So eventually private nodes won't be able to find each other via this bootnode.
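This fits how peering is layered: discovery only finds node addresses and knows nothing about network IDs, while the network ID and genesis hash are compared later, in the eth protocol handshake, where mismatched peers are dropped. A simplified Python sketch of that filter; `Status`, `compatible` and `usable_peers` are hypothetical names, and the real handshake carries more fields than the two modeled here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Status:
    """The two handshake fields that decide whether peers share a chain (simplified)."""
    network_id: int
    genesis_hash: str

def compatible(local: Status, remote: Status) -> bool:
    # This check only happens after discovery, once a connection is open and the
    # handshake has been exchanged, so unrelated nodes still consume dial attempts.
    return local.network_id == remote.network_id and local.genesis_hash == remote.genesis_hash

def usable_peers(local: Status, discovered: dict) -> list:
    """Keep only discovered nodes whose handshake matches ours.

    `discovered` maps a node ID to the Status it reported (hypothetical input).
    """
    return [node for node, status in discovered.items() if compatible(local, status)]
```

With a shared bootnode, most entries in `discovered` may belong to other networks, which leaves few or no usable peers for a small private net.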
Geth 1.6 has a feature that if the node cannot find any good peers for 30 seconds, it will try to connect to the bootnode itself. This should help short term. Long term we need to enable discovery v5 on the
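A toy model of the fallback described above, not geth's actual dialer: dial targets normally come only from discovery, and once the node has gone 30 seconds without any peer, the bootnodes themselves are added as targets. The 30-second threshold comes from the comment; `DialScheduler` and its methods are hypothetical.

```python
FALLBACK_AFTER = 30.0  # seconds without peers before dialing bootnodes directly

class DialScheduler:
    """Toy model of choosing dial targets, with a bootnode fallback (hypothetical)."""

    def __init__(self, bootnodes, now: float):
        self.bootnodes = list(bootnodes)
        self.last_had_peers = now  # last time we had at least one peer

    def candidates(self, discovered, peer_count: int, now: float) -> list:
        if peer_count > 0:
            self.last_had_peers = now
            return list(discovered)
        if now - self.last_had_peers >= FALLBACK_AFTER:
            # No peers for a while: dial the bootnodes themselves, skipping duplicates.
            return list(discovered) + [b for b in self.bootnodes if b not in discovered]
        return list(discovered)
```

Note that this only helps if the bootnode accepts the connection as a full peer; a bootnode that speaks discovery only would still leave the node with zero peers.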
@karalabe Well, the problem is that it still doesn't connect, even after 30 seconds.
I can't connect to any peer using the --bootnodes flag, and I also tried a static-nodes.json file. No luck: net.peerCount returns 0.
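geth 1.x read static-nodes.json, a JSON array of enode URL strings, from its data directory (the exact subdirectory has varied between releases, so check the docs for your version). Since a malformed entry cannot be parsed into a peer, it can help to generate and validate the file rather than edit it by hand. A minimal Python sketch; the regular expression is an approximate check written for this example (IPv4 or hostnames only), not geth's own parser, and `static_nodes_json` is hypothetical.

```python
import json
import re

# Approximate enode check: 128 hex chars (64-byte public key) @ host : port [?discport=N]
ENODE_RE = re.compile(r"^enode://[0-9a-fA-F]{128}@[^:@/?]+:\d{1,5}(\?discport=\d{1,5})?$")

def static_nodes_json(enodes) -> str:
    """Validate enode URLs and render them as a static-nodes.json document."""
    bad = [e for e in enodes if not ENODE_RE.match(e)]
    if bad:
        raise ValueError(f"malformed enode URL(s): {bad}")
    # Deduplicate while preserving order; the file is a flat JSON array of strings.
    unique = list(dict.fromkeys(enodes))
    return json.dumps(unique, indent=2)
```

Write the returned string to static-nodes.json in the data directory before starting the node, then check `admin.peers` in the console.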
I'm having the same problem, except that the first peer could actually connect to the bootnode (both see each other as peers). I'm testing with Docker, but I don't think that adds any additional problem to the mix. Can someone please confirm whether this is an issue or a "feature" (that is, a well-known issue that for some reason is the expected behavior)? Thanks in advance.
@pablochacin Please describe your exact setup. We don't see any issues on either mainnet or testnets (which are for all intents and purposes "private" networks).
@karalabe Never mind. I left it running for some time (> 30 min) and eventually the peers found each other. Now I need to figure out how to be sure this has happened before I test anything (like whisper messaging). Any suggestions here would be appreciated. As a side note, to my surprise, I found an external node (with a public IP) also listed as a peer. How could this happen? I'm using network ID 15.
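One way to be sure peers have connected before starting a test is to poll the node until its peer count reaches a threshold. A minimal Python sketch, assuming the node exposes the `net` API over HTTP JSON-RPC at `http://127.0.0.1:8545` (enabled with `--rpc` on older geth releases and `--http` on newer ones); `wait_for_peers`, `geth_peer_count` and `rpc_call` are hypothetical helpers, while `net_peerCount` is the standard JSON-RPC method.

```python
import json
import time
import urllib.request

def rpc_call(url: str, method: str, params=None):
    """POST a JSON-RPC 2.0 request to a node's HTTP endpoint and return its result."""
    body = json.dumps({"jsonrpc": "2.0", "id": 1, "method": method, "params": params or []}).encode()
    req = urllib.request.Request(url, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=5) as resp:
        reply = json.load(resp)
    if "error" in reply:
        raise RuntimeError(reply["error"])
    return reply["result"]

def geth_peer_count(url: str = "http://127.0.0.1:8545") -> int:
    # net_peerCount returns a hex quantity such as "0x2".
    return int(rpc_call(url, "net_peerCount"), 16)

def wait_for_peers(min_peers, timeout, peer_count, poll=2.0, sleep=time.sleep, clock=time.monotonic):
    """Poll `peer_count()` until it reports at least `min_peers`, or raise after `timeout` seconds."""
    deadline = clock() + timeout
    while True:
        count = peer_count()
        if count >= min_peers:
            return count
        if clock() >= deadline:
            raise TimeoutError(f"only {count} peer(s) after {timeout}s")
        sleep(poll)
```

For example, `wait_for_peers(1, 1800, geth_peer_count)` blocks for up to 30 minutes; `sleep` and `clock` are parameters so the loop can be tested without waiting.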
Looks like adding nodes manually is the best way to go, both to ensure they are added deterministically and to prevent the issue of spooky nodes.
@pablochacin Unless you add the --nodiscover flag when you launch your geth instance, any other node using the same genesis file and network ID can peer with the nodes on your private testnet. Since 15 isn't a large, random number, I'd say that there are some folks out there that are probably using that network ID and likely have the same settings configured in their custom genesis file. Edit: Sorry, didn't see your last comment, but yeah. XD |
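Following that advice, a private network can pick a large random network ID (passed to geth with `--networkid`) so that accidental collisions with other people's test networks become very unlikely. A minimal Python sketch; the 32-bit default and the choice to skip IDs below 65536 are arbitrary assumptions for this example, and `random_network_id` is hypothetical.

```python
import secrets

def random_network_id(bits: int = 32) -> int:
    """Pick a random network ID in [2**16, 2**bits) using a cryptographic RNG."""
    if bits <= 16:
        raise ValueError("bits must be greater than 16")
    low = 1 << 16  # skip small IDs, which are the ones most often reused
    return low + secrets.randbelow((1 << bits) - low)
```

A random ID only reduces collisions; `--nodiscover` plus manually added peers is still the way to keep strangers out entirely.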
* Fix deadlock during StartValidating

  StartValidating calls RefreshValPeers while holding coreMu, and RefreshValPeers waits for all validator peers to be deleted and then reconnects to known validators. If any of those peers calls IsValidating before RefreshValPeers tries to delete it, the system gets stuck in a deadlock, because IsValidating also tries to acquire coreMu. The peer will never acquire coreMu because StartValidating holds it, and StartValidating will never return because it is waiting for all peers to disconnect. This commit makes coreStarted an atomic variable so that peers can make thread-safe calls to IsValidating without needing to acquire coreMu.

* Fix long wait for nodes to connect

  At test startup, nodes sometimes took in the region of 30s to connect, while at other times they connected within μs. The problem was that we were trying to connect every peer to every other peer, which meant that for any two peers, both would dial each other. If this occurred close enough in time, both sides would hang up the connections (I call this cross dialing). This happens because each side counts its outgoing connection as connected, and when the incoming connection arrives it drops it because it sees itself as already connected. When this happened, the nodes would retry after some time (probably 30s) and then connect. The fix was to ensure that for any two nodes, only one of them dials the other.
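Both fixes in the commit message above come down to small patterns. A Python sketch with hypothetical names loosely mirroring the commit (`coreMu`, `StartValidating`, `IsValidating`): a `threading.Event` stands in for the atomic `coreStarted` flag, and the "smaller ID dials" rule is one arbitrary way to choose a single dialer per pair, not necessarily the rule the commit used.

```python
import threading

def should_dial(local_id: str, remote_id: str) -> bool:
    """Exactly one side of each pair dials: here, the node with the smaller ID (arbitrary rule)."""
    return local_id < remote_id

def dial_plan(node_ids) -> list:
    """(dialer, listener) pairs for a fully connected test network, without cross dialing."""
    return [(a, b) for a in node_ids for b in node_ids if a != b and should_dial(a, b)]

class Core:
    """Sketch of the deadlock fix: reading the 'started' flag no longer needs core_mu."""

    def __init__(self):
        self.core_mu = threading.Lock()    # still serializes start/stop transitions
        self.started = threading.Event()   # thread-safe flag standing in for an atomic bool

    def start_validating(self, refresh_val_peers):
        with self.core_mu:
            self.started.set()
            # May block until peers disconnect; peers may call is_validating() meanwhile.
            refresh_val_peers()

    def is_validating(self) -> bool:
        # Lock-free read, so a peer calling this cannot wait on core_mu.
        return self.started.is_set()
```

If `is_validating` instead acquired `core_mu`, a peer thread calling it from inside `refresh_val_peers` would block forever, which is the deadlock the commit describes.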
This could be related to #23210. What I noticed was that specifying either / or / and
Hi, I have noticed that for geth on a private net, setting bootnodes works fine at first, but eventually (a few hours later) setting bootnodes no longer does anything. Peers added manually sync up fine, though.
I see the same issue with bootnodes running on both AWS and DO instances. Perhaps it's a byproduct of a network that is too small?