geth not connecting to bootnodes on private test network #1685

Closed
ckeenan opened this issue Aug 19, 2015 · 19 comments

Comments

@ckeenan

ckeenan commented Aug 19, 2015

Hi, I have noticed that for geth on a private net, setting bootnodes works fine at first, but eventually (a few hours later) setting bootnodes does nothing. Peers added manually can sync up fine, though.

I see the same issue with bootnodes running on both AWS and DO instances. Perhaps it's a byproduct of a network that is too small?

@radster360

We are running our private net on GCE and I have similar issues too. I am running geth 1.0.1. My private net was running fine, but I just checked: my mining node is at a different block and my other non-mining node has stopped syncing. It ran for a few days without issue.

After killing the nodes and restarting them, block synchronization resumed and now both nodes are synced up.

@fjl
Contributor

fjl commented Aug 19, 2015

@ckeenan Please explain what you mean by "setting bootnodes". It is important to note that geth does not maintain a persistent connection to the bootstrap nodes. They are only used as the first point of contact for the node discovery protocol.
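
A rough way to check whether discovery has actually produced connections (a sketch, assuming a console is attached to the running node and the IPC endpoint is in its default location; <datadir> is a placeholder):

geth attach ipc:<datadir>/geth.ipc
> net.peerCount
> admin.peers

net.peerCount and admin.peers report whatever connections currently exist, regardless of whether they came via the bootnode, discovery, or admin.addPeer.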

@ckeenan
Author

ckeenan commented Aug 19, 2015

@fjl geth --bootnodes="enode://..." .... Discovery indeed works fine when I occasionally shut down and restart that client (to test the problem) during the first hour or so with the bootnodes parameter set. Eventually a shutdown and startup no longer connects to the bootnode, even though the parameter is set. Any ideas?

@obscuren obscuren added the * label Sep 23, 2015
@aakilfernandes

I'm having this same problem until I add the peer via admin.addPeer. Here's what I'm passing to geth:

geth --port 30304 --rpc --rpcaddr 127.0.0.1 --rpcport 8101 --rpccorsdomain http://127.0.0.1:8000 --genesis config/development/genesis.json --datadir datadir/development/ --networkid 5473 --bootnodes="enode://3ede10cd6a5e382db43804f3267c5a6eab6021e245b6eb28d1d9b11720638490e876ce5e61bd76992befa89935c5a34f139df7dda82b00d5ea09da6f96f77839@40.117.36.75:30303" --unlock 42c35b6c220d570bd8898a540406e0b026479f7b --password config/development/password console

I'm able to work around it with

admin.addPeer('enode://3ede10cd6a5e382db43804f3267c5a6eab6021e245b6eb28d1d9b11720638490e876ce5e61bd76992befa89935c5a34f139df7dda82b00d5ea09da6f96f77839@40.117.36.75:30303')
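
For what it's worth, admin.addPeer does not survive a restart; the persistent equivalent is a static-nodes.json file in the data directory (a sketch, reusing the datadir from the command above; depending on the geth version the file lives in the datadir root or in its geth/ subdirectory):

datadir/development/static-nodes.json:
[
  "enode://3ede10cd6a5e382db43804f3267c5a6eab6021e245b6eb28d1d9b11720638490e876ce5e61bd76992befa89935c5a34f139df7dda82b00d5ea09da6f96f77839@40.117.36.75:30303"
]

geth redials static nodes on its own if the connection drops.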

@aakilfernandes

Geth
Version: 1.3.3
Protocol Versions: [63 62 61]
Network Id: 1
Go Version: go1.5.3
OS: darwin
GOPATH=
GOROOT=/usr/local/Cellar/go/1.5.3/libexec

@ckeenan
Author

ckeenan commented Feb 18, 2016

@aakilfernandes the clock on one of the nodes could be out of sync. Try sudo ntpdate -s time.nist.gov, or see the Frontier Guide, "Connecting to the Network: Common Problems With Connectivity".
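
A small sketch for checking the skew before touching the clock (assuming ntpdate is installed and time.nist.gov is reachable):

ntpdate -q time.nist.gov         # query only: prints the offset without adjusting the clock
sudo ntpdate -s time.nist.gov    # steps the clock, as suggested above

Discovery packets carry an expiration timestamp, so a sufficiently skewed clock causes them to be silently dropped.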

@aakilfernandes

@ckeenan that was it. Thanks!

@aakilfernandes

Actually @ckeenan (and all interested), it looks like the problem is back, unfortunately. I think the issue is that bootnodes aren't automatically used as peers. When the problem went away, there was a 3rd node in the mix. So I think what's happening is that geth asks the bootnode for peers; if the bootnode doesn't know of any peers, geth doesn't connect. However, geth should use the bootnode itself as a peer if available.

@yxliang01

Just a random thought: does it do this to avoid overloading the bootnode (which is potentially an important, well-known public node)?

@immartian

I noticed some details after setting verbosity=6: a bootnode isn't tied to one specific private network; it serves all networks, including the testnet and the main chain. So eventually private nodes won't be able to find each other via this bootnode.

@karalabe
Member

Geth 1.6 has a feature that if the node cannot find any good peers for 30 seconds, it will try to connect to the bootnode itself. This should help short term. Long term we need to enable discovery v5 on the eth protocol for test/private networks, which should solve the issue properly.
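
For anyone who wants to experiment with the "long term" part already: geth builds of that era expose v5 discovery behind an experimental flag (a sketch, assuming the flag is still called --v5disc in your version; the enode URL is a placeholder):

geth --networkid 5473 --bootnodes "enode://<node-id>@<ip>:30303" --v5disc console

Whether the eth protocol actually selects peers through v5 depends on the geth version, so treat this as experimental rather than a guaranteed fix.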

@yxliang01

@karalabe Well, the problem was that it just doesn't connect even after 30 seconds.
@immartian Well, are you talking about the official bootnodes? I have this problem with my own nodes as bootnodes.

@galladivya

I can't connect to any peer using the --bootnodes flag, and I also used a static-nodes.json file. No use; it still returns 0 when I check net.peerCount.
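
In case the file is simply not being read: static-nodes.json has to be valid JSON, an array of enode URLs, in the location geth expects (a sketch of the shape, assuming a geth version that reads it from the geth/ subdirectory of the datadir; the peer addresses are made up):

<datadir>/geth/static-nodes.json:
[
  "enode://<node-id-of-peer-1>@10.0.0.11:30303",
  "enode://<node-id-of-peer-2>@10.0.0.12:30303"
]

The IP and port must be the ones each peer actually listens on; net.peerCount staying at 0 often just means the enode IDs or addresses in the file are stale.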

@pablochacin

pablochacin commented Jan 27, 2018

I'm having the same problem, except that the first peer could actually connect to the boot node (both see each other as peers). I'm testing using Docker, but I don't think this adds any additional problem to the mix.

Can someone please confirm whether this is an issue or a "feature" (that is, a well-known issue that for some reason is the expected behavior)? Thanks in advance.

@karalabe
Member

@pablochacin Please describe your exact setup. We don't see any issues on either mainnet or testnets (which are for all intents and purposes "private" networks).

@pablochacin

@karalabe Never mind. I left it running for some time (> 30 min) and eventually the peers found each other. Now I need to figure out how to be sure this has happened before I test anything (like whisper messaging). Any suggestion here would be appreciated.

As a side note, to my surprise, I found an external node (with a public IP) also listed as a peer. How could this happen? I'm using network id 15.

172.17.0.2:50294 <- docker container (bootnode)
178.238.233.123:30303 <- external node
172.17.0.4:59714 <- docker container (peer)
172.17.0.5:30303 <- docker container (peer)

@pablochacin

pablochacin commented Jan 27, 2018

--nodiscover
Use this to make sure that your node is not discoverable by people who do not manually add you. Otherwise, there is a chance that your node may be inadvertently added to a stranger's node if they have the same genesis file and network id

Looks like adding nodes manually is the best way to go, both to ensure they are added deterministically and to prevent the issue of spooky nodes.
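
A sketch of what the manual-only setup looks like in practice, with made-up datadirs and ports, for two nodes that should only ever see each other:

geth --datadir nodeA --networkid 5473 --port 30303 --nodiscover console
geth --datadir nodeB --networkid 5473 --port 30304 --nodiscover console

Then, in node B's console:

admin.addPeer("enode://<node-A-id>@<node-A-ip>:30303")

With --nodiscover the only peers are those added via admin.addPeer or static-nodes.json, which also avoids the stranger-node problem quoted above.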

@zscole
Contributor

zscole commented Mar 28, 2018

@pablochacin Unless you add the --nodiscover flag when you launch your geth instance, any other node using the same genesis file and network ID can peer with the nodes on your private testnet. Since 15 isn't a large, random number, I'd say that there are some folks out there that are probably using that network ID and likely have the same settings configured in their custom genesis file.

Edit: Sorry, didn't see your last comment, but yeah. XD
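
The practical takeaway is to pick a large random network ID and, for geth versions whose genesis file takes a config block, a matching chainId (a sketch with made-up numbers):

genesis.json: { "config": { "chainId": 48291, ... }, ... }

geth --datadir datadir/development init genesis.json
geth --datadir datadir/development --networkid 48291 --nodiscover console

networkid and chainId don't strictly have to match, but keeping them equal avoids confusion, and a random value of five or more digits makes an accidental collision with a stranger's network unlikely.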

tony-ricciardi pushed a commit to tony-ricciardi/go-ethereum that referenced this issue Jan 20, 2022
* Fix deadlock during StartValidating

StartValidating makes a call to RefreshValPeers while holding
coreMu and RefreshValPeers waits for all validator peers to be deleted
and then reconnects to known validators.

If any of those peers has called IsValidating before RefreshValPeers
tries to delete them, the system gets stuck in a deadlock because
IsValidating also tries to acquire coreMu. The peer will never acquire
coreMu because it is held by StartValidating, and StartValidating will
never return because it is waiting for all peers to disconnect.

This commit makes coreStarted into an atomic variable so that peers can
make threadsafe calls to IsValidating without needing to acquire
coreMu.

* Fix long wait for nodes to connect

At test startup sometimes nodes were taking in the region of 30s to connect
whilst other times it was happening in μs. The problem was we were
trying to connect all peers to all other peers. That meant that for any
two peers they would both dial each other. Sometimes if this occurred
close enough in time both sides would hang up the connections (I call
this cross dialing). This happens because each side counts their
outgoing connection as connected and then when the incoming connection
arrives they drop it because they see themselves as already connected.
When this happened nodes would retry after some time probably 30s and
then be connected.

The fix was to ensure that for any two nodes only one of them dials the
other.

@hickscorp

This could be related to #23210

What I noticed is that specifying BootstrapNodes, StaticNodes and/or TrustedNodes is a potential issue in private settings, yes.
If we specify, say, 3 enode addresses in any or all of these parameters, geth will fail to start completely rather than retrying later.
It's really something that should IMO be addressed, because it makes it practically impossible to have a self-healing network: if any of the specified nodes is down, then none of the other nodes will be able to restart if they also go down. They just crash on startup.
In that other GitHub issue I suggested making this a graceful warning rather than a fatal error, but have had zero response.
