
Node(v4.3.1) crashed with "panic: runtime error: invalid memory address or nil pointer dereference" #3955

Closed
fish2plain opened this issue Dec 5, 2021 · 14 comments

Comments

@fish2plain

Describe the bug

Harmony node on shard 0 crashed a few hours after upgrading to v4.3.1.

ubuntu@harmony-s0-lax:~$ ./harmony version
Harmony (C) 2020. harmony, version v7211-v4.3.1-0-g65614950 (runner@ 2021-11-27T05:27:53+0000)

To Reproduce
Not reproducible so far.

Expected behavior
Node remains stable.

Screenshots
stack trace:

Staking mode; node key 
...

 -> shard 0
Started RPC server at: 127.0.0.1:9500
Started Auth-RPC server at: 127.0.0.1:9501
Started WS server at: 127.0.0.1:9800
Started Auth-WS server at: 127.0.0.1:9801

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x11c1625]

goroutine 6210888 [running]:
github.com/harmony-one/harmony/core/types.(*Block).Epoch(...)
        /home/runner/work/harmony/harmony/harmony/core/types/block.go:491
github.com/harmony-one/harmony/consensus.(*Consensus).sendCommitMessages(0xc000280780, 0x0)
        /home/runner/work/harmony/harmony/harmony/consensus/validator.go:168 +0x55
github.com/harmony-one/harmony/consensus.(*Consensus).onPrepared(0xc000280780, 0xc02ebb0380)
        /home/runner/work/harmony/harmony/harmony/consensus/validator.go:263 +0x534
github.com/harmony-one/harmony/consensus.(*Consensus).HandleMessageUpdate(0xc000280780, 0x20e4220, 0xc0280bafc0, 0xc0280baf60, 0xc0cae48f30, 0xc0001962d0, 0xc0001962d0)
        /home/runner/work/harmony/harmony/harmony/consensus/consensus_v2.go:112 +0x3a0
github.com/harmony-one/harmony/node.(*Node).StartPubSub.func2.1(0xc055a9c780, 0xc016b1c8c0, 0xc0280baf00, 0xc00021ac00, 0x20e4220, 0xc0280bafc0, 0x1, 0xc055a9c760, 0xc0280baf60, 0x0, ...)
        /home/runner/work/harmony/harmony/harmony/node/node.go:816 +0x4ae
created by github.com/harmony-one/harmony/node.(*Node).StartPubSub.func2
        /home/runner/work/harmony/harmony/harmony/node/node.go:803 +0x1d8

Environment (please complete the following information):

  • OS: NAME="Ubuntu" VERSION="20.04.3 LTS (Focal Fossa)"
  • Go environment: not installed

Additional context

@fish2plain
Author

fish2plain commented Dec 6, 2021

On a different shard 0 node, I got a similar error, but the stack trace points to a different line.

I won't be trying the test binary on this node, though. I ran the test binary on another node, and it fell behind ~10K blocks after restart.

Started WS server at: 127.0.0.1:9800
Started Auth-WS server at: 127.0.0.1:9801
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0xd1ad02]

goroutine 3724389 [running]:
github.com/harmony-one/harmony/core/types.(*Block).NumberU64(0x0, 0xc081bba000)
        /home/runner/work/harmony/harmony/harmony/core/types/block.go:482 +0x22
github.com/harmony-one/harmony/consensus.(*Consensus).onPrepared.func1(0xc000fae000, 0x0)
        /home/runner/work/harmony/harmony/harmony/consensus/validator.go:273 +0x54
created by github.com/harmony-one/harmony/consensus.(*Consensus).onPrepared
        /home/runner/work/harmony/harmony/harmony/consensus/validator.go:270 +0x591
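
Both traces boil down to a method being called on a nil `*Block` (note the `0x0` receiver in `NumberU64(0x0, ...)`). The sketch below, using a hypothetical `Block` type and an assumed `safeNumber` guard (neither is the real harmony code, and this is not the upstream fix), shows why that panics and how a nil check avoids it:

```go
package main

import "fmt"

// Block is a hypothetical stand-in for harmony's core/types.Block;
// number mimics the header field that NumberU64 reads.
type Block struct {
	number uint64
}

// NumberU64 dereferences its receiver, so calling it on a nil *Block
// panics with the same "invalid memory address or nil pointer
// dereference" runtime error seen in the traces above.
func (b *Block) NumberU64() uint64 {
	return b.number
}

// safeNumber is an assumed defensive guard, not the actual fix:
// check the pointer before touching the block, and report whether
// a usable block was present.
func safeNumber(b *Block) (uint64, bool) {
	if b == nil {
		return 0, false
	}
	return b.NumberU64(), true
}

func main() {
	var b *Block // nil, like the 0x0 receiver in the stack trace

	// Demonstrate the panic without crashing the whole process.
	func() {
		defer func() {
			if r := recover(); r != nil {
				fmt.Println("recovered:", r) // nil pointer dereference
			}
		}()
		_ = b.NumberU64() // dereferences nil -> panic
	}()

	if _, ok := safeNumber(b); !ok {
		fmt.Println("nil block: skipping")
	}
}
```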

@gsampathkumar

Same as @fish2plain. I ran the test binary for a few hours; it slowed to a crawl with OUT OF SYNC messages and fell significantly behind.

@gsampathkumar

Reverting to version 4.3.0 seems to have fixed the issue for now.

@sophoah
Contributor

sophoah commented Dec 20, 2021

@gsampathkumar could you confirm the testnet binary version you tried? The latest binary now has another commit that helps with sync speed. And just to confirm: were you still experiencing the panic while using the testnet binary?

@gsampathkumar

@sophoah We did not encounter the panic issue using the testnet binary. Only the slow sync.

I will run the latest testnet binary on one of our nodes and see whether the slow sync issue is resolved. Will keep this thread posted.

@gsampathkumar

Running one node with:

root@HarmonySecondary:/mnt/volume_sfo3_03# ./harmony -V
Harmony (C) 2020. harmony, version v7214-v4.3.1-3-g4c9546a4 (jenkins@ 2021-12-19T13:56:15+0000)

It's currently caught up, though, so I'm not sure it will exercise the sync path needed to verify that the slow sync issue has been fixed. Let me know if I should let it fall behind for 1-2 hours and then have it try to catch up.

@sophoah
Contributor

sophoah commented Dec 20, 2021

@gsampathkumar no need to force it out of sync. I've installed the same code on most of our internal nodes today; eventually, in January, this may become a new release.

@lcgogo

lcgogo commented Dec 28, 2021

Same issue:

Started RPC server at: 0.0.0.0:62075
Started Auth-RPC server at: 0.0.0.0:9501
Started WS server at: 127.0.0.1:9800
Started Auth-WS server at: 127.0.0.1:9801
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0xd1ad02]

goroutine 1901533 [running]:
github.com/harmony-one/harmony/core/types.(*Block).NumberU64(0x0, 0xc01934b750)
        /home/runner/work/harmony/harmony/harmony/core/types/block.go:482 +0x22
github.com/harmony-one/harmony/consensus.(*Consensus).onPrepared.func1(0xc000dd2000, 0x0)
        /home/runner/work/harmony/harmony/harmony/consensus/validator.go:273 +0x54
created by github.com/harmony-one/harmony/consensus.(*Consensus).onPrepared
        /home/runner/work/harmony/harmony/harmony/consensus/validator.go:270 +0x591

./harmony --version
Harmony (C) 2020. harmony, version v7211-v4.3.1-0-g65614950 (runner@ 2021-11-27T05:27:53+0000)

@lcgogo

lcgogo commented Dec 30, 2021

@sophoah Met the issue again. Downgrading to 4.3.0 now.

@sophoah
Contributor

sophoah commented Jan 4, 2022

@rlan35 any idea? It seems this still happens on some nodes, and on validator nodes, not only explorer nodes.

@staking4all

Hi

This issue happens at epoch change. All of my nodes went down yesterday at epoch change. It happened while I was sleeping, and I woke up to a whole bunch of monitoring alerts. I have been unelected due to this bug.

It happened again today, to all nodes, at epoch changeover. I'm now making sure the service restarts automatically so it doesn't unelect me again.

Thanks

@OleFass

OleFass commented Jan 8, 2022

I have a node that has been running on Ubuntu 20.04 with default configs for 3 weeks. The hardware exceeds the requirements many times over, and nothing else runs on the server. Still, the same issue happens to my node roughly once a day:

Started RPC server at: 127.0.0.1:9500
Started Auth-RPC server at: 127.0.0.1:9501
Started WS server at: 127.0.0.1:9800
Started Auth-WS server at: 127.0.0.1:9801
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0xd1ad02]

goroutine 4070621 [running]:
github.com/harmony-one/harmony/core/types.(*Block).NumberU64(0x0, 0xc00e804f60)
        /home/runner/work/harmony/harmony/harmony/core/types/block.go:482 +0x22
github.com/harmony-one/harmony/consensus.(*Consensus).onPrepared.func1(0xc00013e500, 0x0)
        /home/runner/work/harmony/harmony/harmony/consensus/validator.go:273 +0x54
created by github.com/harmony-one/harmony/consensus.(*Consensus).onPrepared
        /home/runner/work/harmony/harmony/harmony/consensus/validator.go:270 +0x591

Downgraded to 4.3.0 for now.

@staking4all

Hi

Just an update

The node kept giving the same error at epoch changeover.

So I switched all my nodes to the testnet version. Since then, no more crashes.

Thanks.

@zmyya

zmyya commented Oct 10, 2023

I faced the same problem again on version v8126-v2023.2.7-0-g1b9614ba.
