try to be more intelligent in WalAcceptor.stop #417

LizardWizzard · 2021-08-12T11:55:45Z

Original error reported by @hlinnaka:

=================================== FAILURES ===================================
________________________________ test_restarts _________________________________
batch_others/test_wal_acceptor.py:88: in test_restarts
    failed_node.stop()
fixtures/zenith_fixtures.py:550: in stop
    pid = read_pid(pidfile_path)
fixtures/zenith_fixtures.py:516: in read_pid
    return int(Path(path).read_text())
E   ValueError: invalid literal for int() with base 10: ''

This can occur when the pidfile exists but the pid value has not yet been written to it. This PR detects that case, waits half a second, and tries to read the pid again. I haven't been able to reproduce it locally despite running this test ~120 times.
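A minimal sketch of the retry described above, not the exact code from this PR. The function name `read_pid_with_retry` and the `retries` and `delay` parameters are assumptions made for illustration; only the half-second wait and the `int(Path(path).read_text())` call come from the description and traceback.

```python
import time
from pathlib import Path


def read_pid_with_retry(path, retries=1, delay=0.5):
    """Read an integer pid from `path`, retrying if the file is still empty.

    `retries` and `delay` are hypothetical parameters, not names from the PR.
    """
    for attempt in range(retries + 1):
        text = Path(path).read_text()
        try:
            return int(text)
        except ValueError:
            # The pidfile exists, but the pid has not been written to it yet.
            if attempt == retries:
                raise  # out of attempts: surface the original error
            time.sleep(delay)
```

With the defaults this waits once for half a second, then re-raises the `ValueError` if the file is still empty.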

Also, I've added a bunch of typing sugar to the wal acceptor fixtures.

arssher · 2021-08-12T13:28:08Z

If this is indeed the problem, as it seems to be, I'd rather wait in WalAcceptor->start for the pid file to actually become valid (and remove it before the start if it exists; this doesn't protect us from a concurrent start, but should be enough for test purposes).

Though the probability of this on a machine that isn't CPU-starved must be really small: it means the safekeeper couldn't write its pid in the time the test driver took to complete an entire INSERT.
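Waiting in start for the pid file to become valid could look roughly like the sketch below. The name `wait_for_pid` and the `timeout` and `poll_interval` parameters are hypothetical and do not come from the fixtures. The idea is that start returns only once the file exists and parses as an integer.

```python
import time
from pathlib import Path


def wait_for_pid(pidfile, timeout=5.0, poll_interval=0.1):
    """Poll `pidfile` until it holds a parseable pid, or raise TimeoutError.

    All names and default values here are illustrative assumptions.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            return int(Path(pidfile).read_text())
        except (FileNotFoundError, ValueError):
            # Either the file is not created yet, or it is still empty.
            pass
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no valid pid in {pidfile} after {timeout}s")
        time.sleep(poll_interval)
```

This only helps if any stale pid file is removed before the process is launched; otherwise the poll would immediately return the old pid.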

LizardWizzard · 2021-08-12T16:17:05Z

@arssher Maybe an even better solution is to use an IDENTIFY_SYSTEM query to the wal acceptor as the readiness criterion, what do you think? That is, in wal acceptor start, wait for a connection and an IDENTIFY_SYSTEM query to succeed before going further.

arssher · 2021-08-12T17:13:59Z

That approach is an option, but 1) we don't necessarily know the ztimelineid at the moment the acceptor starts, and 2) tests don't actually need a "ready" acceptor. So it sounds more complicated than needed, but it's possible.

LizardWizzard · 2021-08-13T09:55:25Z

I decided to go with pid checking. A general health-checking mechanism is a good thing on its own, but it is not needed yet.

arssher · 2021-08-13T19:45:19Z

Without removing the pid file before start we risk going through the same race (reaching stop() before the pid was written to the file): we would try to stop the safekeeper using an obsolete pid and then leave the real one to prowl around. Starting another one later would then fail to lock the control file.

Ideally the safekeeper should clean up its pidfile on its own, but until we have that, please add the line removing the pidfile in start and feel free to merge this.
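For illustration, the removal step could be as small as the helper below. `remove_stale_pidfile` is a hypothetical name, and the merged change may simply inline the removal in start. It tolerates a missing file, so it is safe to call on a first start as well.

```python
from pathlib import Path


def remove_stale_pidfile(pidfile):
    """Delete a leftover pid file; return True if one was actually removed.

    Hypothetical helper, meant to be called before launching the process.
    """
    try:
        Path(pidfile).unlink()
        return True
    except FileNotFoundError:
        # Nothing left over from a previous run.
        return False
```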

LizardWizzard requested review from hlinnaka and arssher August 12, 2021 11:55

LizardWizzard force-pushed the feature/heal-test-restarts branch from fc10714 to 4470780 Compare August 13, 2021 09:47

LizardWizzard force-pushed the feature/heal-test-restarts branch from 4470780 to 73f9fa9 Compare August 13, 2021 10:13

sharnoff mentioned this pull request Aug 13, 2021

Use postgres protocol for postgres <-> walkeeper communication #366

Merged

arssher approved these changes Aug 13, 2021

LizardWizzard force-pushed the feature/heal-test-restarts branch from 73f9fa9 to c4e93fc Compare August 16, 2021 10:11

try to be more intelligent in WalAcceptor.start, added a bunch of typ…

27079fe

…ing sugar to wal acceptor fixtures

LizardWizzard force-pushed the feature/heal-test-restarts branch from c4e93fc to 27079fe Compare August 16, 2021 11:02

LizardWizzard merged commit 0c4ab80 into main Aug 16, 2021

LizardWizzard deleted the feature/heal-test-restarts branch August 16, 2021 11:27
