Use postgres protocol for `postgres <-> walkeeper` communication #366

sharnoff · 2021-07-27T00:31:48Z

This PR will encompass the first of two changes for #117.

Note: This is a work-in-progress. The postgres side hasn't been updated yet, so this isn't expected to work yet.

Notes on messages:

Because we're now using CopyData messages, messages have distinct boundaries where they didn't before. So it's worth clarifying where those boundaries are: here for now, and then officially in walkeeper/README_PROTO.md (or somewhere else) once they're decided upon.
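To make the boundaries concrete, each message travels as one CopyData frame. In the postgres wire protocol that frame is the type byte 'd', then a big-endian Int32 length that counts itself but not the type byte, then the payload. Below is a minimal sketch of framing and unframing. The function names are hypothetical, and this is not walkeeper's actual code:

```python
import struct

COPY_DATA = b"d"  # CopyData message type byte in the postgres wire protocol

def encode_copy_data(payload: bytes) -> bytes:
    """Wrap one protocol message in a CopyData frame."""
    # The Int32 length covers itself (4 bytes) plus the payload, not the type byte.
    return COPY_DATA + struct.pack("!i", len(payload) + 4) + payload

def decode_copy_data(buf: bytes):
    """Return (payload, rest) if buf starts with a complete CopyData frame,
    or (None, buf) if more bytes still have to arrive."""
    if len(buf) < 5:
        return None, buf
    if buf[:1] != COPY_DATA:
        raise ValueError(f"expected CopyData ('d'), got {buf[:1]!r}")
    (length,) = struct.unpack("!i", buf[1:5])
    if length < 4:
        raise ValueError(f"invalid CopyData length {length}")
    end = 1 + length
    if len(buf) < end:
        return None, buf
    return buf[5:end], buf[end:]

# Two frames sent back to back still decode as two separate messages.
stream = encode_copy_data(b"first") + encode_copy_data(b"second")
msg1, rest = decode_copy_data(stream)
msg2, rest = decode_copy_data(rest)
```

Unlike the old raw-socket stream, a reader never has to guess where one message ends and the next begins.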

The messages are, from walkeeper's perspective:

  <- recv: START_WAL_PUSH query
  <- recv: server info from postgres   (type `ServerInfo`)
  -> send: walkeeper info              (type `SafeKeeperInfo`)
  <- recv: vote info                   (type `RequestVote`)

  if node id mismatch:
    -> send: self node id (type `NodeId`); exit

  -> send: confirm vote (with node id) (type `NodeId`)

  loop:
    <- recv: info                      (type `SafeKeeperRequest`)
         (break loop if done)
    <- recv: wal block                 (raw bytes)
    -> send: confirm receipt           (type `SafeKeeperResponse`)
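Here is a rough sketch of the sequence above as a single function. It assumes each CopyData payload has already been decoded into a (type name, body) pair. The `WalBlock` tag, the `"done"` marker and the node id comparison are hypothetical placeholders, because the real encodings and mismatch rule aren't specified here:

```python
from typing import Callable, List, Tuple

Msg = Tuple[str, object]  # (message type name, decoded body); hypothetical representation

def walkeeper_session(recv: Callable[[], Msg], send: Callable[[Msg], None],
                      my_info: object, my_node_id: int) -> List[bytes]:
    """Walk the walkeeper side of the handshake and WAL loop, one CopyData
    message per recv()/send() call. Returns the WAL blocks received."""
    def expect(kind: str) -> object:
        got_kind, body = recv()
        if got_kind != kind:
            raise ValueError(f"expected {kind}, got {got_kind}")
        return body

    expect("START_WAL_PUSH")
    expect("ServerInfo")
    send(("SafeKeeperInfo", my_info))
    vote = expect("RequestVote")
    if vote != my_node_id:              # placeholder for the real "node id mismatch" check
        send(("NodeId", my_node_id))    # report our own node id, then exit
        return []
    send(("NodeId", my_node_id))        # confirm the vote

    blocks: List[bytes] = []
    while True:
        request = expect("SafeKeeperRequest")
        if request == "done":           # hypothetical end-of-stream marker
            break
        blocks.append(expect("WalBlock"))  # raw WAL bytes in their own message
        send(("SafeKeeperResponse", len(blocks)))  # confirm receipt
    return blocks
```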

sharnoff · 2021-07-29T05:36:25Z

I've now added the walproposer modifications in postgres -- that's sitting in neondatabase/postgres#60. Still untested.

EDIT: It builds fine on my machine; I'll have to figure out what's missing in CI.

Meanwhile... here's a summary of what's new :)

Things making this implementation more complicated than before

Fundamentally, changing the protocol to work over libpq only directly required a few changes:

Changing connection setup to use PQconnectStart instead of connecting directly on the socket
Replacing every place where we previously used Read/WriteSocketAsync with something that retrieves the same information via CopyData messages in the postgres protocol
Adding graceful shutdown of the PGconn object that manages the connection

The extra work comes from the fact that operations can now be asynchronous in many different ways. Previously we just used asyncOff inside the WalKeeper to track our write position and kept calling WriteSocketAsync as needed. Now there are several different polling functions, each required to advance a different kind of operation:

PQconnectPoll for connecting to the walkeeper,
PQflush to finish writing a message (sometimes preceded by PQconsumeInput)
and PQconsumeInput by itself for reads, followed by retrying the read operation.
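The following sketch shows the per-step decision each of these polling calls implies. It follows libpq's documented return values: PQconnectPoll's PGRES_POLLING_* statuses, PQflush returning 0/1/-1, and PQgetCopyData in async mode returning 0 when no data is ready yet. `conn` is a hypothetical Python wrapper whose connect_poll/flush/consume_input/get_copy_data_async methods stand in for the C calls:

```python
from enum import Enum, auto

class Wait(Enum):
    NONE = auto()        # operation finished, no socket wait needed
    READ = auto()        # wait until the socket is readable
    WRITE = auto()       # wait until the socket is writable
    READ_WRITE = auto()  # wait until the socket is readable or writable

def step_connect(conn) -> Wait:
    """One PQconnectPoll step: the returned status says what to wait for next."""
    status = conn.connect_poll()
    if status == "PGRES_POLLING_OK":
        return Wait.NONE
    if status == "PGRES_POLLING_READING":
        return Wait.READ
    if status == "PGRES_POLLING_WRITING":
        return Wait.WRITE
    raise ConnectionError("PQconnectPoll failed")

def step_flush(conn, readable: bool) -> Wait:
    """One step of finishing a queued write with PQflush."""
    # libpq: if the socket became read-ready, call PQconsumeInput before flushing again.
    if readable and not conn.consume_input():
        raise ConnectionError("PQconsumeInput failed")
    rc = conn.flush()
    if rc == 0:
        return Wait.NONE          # everything has been sent
    if rc == 1:
        return Wait.READ_WRITE    # data still queued; wait for either readiness
    raise ConnectionError("PQflush failed")

def step_read(conn, readable: bool):
    """One step of a nonblocking CopyData read: returns (payload or None, Wait)."""
    if readable and not conn.consume_input():
        raise ConnectionError("PQconsumeInput failed")
    rc, payload = conn.get_copy_data_async()
    if rc > 0:
        return payload, Wait.NONE
    if rc == 0:
        return None, Wait.READ    # no complete message yet; retry after more input
    raise ConnectionError("COPY ended or failed")
```

Each function returns a different kind of wait, which is why a single asyncOff offset is no longer enough to describe where an operation stands.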

Basically, the logic is significantly more complicated.

The new implementation

To handle the complexity above, there are now more pieces to a state! Instead of only the previous SS_* states, each state is now paired with two additional pieces of information:

a WalKeeperPollState, which indicates which polling operation to perform in order to advance the state; and
a WKSockWaitKind, which indicates the type of socket event the state is waiting on

So the new polling function is then roughly:

def AdvancePollState(walkeeper, events):
    while "first iteration" or walkeeper.pollState == "no waiting required":
        AssertEventsMatchesExpected(events, walkeeper.sockWaitState)

        # The first big switch statement
        if requiresIOpolling(walkeeper.pollState):
            DoPollingLogic(walkeeper)

            if "still requires IO polling":
                return

        # Labeled as "ExecuteNextProtocolState"
        #
        # Also internally a big switch statement, handling the "starting" logic for every state.
        DoStateLogic(walkeeper)
        events = None # only one event actually occurred (on the first iteration)

The DoStateLogic pseudocode corresponds more or less to what the internals of WalKeeperPoll previously had. The main distinction, though, is that the state logic is all nonblocking: each state's logic covers only the portion that can be done without any IO. (In practice, reading states have the structure "IO Read -> logic -> go to writing state", and writing states have the structure "logic -> start write".)
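Below is a runnable toy version of the AdvancePollState loop, with one writing state and one reading state. PollState, ToyWalKeeper and the scripted read/flush results are hypothetical stand-ins for WalKeeperPollState, the WalKeeper struct and the real libpq calls. Socket events and WKSockWaitKind are left out to keep it short:

```python
from collections import deque
from dataclasses import dataclass, field
from enum import Enum, auto

class PollState(Enum):   # hypothetical stand-in for WalKeeperPollState
    NONE = auto()        # no IO pending: go straight to the state logic
    READ = auto()        # waiting for input: consume it, then retry the read
    FLUSH = auto()       # a queued write still has to be flushed

@dataclass
class ToyWalKeeper:
    state: str = "SEND_HELLO"                      # stand-in for an SS_* state
    poll_state: PollState = PollState.NONE
    reads: deque = field(default_factory=deque)    # scripted reads; None = would block
    flushes: deque = field(default_factory=deque)  # scripted flushes; 1 = pending, 0 = done
    log: list = field(default_factory=list)

def do_polling(wk: ToyWalKeeper) -> bool:
    """Advance the pending IO; return True once the state logic can run."""
    if wk.poll_state is PollState.FLUSH:
        return wk.flushes.popleft() == 0   # like PQflush: 0 means everything was sent
    if wk.poll_state is PollState.READ:
        wk.log.append("consume input")     # like PQconsumeInput; the state retries its read
    return True

def do_state_logic(wk: ToyWalKeeper) -> None:
    """Nonblocking per-state logic ("ExecuteNextProtocolState")."""
    wk.poll_state = PollState.NONE
    if wk.state == "SEND_HELLO":           # writing state: logic -> start write
        wk.log.append("queue hello")
        wk.state, wk.poll_state = "RECV_REPLY", PollState.FLUSH
    elif wk.state == "RECV_REPLY":         # reading state: IO read -> logic -> next state
        msg = wk.reads.popleft()
        if msg is None:                    # would block: wait for the socket to be readable
            wk.poll_state = PollState.READ
            return
        wk.log.append(f"got {msg}")
        wk.state = "DONE"

def advance_poll_state(wk: ToyWalKeeper) -> None:
    first = True
    while first or wk.poll_state is PollState.NONE:
        first = False
        if wk.state == "DONE":
            return
        if wk.poll_state is not PollState.NONE and not do_polling(wk):
            return                         # IO still pending: wait for the next socket event
        do_state_logic(wk)
```

With `reads=deque([None, "reply"])` and `flushes=deque([1, 0])`, each call stands for one wakeup from the event loop. The first call queues the write. The second finds the flush still pending. The third completes the flush and finds no reply yet. The fourth consumes input and processes the reply.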

Further improvements / Open questions

There currently feels like a lot of repetition inside AdvancePollState, particularly where we use AsyncPGWrite. I'm not sure of the best way to factor that out, though, because the different results often require different control flow.

Also: assuming the structure of AdvancePollState is somewhat agreeable, I'm planning to add a comment that details some of the typical control flow through it when handling reads or writes. I don't think it's currently very intuitive, and I'd like anyone who comes across the code in the future to have an easier time understanding what exactly is going on.

kelvich · 2021-08-10T08:24:24Z

Woo-hoo! Congrats on passing the current test suite. I tried to run the script https://gist.github.com/kelvich/706757e001012bab0605c01e1b5ea1c6 on this branch and got the same issues that were fixed a few days ago in the main branch. That's expected, given that this branch (and the corresponding postgres branch) was not rebased on main.

I'm +1 on committing this in its current form after a rebase, and addressing the refactoring in follow-up PRs.

sharnoff · 2021-08-12T21:39:58Z

The remaining failures are on the following tests:

test_tenants_normal_work[False]: panics with 'initdb: error: could not create directory "./tenants/9a1873db8a1a46869a392af0a302ddf9": File exists', and later has another panic that looks the same as #416 (could not send WAL redo request: "SendError(..)")
pg_regress hangs: probably related to #413 (pg_regress sometimes hangs in CI)

As far as I can tell, these are unrelated to this PR. @kelvich, if these look like the same errors to you, perhaps this is ready to be merged anyway?

Edit: It looks like CI re-ran when I marked this PR as ready for review, and the errors didn't occur this time.

arssher · 2021-08-13T07:23:52Z

Yes, I think it can be merged after squashing commits. Tests are also ok here. Just to be clear, I still strongly feel the C part ought to be significantly refactored to be simpler, but let's merge for the sake of working on #315 and other things.

sharnoff · 2021-08-13T16:15:27Z

Yes, I think it can be merged after squashing commits. Tests are also ok here. Just to be clear, I still strongly feel the C part ought to be significantly refactored to be simpler, but let's merge for the sake of working on #315 and other things.

@arssher 100% agree on all of this.

Most of the work here was done on the postgres side. There's more information in the commit message there (see: neondatabase/postgres@04cfa32). On the WAL acceptor side, we're now expecting 'START_WAL_PUSH' to initialize the WAL keeper protocol. Everything else is mostly the same, with the only real difference being that protocol messages are now discrete CopyData messages sent over the postgres protocol. For the sake of documentation, the full set of these messages is:

  <- recv: START_WAL_PUSH query
  <- recv: server info from postgres   (type `ServerInfo`)
  -> send: walkeeper info              (type `SafeKeeperInfo`)
  <- recv: vote info                   (type `RequestVote`)

  if node id mismatch:
    -> send: self node id (type `NodeId`); exit

  -> send: confirm vote (with node id) (type `NodeId`)

  loop:
    <- recv: info and maybe WAL block  (type `SafeKeeperRequest` + bytes)
         (break loop if done)
    -> send: confirm receipt           (type `SafeKeeperResponse`)
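Because the WAL block now travels in the same CopyData message as its SafeKeeperRequest, the receiving side has to split one payload into a header and data. Here is a minimal sketch, assuming a fixed-size header whose length is passed in. `header_len` is hypothetical; the real value comes from the SafeKeeperRequest layout:

```python
def split_request(payload: bytes, header_len: int):
    """Split one CopyData payload into (SafeKeeperRequest header, WAL block).
    An empty WAL block means the request carried no WAL ("maybe WAL block")."""
    if len(payload) < header_len:
        raise ValueError(
            f"payload is {len(payload)} bytes, shorter than the {header_len}-byte header")
    return payload[:header_len], payload[header_len:]
```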

sharnoff · 2021-08-13T16:53:16Z

Single failure in CI. This *seems* unrelated to the PR, but I'm not sure. If it is related, then something in the change is causing a WAL acceptor to exit earlier than expected.

=================================== FAILURES ===================================
________________________________ test_restarts _________________________________
batch_others/test_wal_acceptor.py:88: in test_restarts
    failed_node.stop()
fixtures/zenith_fixtures.py:550: in stop
    pid = read_pid(pidfile_path)
fixtures/zenith_fixtures.py:516: in read_pid
    return int(Path(path).read_text())
E   ValueError: invalid literal for int() with base 10: ''

sharnoff · 2021-08-13T17:38:24Z

Actually, this seems to be the same issue that #417 addresses.

Edit to add: in light of this, I'm going ahead with merging.

hlinnaka · 2021-08-13T18:33:24Z

Sounds like #417 - Heikki

kelvich reviewed Jul 27, 2021

walkeeper/src/wal_service.rs (resolved)

walkeeper/src/send_wal.rs (resolved)

patins reviewed Jul 28, 2021

walkeeper/src/receive_wal.rs (outdated, resolved)

walkeeper/src/receive_wal.rs (resolved)

sharnoff mentioned this pull request Jul 29, 2021

Change walproposer protocol to run on libpq neondatabase/postgres#60

Merged

arssher reviewed Aug 5, 2021

walkeeper/src/receive_wal.rs (outdated, resolved)

arssher mentioned this pull request Aug 9, 2021

Refactor safekeeper after model checking #315

Closed

arssher mentioned this pull request Aug 12, 2021

CI failure in test_restart_compute test #409

Closed

sharnoff marked this pull request as ready for review August 12, 2021 21:32

sharnoff force-pushed the walkeeper-over-libpq branch from eddcb09 to ccb099d August 13, 2021 16:19

sharnoff merged commit 5eb1738 into main Aug 13, 2021

stepashka deleted the walkeeper-over-libpq branch January 11, 2022 15:13
