Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use postgres protocol for postgres <-> walkeeper communication #366

Merged
merged 1 commit into from
Aug 13, 2021

Conversation

sharnoff
Copy link
Member

@sharnoff sharnoff commented Jul 27, 2021

This PR will encompass the first of two changes for #117.

Note: This is a work-in-progress; the postgres-side hasn't been updated yet, and so it is expected to not work yet.


Notes on messages:

Because we're now using CopyData messages, there are now distinct boundaries for messages where there weren't before. What this means is that it's worth clarifying where those boundaries are -- here for now, and then officially within walkeeper/README_PROTO.md or somewhere else when they're decided upon.

The messages are, from walkeeper's perspective:

  <- recv: START_WAL_PUSH query
  <- recv: server info from postgres   (type `ServerInfo`)
  -> send: walkeeper info              (type `SafeKeeperInfo`)
  <- recv: vote info                   (type `RequestVote`)

  if node id mismatch:
    -> send: self node id (type `NodeId`); exit

  -> send: confirm vote (with node id) (type `NodeId`)

  loop:
    <- recv: info                      (type `SafeKeeperRequest`)
         (break loop if done)
    <- recv: wal block                 (raw bytes)
    -> send: confirm receipt           (type `SafeKeeperResponse`)

@sharnoff
Copy link
Member Author

sharnoff commented Jul 29, 2021

I've now added the walproposer modifications in postgres -- that's sitting in neondatabase/postgres#60. Still untested.

EDIT: It built fine on my machine - I'll have to figure out what it's missing in CI.

Meanwhile... here's a summary of what's new :)

Things making this implementation more complicated than before

Changing the protocol to work over libpq fundamentally only directly required a few changes:

  • Change connecting to work over PGconnectStart instead of directly connecting on the socket
  • Swapping out all of the places where we were previously using Read/WriteSocketAsync to something that retrieves the same information, but over CopyData messages in the postgres protocol.
  • Adding graceful shutdown of the PGconn object used for managing the connection

The extra work comes in because the operations now had many different ways they could be asynchronous -- instead of just using asyncOff inside the WalKeeper to track our writing position (and just continuing to call WriteSocketAsync as needed), there's now a number of different polling functions that are required to advance different operations:

  • PQconnectPoll for connecting to the walkeeper,
  • PQflush to finish writing a message (sometimes preceeded by PQconsumeInput)
  • and PQconsumeInput by itself for reads, followed by retrying the read operation.

Basically, the logic is significantly more complicated.

The new implementation

To handle the complexity from above, there's now more pieces to a state! Instead of just the previous SS_* states that were being used, they're now paired with two additional pieces of information:

  • a WalKeeperPollState, which indicates which polling operation to perform in order to advance the state; and
  • a WKSockWaitKind, which indicates the type of socket event the state is waiting on

So the new polling function is then roughly:

def AdvancePollState(walkeeper, events):
    while "first iteration" or walkeeper.pollState == "no waiting required":
        AssertEventsMatchesExpected(events, walkeeper.sockWaitState)

        # The first big switch statement
        if requiresIOpolling(walkeeper.pollState):
            DoPollingLogic(walkeeper)

            if "still requires IO polling":
                return

        # Labeled as "ExecuteNextProtocolState"
        #
        # Also internally a big switch statement, handling the "starting" logic for every state.
        DoStateLogic(walkeeper)
        events = None # only one event actually occured

The DoStateLogic pseudocode corresponds more-or-less to what the internals of WalKeeperPoll previously had. The main distinction here though, is that the state logic is all nonblocking -- the state's logic only really corresponds to the portion that we can do without any IO. (In practice, reading states have the structure "IO Read -> logic -> go to writing state", and writing states have the structure "logic -> start write")

Further improvements / Open questions

There's currently what feels like a lot of repetition inside AdvancePollState, particularly from where we're using AsyncPGWrite. I'm not currently sure about the best way to factor that out though, because there's often different control flow required for the different results.

Also: assuming that the structure of AdvancePollState is somewhat agreeable, I'm planning on adding a comment that details some of the typical control flow through it when handling reads or writes -- because I don't think it's currently very intuitive, and I'd like for anyone that comes across the code in the future to have an easier time understanding what exactly's going on.

@kelvich
Copy link
Contributor

kelvich commented Aug 10, 2021

Woo-hoo! Congrats with passing current tests suite. I tried to run script https://gist.github.com/kelvich/706757e001012bab0605c01e1b5ea1c6 on this branch and got same issues that were fixed few days ago in the main branch. Which is expected, given that this branch (and corresponding postgres branch) was not rebased on the main.

I'm +1 with committing this in it's current form after rebase and address refactoring in follow up PR's.

@sharnoff sharnoff marked this pull request as ready for review August 12, 2021 21:32
@sharnoff
Copy link
Member Author

sharnoff commented Aug 12, 2021

The remaining failures are on the following tests:

As far as I can tell, these are unrelated to this PR -- @kelvich, if these look like the same errors to you, perhaps this is ready to be merged anyways?


Edit: Looks like CI re-ran when I changed this PR to no longer be a draft, and the errors happened to not occur this time.

@arssher
Copy link
Contributor

arssher commented Aug 13, 2021

Yes, I think it can be merged after squashing commits. Tests are also ok here. Just to be clear, I still strongly feel C part ought to be significantly refactored to be more simple, but let's merge for the sake of working on #315 and other things.

@sharnoff
Copy link
Member Author

Yes, I think it can be merged after squashing commits. Tests are also ok here. Just to be clear, I still strongly feel C part ought to be significantly refactored to be more simple, but let's merge for the sake of working on #315 and other things.

@arssher 100% agree on all of this.

Most of the work here was done on the postgres side. There's more
information in the commit message there.
 (see: neondatabase/postgres@04cfa32)

On the WAL acceptor side, we're now expecting 'START_WAL_PUSH' to
initialize the WAL keeper protocol. Everything else is mostly the same,
with the only real difference being that protocol messages are now
discrete CopyData messages sent over the postgres protocol.

For the sake of documentation, the full set of these messages is:

  <- recv: START_WAL_PUSH query
  <- recv: server info from postgres   (type `ServerInfo`)
  -> send: walkeeper info              (type `SafeKeeperInfo`)
  <- recv: vote info                   (type `RequestVote`)

  if node id mismatch:
    -> send: self node id (type `NodeId`); exit

  -> send: confirm vote (with node id) (type `NodeId`)

  loop:
    <- recv: info and maybe WAL block  (type `SafeKeeperRequest` + bytes)
         (break loop if done)
    -> send: confirm receipt           (type `SafeKeeperResponse`)
@sharnoff sharnoff force-pushed the walkeeper-over-libpq branch from eddcb09 to ccb099d Compare August 13, 2021 16:19
@sharnoff
Copy link
Member Author

Single failure in CI - this seems unrelated to the PR, but I'm not sure. If it is related, there's something to do with the change that's causing a WAL acceptor to have exited earlier than expected.

=================================== FAILURES ===================================
________________________________ test_restarts _________________________________
batch_others/test_wal_acceptor.py:88: in test_restarts
    failed_node.stop()
fixtures/zenith_fixtures.py:550: in stop
    pid = read_pid(pidfile_path)
fixtures/zenith_fixtures.py:516: in read_pid
    return int(Path(path).read_text())
E   ValueError: invalid literal for int() with base 10: ''

@sharnoff
Copy link
Member Author

sharnoff commented Aug 13, 2021

Actually this seems to just be the same issue that #417 is addressing

Edit to add: in light of this, going to go ahead with merging.

@sharnoff sharnoff merged commit 5eb1738 into main Aug 13, 2021
@hlinnaka
Copy link
Contributor

hlinnaka commented Aug 13, 2021 via email

@stepashka stepashka deleted the walkeeper-over-libpq branch January 11, 2022 15:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants