Avoid callmemaybe #117

kelvich · 2021-05-10T11:30:32Z

Right now postgres and safekeeper need to call pageserver command "callmemaybe" to ask for incoming WAL streaming connection. Also there is a postgres <-> safekeeper connection that can be initiated without any callbacks. We can switch to that protocol both in postgres <-> pageserver direct connection and in safekeeper <-> pageserver connection.

sharnoff · 2021-07-21T23:10:08Z

Notes from conversations with @kelvich:

End goal

The full goal of this issue comes in two parts. The first is to replace the existing communication between postgres and safekeeper so that it's built entirely on libpq, using a new libpq command for safekeeper that postgres calls: START_WAL_PUSH. The second piece is to replace the existing "callmemaybe" calls between the pageserver and safekeeper/postgres with a similar or near-identical solution, with the same sort of command.

Current implementation

(Note: I may have some pieces of this somewhat wrong -- please let me know!)

The current communications between safekeeper and postgres are done outside of the postgres protocol, directly over TCP with a custom protocol. The safekeeper initiates the communication as a request for postgres to forward the WAL to it, but safekeeper must ask.

There's a similar mechanism at play in safekeeper <-> pageserver and postgres <-> pageserver connections -- I don't yet know enough about that to precisely say what distinguishes it from the postgres <-> safekeeper connection though.

The protocol in use is essentially just sending along the WAL, but there's a few extra pieces of information that it gets tagged with for handling elections, so we can't just directly use the relevant functionality from the postgres protocol.

Desired implementation

What we want to do is to switch over the existing custom protocol (built on top of TCP), and fit that to work over CopyData messages within the postgres protocol. The other major change is to remove the current implementation's organization around callbacks, so that postgres proactively sends the WAL to whichever system is expecting it, and safekeeper proactively does the same to send to the pageserver.

Reasoning

Briefly: "callmemaybe" carries unnecessary complexity -- especially because it's currently implemented outside of the postgres protocol. Implementing the functionality as "pushing" the WAL instead of "pulling" would be simpler, and building it on top of the postgres protocol allows us to take full advantage of the systems already provided there (like authentication).

kelvich · 2021-07-22T12:16:45Z

Thanks for writing this up.

small fix:

The safekeeper initiates the communication as a request for postgres to forward the WAL to it, but safekeeper must ask.

(from 'Current implementation')

Postgres initiates communication with safekeeper. That is opposed to how that happens with pg<->pageserver and safekeeper<->pageserver where initiating party should first connect and run 'callmemaybe' and after that receiving party would open connection to the initiator and start pulling WAL.

kelvich · 2021-07-22T12:29:36Z

May be this can help

Step 1: change WAL push from custom tcp proto to libpq
Step 2: change callmemaybe which is libpq to WAL push and data directly in current connection without any dial back connection

sharnoff · 2021-07-23T08:46:56Z

I spent some time earlier trying to understand how walproposer fits into the postgres codebase in order to know what actually needs to change -- the end result is the picture of a flowchart I drew up below.

For the most part, arrows correspond to function calls. Somtimes, little additional notes are included - like indicating that WalProposerMain just calls WalProposerPoll in an infinite loop. The two main entrypoints that I've focused on are double-boxed (PostmasterMain and PostgresMain), and all public exports from walproposer.c are underlined.

There's more stuff within walproposer that I didn't cover here, but I found the overall system view to be helpful enough by itself to polish up a little and add to the issue.

lubennikovaav · 2021-10-14T14:20:51Z

Hi, what is the current state of this task?
While working on pageserver shutdown, I bumped into the problem with the hanging connection between walreceiver and wal_producer: https://github.com/zenithdb/zenith/blob/main/pageserver/src/walreceiver.rs#L151
The current drop behavior for the sync client will poll_block_on the connection until it has successfully finished.

Am I right to understand that the idea of this issue is to replace this exact code with our PostgresBackend connection and custom WalReceiverHandler implementation of postgres_backend::Handler ?

kelvich · 2021-10-14T14:25:54Z

Hi, what is the current state of this task?

@arssher is on it. But we will anyway keep current logic too, as it allows us to provision new pageservers.

arssher · 2021-10-14T14:59:21Z

Corresponding PR is #628

yeputons · 2022-01-10T15:21:44Z

Here are my final thoughts on this one:

There are multiple technical issues.
- We want to both a) re-use existing PostgreSQL backend/frontend implementations; b) switch backend/frontend roles for an existing connection after a specific command like START_PUSH_REPLICATION.
- There is no need to support all commands after the switch or switching back.
- However, we do need to send/parse backend/frontend specific messages like Standy status update. They're typically wrapped inside CopyData message, which is rather fortunately(?) shared between backend and frontend.
- Existing Safekeeper code handles replication in two separate threads: one processes Hot Standby feedback messages, and the other one actually pushes data. They do interact slightly via (formerly) Timeline::add_hs_feedback or (currently) timeline.update_replica_state. So Safekeeper has to "split" a connection into two halves, this is quite unusual and does not fit nicely into Rust's ownership mode.
[refactor] Safekeeper push replication #628 adds lots of code does the job by:
- Creating a common interface for rust-postgres and our PostgresBackend and using it in the replication part
- Exploiting the fact that CopyData is symmetrical and PostgresBackend does not do lower-level (de)serialization
- Patching up tokio-postgres so it can "split" connection in the CopyBoth mode into reading/writing ends (IIRC it's possible because tokio-postgres spawns a separate coroutine for each client).
There are additional hidden complexities. It's probably not necessary to handle all of them at once, but they have to be handled.
- There is some logic around WAL_RECEIVERS in pageserver which keeps track of existing active "Pageserver --> Safekeeper" connections initiated in response to callmemaybe. It's possible interrupt WAL receiving. However, if we receive WAL from Safekeeper-initiated connection, it's probably handled by PostgresBackend which is fully synchronous and cannot be tokio::select!ed with interrupt_receiver: tokio::sync::oneshot::Receiver.
  - As far as I understand, [refactor] Safekeeper push replication #628 keeps the current logic when callmemaybe is used and does nothing with WAL_RECEIVERS when push replication is used. This may or may not be the wanted behavior.
- Maybe some others.

All in all, feels like a big refactoring to me; a lot of unsymmetrical logic and connection state machines are already in place.

stepashka · 2022-01-19T09:49:24Z

#636 #117 #589 all relate to the similar issues and are on hold for now

kelvich · 2022-06-06T12:58:32Z

Resolved in https://github.com/neondatabase/neon/blob/main/docs/rfcs/015-storage-messaging.md / e5cb727

kelvich added c/storage/safekeeper Component: storage: safekeeper c/storage/pageserver Component: storage: pageserver postgres-zenith labels Jul 8, 2021

kelvich assigned sharnoff Jul 20, 2021

sharnoff mentioned this issue Jul 27, 2021

Use postgres protocol for postgres <-> walkeeper communication #366

Merged

LizardWizzard mentioned this issue Aug 23, 2021

authenticate the remaining storage interactions #451

Closed

hlinnaka mentioned this issue Sep 13, 2021

Reverse Postgres <-> Page server connection when running without safekeepers #589

Closed

stepashka assigned yeputons and unassigned sharnoff Dec 3, 2021

stepashka mentioned this issue Dec 7, 2021

[refactor] Safekeeper push replication #628

Closed

stepashka added a/tech_debt Area: related to tech debt and removed c/compute labels Dec 23, 2021

stepashka mentioned this issue Dec 24, 2021

Async front for proxy #342

Closed

stepashka added the s/had_progress Status: some work was already done in the past label Jan 10, 2022

stepashka unassigned yeputons Jan 10, 2022

stepashka added the p/wal label Jan 14, 2022

stepashka mentioned this issue Jan 19, 2022

Restrict safekeeper -> pageserver connection thread lifetime to the lifetime of connection from walproposer #636

Closed

kelvich closed this as completed Jun 6, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Avoid callmemaybe #117

Avoid callmemaybe #117

kelvich commented May 10, 2021

sharnoff commented Jul 21, 2021

kelvich commented Jul 22, 2021 •

edited

Loading

kelvich commented Jul 22, 2021

sharnoff commented Jul 23, 2021

lubennikovaav commented Oct 14, 2021

kelvich commented Oct 14, 2021 •

edited

Loading

arssher commented Oct 14, 2021

yeputons commented Jan 10, 2022

stepashka commented Jan 19, 2022

kelvich commented Jun 6, 2022

Avoid callmemaybe #117

Avoid callmemaybe #117

Comments

kelvich commented May 10, 2021

sharnoff commented Jul 21, 2021

End goal

Current implementation

Desired implementation

Reasoning

kelvich commented Jul 22, 2021 • edited Loading

kelvich commented Jul 22, 2021

sharnoff commented Jul 23, 2021

lubennikovaav commented Oct 14, 2021

kelvich commented Oct 14, 2021 • edited Loading

arssher commented Oct 14, 2021

yeputons commented Jan 10, 2022

stepashka commented Jan 19, 2022

kelvich commented Jun 6, 2022

kelvich commented Jul 22, 2021 •

edited

Loading

kelvich commented Oct 14, 2021 •

edited

Loading