Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid callmemaybe #117

Closed
kelvich opened this issue May 10, 2021 · 10 comments
Closed

Avoid callmemaybe #117

kelvich opened this issue May 10, 2021 · 10 comments
Labels
a/tech_debt Area: related to tech debt c/storage/pageserver Component: storage: pageserver c/storage/safekeeper Component: storage: safekeeper s/had_progress Status: some work was already done in the past

Comments

@kelvich
Copy link
Contributor

kelvich commented May 10, 2021

Right now postgres and safekeeper need to call pageserver command "callmemaybe" to ask for incoming WAL streaming connection. Also there is a postgres <-> safekeeper connection that can be initiated without any callbacks. We can switch to that protocol both in postgres <-> pageserver direct connection and in safekeeper <-> pageserver connection.

@kelvich kelvich added c/storage/safekeeper Component: storage: safekeeper c/storage/pageserver Component: storage: pageserver postgres-zenith labels Jul 8, 2021
@sharnoff
Copy link
Member

Notes from conversations with @kelvich:

End goal

The full goal of this issue comes in two parts. The first is to replace the existing communication between postgres and safekeeper so that it's built entirely on libpq, using a new libpq command for safekeeper that postgres calls: START_WAL_PUSH. The second piece is to replace the existing "callmemaybe" calls between the pageserver and safekeeper/postgres with a similar or near-identical solution, with the same sort of command.

Current implementation

(Note: I may have some pieces of this somewhat wrong -- please let me know!)

The current communications between safekeeper and postgres are done outside of the postgres protocol, directly over TCP with a custom protocol. The safekeeper initiates the communication as a request for postgres to forward the WAL to it, but safekeeper must ask.

There's a similar mechanism at play in safekeeper <-> pageserver and postgres <-> pageserver connections -- I don't yet know enough about that to precisely say what distinguishes it from the postgres <-> safekeeper connection though.

The protocol in use is essentially just sending along the WAL, but there's a few extra pieces of information that it gets tagged with for handling elections, so we can't just directly use the relevant functionality from the postgres protocol.

Desired implementation

What we want to do is to switch over the existing custom protocol (built on top of TCP), and fit that to work over CopyData messages within the postgres protocol. The other major change is to remove the current implementation's organization around callbacks, so that postgres proactively sends the WAL to whichever system is expecting it, and safekeeper proactively does the same to send to the pageserver.

Reasoning

Briefly: "callmemaybe" carries unnecessary complexity -- especially because it's currently implemented outside of the postgres protocol. Implementing the functionality as "pushing" the WAL instead of "pulling" would be simpler, and building it on top of the postgres protocol allows us to take full advantage of the systems already provided there (like authentication).

@kelvich
Copy link
Contributor Author

kelvich commented Jul 22, 2021

Thanks for writing this up.

small fix:

The safekeeper initiates the communication as a request for postgres to forward the WAL to it, but safekeeper must ask.

(from 'Current implementation')

Postgres initiates communication with safekeeper. That is opposed to how that happens with pg<->pageserver and safekeeper<->pageserver where initiating party should first connect and run 'callmemaybe' and after that receiving party would open connection to the initiator and start pulling WAL.

@kelvich
Copy link
Contributor Author

kelvich commented Jul 22, 2021

May be this can help

Callmemaybe

Step 1: change WAL push from custom tcp proto to libpq
Step 2: change callmemaybe which is libpq to WAL push and data directly in current connection without any dial back connection

@sharnoff
Copy link
Member

I spent some time earlier trying to understand how walproposer fits into the postgres codebase in order to know what actually needs to change -- the end result is the picture of a flowchart I drew up below.

For the most part, arrows correspond to function calls. Somtimes, little additional notes are included - like indicating that WalProposerMain just calls WalProposerPoll in an infinite loop. The two main entrypoints that I've focused on are double-boxed (PostmasterMain and PostgresMain), and all public exports from walproposer.c are underlined.

There's more stuff within walproposer that I didn't cover here, but I found the overall system view to be helpful enough by itself to polish up a little and add to the issue.

postgres-walproposer-flowchart

@lubennikovaav
Copy link
Contributor

Hi, what is the current state of this task?
While working on pageserver shutdown, I bumped into the problem with the hanging connection between walreceiver and wal_producer: https://github.com/zenithdb/zenith/blob/main/pageserver/src/walreceiver.rs#L151
The current drop behavior for the sync client will poll_block_on the connection until it has successfully finished.

Am I right to understand that the idea of this issue is to replace this exact code with our PostgresBackend connection and custom WalReceiverHandler implementation of postgres_backend::Handler ?

@kelvich
Copy link
Contributor Author

kelvich commented Oct 14, 2021

Hi, what is the current state of this task?

@arssher is on it. But we will anyway keep current logic too, as it allows us to provision new pageservers.

@arssher
Copy link
Contributor

arssher commented Oct 14, 2021

Corresponding PR is #628

@stepashka stepashka assigned yeputons and unassigned sharnoff Dec 3, 2021
@stepashka stepashka added a/tech_debt Area: related to tech debt and removed c/compute labels Dec 23, 2021
@stepashka stepashka added the s/had_progress Status: some work was already done in the past label Jan 10, 2022
@yeputons
Copy link
Contributor

Here are my final thoughts on this one:

  • There are multiple technical issues.
    • We want to both a) re-use existing PostgreSQL backend/frontend implementations; b) switch backend/frontend roles for an existing connection after a specific command like START_PUSH_REPLICATION.
    • There is no need to support all commands after the switch or switching back.
    • However, we do need to send/parse backend/frontend specific messages like Standy status update. They're typically wrapped inside CopyData message, which is rather fortunately(?) shared between backend and frontend.
    • Existing Safekeeper code handles replication in two separate threads: one processes Hot Standby feedback messages, and the other one actually pushes data. They do interact slightly via (formerly) Timeline::add_hs_feedback or (currently) timeline.update_replica_state. So Safekeeper has to "split" a connection into two halves, this is quite unusual and does not fit nicely into Rust's ownership mode.
  • [refactor] Safekeeper push replication #628 adds lots of code does the job by:
    • Creating a common interface for rust-postgres and our PostgresBackend and using it in the replication part
    • Exploiting the fact that CopyData is symmetrical and PostgresBackend does not do lower-level (de)serialization
    • Patching up tokio-postgres so it can "split" connection in the CopyBoth mode into reading/writing ends (IIRC it's possible because tokio-postgres spawns a separate coroutine for each client).
  • There are additional hidden complexities. It's probably not necessary to handle all of them at once, but they have to be handled.
    • There is some logic around WAL_RECEIVERS in pageserver which keeps track of existing active "Pageserver --> Safekeeper" connections initiated in response to callmemaybe. It's possible interrupt WAL receiving. However, if we receive WAL from Safekeeper-initiated connection, it's probably handled by PostgresBackend which is fully synchronous and cannot be tokio::select!ed with interrupt_receiver: tokio::sync::oneshot::Receiver.
      • As far as I understand, [refactor] Safekeeper push replication #628 keeps the current logic when callmemaybe is used and does nothing with WAL_RECEIVERS when push replication is used. This may or may not be the wanted behavior.
    • Maybe some others.

All in all, feels like a big refactoring to me; a lot of unsymmetrical logic and connection state machines are already in place.

@stepashka
Copy link
Member

#636 #117 #589 all relate to the similar issues and are on hold for now

@kelvich
Copy link
Contributor Author

kelvich commented Jun 6, 2022

@kelvich kelvich closed this as completed Jun 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
a/tech_debt Area: related to tech debt c/storage/pageserver Component: storage: pageserver c/storage/safekeeper Component: storage: safekeeper s/had_progress Status: some work was already done in the past
Projects
None yet
Development

No branches or pull requests

6 participants