
Add neon.primary_is_running GUC. #6705

Merged 12 commits into main from compute_primary_is_running on Feb 23, 2024

Conversation

lubennikovaav
Contributor

We set it for a Neon replica if the primary is running.

Corresponding cloud PR is https://github.com/neondatabase/cloud/pull/10183
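
Since the flag travels as an ordinary GUC, its value on a compute can be inspected like any other setting. A minimal sketch, assuming psycopg2 and an illustrative connection string (neither is taken from this PR):

```python
# Sketch: inspect the new GUC on a replica endpoint.
# The connection string is illustrative only.
import psycopg2

conn = psycopg2.connect("host=replica.example.invalid dbname=postgres user=cloud_admin")
with conn.cursor() as cur:
    # The control plane is expected to set this to 'on' when a primary
    # is running on the branch, and 'off' otherwise.
    cur.execute("SHOW neon.primary_is_running")
    print(cur.fetchone()[0])  # 'on' or 'off'
conn.close()
```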

@lubennikovaav lubennikovaav requested review from a team as code owners February 9, 2024 16:09
@lubennikovaav lubennikovaav requested review from knizhnik and NanoBjorn and removed request for a team February 9, 2024 16:09

github-actions bot commented Feb 9, 2024

2478 tests run: 2357 passed, 0 failed, 121 skipped (full report)


Flaky tests (2)

Postgres 14

  • test_pageserver_restarts_under_worload: release
  • test_timeline_deletion_with_files_stuck_in_upload_queue: debug

Code coverage* (full report)

  • functions: 28.8% (6783 of 23548 functions)
  • lines: 47.7% (41287 of 86631 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results.
1908692 at 2024-02-23T13:29:22.947Z ♻️

@lubennikovaav lubennikovaav requested a review from a team as a code owner February 9, 2024 16:38
@lubennikovaav lubennikovaav requested review from arpad-m and removed request for a team February 9, 2024 16:38
@hlinnaka
Contributor

For the sake of the archives, please explain the problem in the commit message too. And perhaps open a github issue to track it. IIUC the problem is that:

  • No primary is running on a branch
  • You start a new read-only replica on the branch
  • The read-only replica doesn't allow connections, because it doesn't have a valid standby snapshot yet. It doesn't know which as-yet-uncommitted transactions might still be running in the primary and commit later. There is no primary, so the answer is "none", but Postgres doesn't know that.

A hot standby will wait for RUNNING_XACTS or shutdown checkpoint records until it can allow connections. If the primary is not running, they will never come.
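
For illustration, here is roughly what a client sees while the standby waits (a minimal sketch; psycopg2 and the hostname are assumptions):

```python
# Sketch: a hot standby without a valid snapshot rejects client connections.
import psycopg2

try:
    psycopg2.connect("host=replica.example.invalid dbname=postgres")
except psycopg2.OperationalError as e:
    # Typically something like:
    # "FATAL: the database system is not yet accepting connections"
    print(e)
```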

This PR adds a GUC to tell the standby that there is no primary running. It doesn't actually do anything with it. Will that be a followup PR?

Even with this flag, this can still happen:

  • There is a primary running
  • You start a read-only replica
  • Primary crashes / is killed before it writes a RUNNING_XACTS record.

So just checking that a primary is running when the standby starts isn't necessarily enough.

A GUC doesn't feel like a very good mechanism to deliver this information. Whether a primary is running or not is very ephemeral. Perhaps the neon extension should ask the control plane for that directly with an API call?

@knizhnik
Contributor

For the sake of the archives, please explain the problem in the commit message too. And perhaps open a github issue to track it. IIUC the problem is that:

  • No primary is running on a branch
  • You start a new read-only replica on the branch
  • The read-only replica doesn't allow connections, because it doesn't have a valid standby snapshot yet. It doesn't know which as-yet-uncommitted transactions might still be running in the primary and commit later. There is no primary, so the answer is "none", but Postgres doesn't know that.

A hot standby will wait for RUNNING_XACTS or shutdown checkpoint records until it can allow connections. If the primary is not running, they will never come.

This PR adds a GUC to tell the standby that there is no primary running. It doesn't actually do anything with it. Will that be a followup PR?

A hot-standby replica waits for the primary only if it was not started after a normal shutdown (after a normal shutdown, wasShutdown is true in xlogrecovery.c). This behaviour was changed only by my PR #6357, and I am also going to handle the added GUC in that PR.

Even with this flag, this can still happen:

  • There is a primary running
  • You start a read-only replica
  • Primary crashes / is killed before it writes a RUNNING_XACTS record.

The primary has to be restarted in any case. We do not support replacing the primary with a replica, so Neon should detect a crash of the primary node and restart it. The replica will reconnect to the new primary and receive a RUNNING_XACTS record.

Actually, the replica has no choice other than to wait for the primary once it starts recovery, receiving WAL from the primary.
As a result it can receive tuples containing XIDs of running transactions. Those transactions can be committed before the primary crashes, so the replica may not set the XMIN_INVALID hint bit for such tuples.

So just checking that a primary is running when the standby starts isn't necessarily enough.

I do not know a better solution.

A GUC doesn't feel like a very good mechanism to deliver this information. Whether a primary is running or not is very ephemeral. Perhaps the neon extension should ask the control plane for that directly with an API call?

The code that makes the decision whether or not to set wasShutdown is not in the extension, but in Postgres core (xlogrecovery.c).
Certainly it is possible to add one more hook here, but we would have to change the Postgres sources in any case.

I do not understand why sending a request to the control plane is better than sending this information through the compute spec and then a GUC. Also, the extra round-trip doesn't speed up startup.

@lubennikovaav
Contributor Author

Even with this flag, this can still happen:
  • There is a primary running
  • You start a read-only replica
  • Primary crashes / is killed before it writes a RUNNING_XACTS record.

The primary has to be restarted in any case. We do not support replacing the primary with a replica, so Neon should detect a crash of the primary node and restart it. The replica will reconnect to the new primary and receive a RUNNING_XACTS record.

Hmm, looks like we can get into a deadlock here. Imagine this situation:

  • primary is running
  • replica is starting and waiting for running_xacts from primary
  • primary is crashing
  • The StartCompute(replica) operation is still in progress, and the control plane cannot run a concurrent StartCompute(primary) operation, so it will wait in a queue.
  • Eventually this situation will resolve itself, when the replica's StartCompute deadline is exceeded.

Are we OK with this?
I'd say yes, because a crashing primary is not normal and should be fixed ASAP anyway.
@hlinnaka, @knizhnik, WDYT?

@hlinnaka
Contributor

hlinnaka commented Feb 12, 2024

Even with this flag, this can still happen:
  • There is a primary running
  • You start a read-only replica
  • Primary crashes / is killed before it writes a RUNNING_XACTS record.

The primary has to be restarted in any case. We do not support replacing the primary with a replica, so Neon should detect a crash of the primary node and restart it. The replica will reconnect to the new primary and receive a RUNNING_XACTS record.

Hmm, looks like we can get into a deadlock here. Imagine this situation:

  • primary is running

  • replica is starting and waiting for running_xacts from primary

  • primary is crashing

  • The StartCompute(replica) operation is still in progress, and the control plane cannot run a concurrent StartCompute(primary) operation, so it will wait in a queue.

  • Eventually this situation will resolve itself, when the replica's StartCompute deadline is exceeded.

Are we OK with this? I'd say yes, because a crashing primary is not normal and should be fixed ASAP anyway. @hlinnaka, @knizhnik, WDYT?

Until #6712 is merged, we don't actually write the shutdown checkpoint record even on a clean shutdown, so you'll get the above with a clean shutdown too.

@hlinnaka
Contributor

This PR adds a GUC to tell the standby that there is no primary running. It doesn't actually do anything with it. Will that be a followup PR?

A hot-standby replica waits for the primary only if it was not started after a normal shutdown (after a normal shutdown, wasShutdown is true in xlogrecovery.c). This behaviour was changed only by my PR #6357, and I am also going to handle the added GUC in that PR.

I see. Can you include those changes here, so that this can be reviewed, tested, and committed as one unit, please?

Needs tests.

@hlinnaka
Contributor

So we currently have this bug:

  1. Primary is running.
  2. Start a transaction on the primary. Insert a row.
  3. Start a replica. Because we always set 'wasShutdown==true' in xlogrecovery.c, the replica considers all in-progress transactions as aborted.
  4. On the replica: BEGIN REPEATABLE READ; and run a query. The new row inserted on the primary is not visible.
  5. Commit the transaction on the primary.
  6. On the replica: run another query in the same transaction. Incorrectly, the new row is now visible.

Please create a python test case for that. This PR should then fix it.
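
A minimal sketch of such a test, in the style of the repo's test_runner suite (the fixture and helper names neon_simple_env, create_start, and new_replica_start are assumptions, not code from this PR):

```python
from fixtures.neon_fixtures import NeonEnv


def test_replica_snapshot_isolation(neon_simple_env: NeonEnv):
    env = neon_simple_env

    # Step 1: a primary is running.
    primary = env.endpoints.create_start("main")
    setup_conn = primary.connect()
    setup_conn.autocommit = True
    setup_conn.cursor().execute("CREATE TABLE t (x int)")

    # Step 2: start a transaction on the primary and insert a row,
    # leaving the transaction open (psycopg2 opens it implicitly).
    writer_conn = primary.connect()
    writer_conn.cursor().execute("INSERT INTO t VALUES (1)")

    # Step 3: start a replica while that transaction is still in progress.
    # (A real test would wait here for the replica to catch up to the
    # primary's flush LSN before querying.)
    replica = env.endpoints.new_replica_start(origin=primary, endpoint_id="replica")

    # Step 4: in REPEATABLE READ, the uncommitted row must not be visible.
    reader_conn = replica.connect()
    reader_conn.set_session(isolation_level="REPEATABLE READ")
    reader_cur = reader_conn.cursor()
    reader_cur.execute("SELECT count(*) FROM t")  # first query fixes the snapshot
    assert reader_cur.fetchone()[0] == 0

    # Step 5: commit the transaction on the primary.
    writer_conn.commit()

    # Step 6: the same replica transaction must still see the old snapshot;
    # before the fix, the row incorrectly became visible here.
    reader_cur.execute("SELECT count(*) FROM t")
    assert reader_cur.fetchone()[0] == 0
```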

@knizhnik
Contributor

So we currently have this bug:

  1. Primary is running.
  2. Start a transaction on the primary. Insert a row.
  3. Start a replica. Because we always set 'wasShutdown==true' in xlogrecovery.c, the replica considers all in-progress transactions as aborted.
  4. On the replica: BEGIN REPEATABLE READ; and run a query. The new row inserted on the primary is not visible.
  5. Commit the transaction on the primary.
  6. On the replica: run another query in the same transaction. Incorrectly, the new row is now visible.

Please create a python test case for that. This PR should then fix it.

There is such a test, test_runner/regress/test_replication_start.py, in PR #6357:
https://github.com/neondatabase/neon/pull/6357/files#diff-7edc60d2be535ca5bbd3cce1b149594fd1994e755bd3b1f362a7e010a5cd3b16

@knizhnik
Contributor

This PR adds a GUC to tell the standby that there is no primary running. It doesn't actually do anything with it. Will that be a followup PR?

A hot-standby replica waits for the primary only if it was not started after a normal shutdown (after a normal shutdown, wasShutdown is true in xlogrecovery.c). This behaviour was changed only by my PR #6357, and I am also going to handle the added GUC in that PR.

I see. Can you include those changes here, so that this can be reviewed, tested, and committed as one unit, please?

Needs tests.

I can shuffle changes between PRs as you prefer.
But actually this problem and its fix have no relation to this PR.
So, what we have now: a replica started with wasShutdown=true doesn't receive information about running xacts, constructs an incorrect snapshot, and as a result sees incorrect data.
The problem was fixed in #6357: now a hot-standby replica waits for WAL from the primary.
It caused a problem with the e2e tests (which spawn a read-only replica on a branch without spawning a master). But certainly it is not only related to e2e. So Star has proposed this mechanism of the control plane informing the replica whether a primary is running (wasShutdown=false) or not (wasShutdown=true).

So #6357 solves the problem of spawning a replica with a live master, and this PR (in conjunction with the corresponding control plane PR) solves the problem of spawning a replica without a live master.
Please notice that:

  1. To make this PR work we first need to merge its control plane part.
  2. Regression tests do not use the control plane, so to test it we do not need this PR. It is possible to propagate this flag through neon_local, but I am not sure that it makes sense.

@hlinnaka
Contributor

hlinnaka commented Feb 12, 2024

I can shuffle changes between PRs as you prefer.

Thank you. The general rule should be: one bug, one PR. A PR includes all the code changes required to fix the bug, and a test to reproduce it, but nothing else.

But actually this problem and its fix have no relation to this PR. So, what we have now: a replica started with wasShutdown=true doesn't receive information about running xacts, constructs an incorrect snapshot, and as a result sees incorrect data.

The point of this PR is to fix that bug, right?

The problem was fixed in #6357: now a hot-standby replica waits for WAL from the primary.

According to the description of #6357, it is about propagating apply_lsn from safekeepers to pageservers, to postpone GC. That seems completely unrelated to this.

It caused a problem with the e2e tests (which spawn a read-only replica on a branch without spawning a master). But certainly it is not only related to e2e. So Star has proposed this mechanism of the control plane informing the replica whether a primary is running (wasShutdown=false) or not (wasShutdown=true).

So #6357 solves the problem of spawning a replica with a live master, and this PR (in conjunction with the corresponding control plane PR) solves the problem of spawning a replica without a live master. Please notice that:

  1. To make this PR work we first need to merge its control plane part.

Gotcha.

  2. Regression tests do not use the control plane, so to test it we do not need this PR. It is possible to propagate this flag through neon_local, but I am not sure that it makes sense.

Yes, it does. We need regression tests for these things. If we go with a GUC, it's easy to set the GUC in a test to simulate how the control plane would set it.
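
For instance, a test could start the replica with the GUC preset, simulating the control plane (a sketch extending the test above; the config_lines parameter name follows the fixtures' usual pattern but is an assumption):

```python
# Sketch: simulate the control plane by presetting the GUC for the replica.
replica = env.endpoints.new_replica_start(
    origin=primary,
    endpoint_id="replica",
    config_lines=["neon.primary_is_running = off"],
)
```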

To summarize, for this PR:

  • Let's agree that the point of this PR is to fix the bug that a replica constructs incorrect snapshots, which leads to incorrect query results, if the primary is running.
  • Please include all the code changes needed to fix that bug, but nothing else.
  • Please include a regression test case that demonstrates the bug. Without the code changes from the PR, the test fails; with the changes, it passes.

@knizhnik
Contributor

I can shuffle changes between PRs as you prefer.

Thank you. The general rule should be: one bug, one PR. A PR includes all the code changes required to fix the bug, and a test to reproduce it, but nothing else.

But actually this problem and its fix have no relation to this PR. So, what we have now: a replica started with wasShutdown=true doesn't receive information about running xacts, constructs an incorrect snapshot, and as a result sees incorrect data.

The point of this PR is to fix that bug, right?

The problem was fixed in #6357: now a hot-standby replica waits for WAL from the primary.

According to the description of #6357, it is about propagating apply_lsn from safekeepers to pageservers, to postpone GC. That seems completely unrelated to this.

It caused a problem with the e2e tests (which spawn a read-only replica on a branch without spawning a master). But certainly it is not only related to e2e. So Star has proposed this mechanism of the control plane informing the replica whether a primary is running (wasShutdown=false) or not (wasShutdown=true).
So #6357 solves the problem of spawning a replica with a live master, and this PR (in conjunction with the corresponding control plane PR) solves the problem of spawning a replica without a live master. Please notice that:

  1. To make this PR work we first need to merge its control plane part.

Gotcha.

  2. Regression tests do not use the control plane, so to test it we do not need this PR. It is possible to propagate this flag through neon_local, but I am not sure that it makes sense.

Yes, it does. We need regression tests for these things. If we go with a GUC, it's easy to set the GUC in a test to simulate how the control plane would set it.

To summarize, for this PR:

  • Let's agree that the point of this PR is to fix the bug that a replica constructs incorrect snapshots, which leads to incorrect query results, if the primary is running.
  • Please include all the code changes needed to fix that bug, but nothing else.
  • Please include a regression test case that demonstrates the bug. Without the code changes from the PR, the test fails; with the changes, it passes.

Done

@knizhnik
Contributor

The e2e tests will not pass before https://github.com/neondatabase/cloud/pull/10183 is merged.

lubennikovaav pushed a commit to neondatabase/postgres that referenced this pull request Feb 22, 2024
Set wasShutdown=true during hot-standby replica startup only when primary is not alive (#364)

* Set wasShutdown=true during hot-standby replica startup only when primary is not alive
* Report fatal error if hot standby replica is started with oldestActiveXid=0

Postgres part of neondatabase/neon#6705
---------

Co-authored-by: Konstantin Knizhnik <[email protected]>
lubennikovaav pushed a commit to neondatabase/postgres that referenced this pull request Feb 22, 2024
Set wasShutdown=true during hot-standby replica startup only when primary is not alive (#363)

* Set wasShutdown=true during hot-standby replica startup only when primary is not alive
* Report fatal error if hot standby replica is started with oldestActiveXid=0

Postgres part of neondatabase/neon#6705
---------

Co-authored-by: Konstantin Knizhnik <[email protected]>
lubennikovaav pushed a commit to neondatabase/postgres that referenced this pull request Feb 22, 2024
Set wasShutdown=true during hot-standby replica startup only when primary is not alive (#365)

* Set wasShutdown=true during hot-standby replica startup only when primary is not alive
* Report fatal error if hot standby replica is started with oldestActiveXid=0

Postgres part of neondatabase/neon#6705
---------

Co-authored-by: Konstantin Knizhnik <[email protected]>
Co-authored-by: Heikki Linnakangas <[email protected]>
@knizhnik knizhnik force-pushed the compute_primary_is_running branch from 140a02a to 5bc9c53 on February 22, 2024 20:03
@knizhnik knizhnik force-pushed the compute_primary_is_running branch from 795d162 to 05aa7f3 on February 23, 2024 12:28
@lubennikovaav lubennikovaav force-pushed the compute_primary_is_running branch from 05aa7f3 to 1908692 on February 23, 2024 12:45
@lubennikovaav lubennikovaav merged commit a12e426 into main Feb 23, 2024
50 checks passed
@lubennikovaav lubennikovaav deleted the compute_primary_is_running branch February 23, 2024 13:56
@skyzh
Member

skyzh commented Mar 22, 2024

ref #7204

tristan957 pushed a commit to neondatabase/postgres that referenced this pull request May 10, 2024
Set wasShutdown=true during hot-standby replica startup only when primary is not alive (#365)

* Set wasShutdown=true during hot-standby replica startup only when primary is not alive
* Report fatal error if hot standby replica is started with oldestActiveXid=0

Postgres part of neondatabase/neon#6705
---------

Co-authored-by: Konstantin Knizhnik <[email protected]>
Co-authored-by: Heikki Linnakangas <[email protected]>
tristan957 pushed a commit to neondatabase/postgres that referenced this pull request May 10, 2024
Set wasShutdown=true during hot-standby replica startup only when primary is not alive (#364)

* Set wasShutdown=true during hot-standby replica startup only when primary is not alive
* Report fatal error if hot standby replica is started with oldestActiveXid=0

Postgres part of neondatabase/neon#6705
---------

Co-authored-by: Konstantin Knizhnik <[email protected]>
tristan957 pushed a commit to neondatabase/postgres that referenced this pull request May 10, 2024
Set wasShutdown=true during hot-standby replica startup only when primary is not alive (#363)

* Set wasShutdown=true during hot-standby replica startup only when primary is not alive
* Report fatal error if hot standby replica is started with oldestActiveXid=0

Postgres part of neondatabase/neon#6705
---------

Co-authored-by: Konstantin Knizhnik <[email protected]>
tristan957 pushed a commit to neondatabase/postgres that referenced this pull request May 20, 2024
Set wasShutdown=true during hot-standby replica startup only when primary is not alive (#364)

* Set wasShutdown=true during hot-standby replica startup only when primary is not alive
* Report fatal error if hot standby replica is started with oldestActiveXid=0

Postgres part of neondatabase/neon#6705
---------

Co-authored-by: Konstantin Knizhnik <[email protected]>
tristan957 pushed a commit to neondatabase/postgres that referenced this pull request May 20, 2024
Set wasShutdown=true during hot-standby replica startup only when primary is not alive (#365)

* Set wasShutdown=true during hot-standby replica startup only when primary is not alive
* Report fatal error if hot standby replica is started with oldestActiveXid=0

Postgres part of neondatabase/neon#6705
---------

Co-authored-by: Konstantin Knizhnik <[email protected]>
Co-authored-by: Heikki Linnakangas <[email protected]>
problame added a commit that referenced this pull request Jun 19, 2024