
Add neon.primary_is_running GUC. #6705

Merged 12 commits into main from compute_primary_is_running on Feb 23, 2024

Conversation

lubennikovaav
Contributor

We set it for a Neon replica if the primary is running.

Corresponding cloud PR is https://github.com/neondatabase/cloud/pull/10183
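
Since the flag travels as an ordinary GUC, its value on a compute can be inspected like any other setting. A minimal sketch, assuming psycopg2 and an illustrative connection string (neither is taken from this PR):

```python
# Sketch: inspect the new GUC on a replica endpoint.
# The connection string is illustrative only.
import psycopg2

conn = psycopg2.connect("host=replica.example.invalid dbname=postgres user=cloud_admin")
with conn.cursor() as cur:
    # The control plane is expected to set this to 'on' when a primary
    # is running on the branch, and 'off' otherwise.
    cur.execute("SHOW neon.primary_is_running")
    print(cur.fetchone()[0])  # 'on' or 'off'
conn.close()
```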

@lubennikovaav lubennikovaav requested review from a team as code owners February 9, 2024 16:09
@lubennikovaav lubennikovaav requested review from knizhnik and NanoBjorn and removed request for a team February 9, 2024 16:09

github-actions bot commented Feb 9, 2024

2478 tests run: 2357 passed, 0 failed, 121 skipped (full report)


Flaky tests (2)

Postgres 14

  • test_pageserver_restarts_under_worload: release
  • test_timeline_deletion_with_files_stuck_in_upload_queue: debug

Code coverage* (full report)

  • functions: 28.8% (6783 of 23548 functions)
  • lines: 47.7% (41287 of 86631 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results.
1908692 at 2024-02-23T13:29:22.947Z ♻️

@lubennikovaav lubennikovaav requested a review from a team as a code owner February 9, 2024 16:38
@lubennikovaav lubennikovaav requested review from arpad-m and removed request for a team February 9, 2024 16:38
@hlinnaka
Contributor

For the sake of the archives, please explain the problem in the commit message too. And perhaps open a github issue to track it. IIUC the problem is that:

  • No primary is running on a branch
  • You start a new read-only replica on the branch
  • The read-only replica doesn't allow connections, because it doesn't have a valid standby snapshot yet. It doesn't know which as-yet-uncommitted transactions might still be running in the primary and commit later. There is no primary, so the answer is "none", but Postgres doesn't know that.

A hot standby will wait for RUNNING_XACTS or shutdown checkpoint records until it can allow connections. If the primary is not running, they will never come.
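
For illustration, here is roughly what a client sees while the standby waits (a minimal sketch; psycopg2 and the hostname are assumptions):

```python
# Sketch: a hot standby without a valid snapshot rejects client connections.
import psycopg2

try:
    psycopg2.connect("host=replica.example.invalid dbname=postgres")
except psycopg2.OperationalError as e:
    # Typically something like:
    # "FATAL: the database system is not yet accepting connections"
    print(e)
```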

This PR adds a GUC to tell the standby that there is no primary running. It doesn't actually do anything with it. Will that be a followup PR?

Even with this flag, this can still happen:

  • There is a primary running
  • You start a read-only replica
  • Primary crashes / is killed before it writes a RUNNING_XACTS record.

So just checking that a primary is running when the standby starts isn't necessarily enough.

A GUC doesn't feel like a very good mechanism to deliver this information. Whether a primary is running or not is very ephemeral. Perhaps the neon extension should ask the control plane for that directly with an API call?

@knizhnik
Contributor

For the sake of the archives, please explain the problem in the commit message too. And perhaps open a github issue to track it. IIUC the problem is that:

  • No primary is running on a branch
  • You start a new read-only replica on the branch
  • The read-only replica doesn't allow connections, because it doesn't have a valid standby snapshot yet. It doesn't know which as-yet-uncommitted transactions might still be running in the primary and commit later. There is no primary, so the answer is "none", but Postgres doesn't know that.

A hot standby will wait for RUNNING_XACTS or shutdown checkpoint records until it can allow connections. If the primary is not running, they will never come.

This PR adds a GUC to tell the standby that there is no primary running. It doesn't actually do anything with it. Will that be a followup PR?

A hot-standby replica waits for the primary only if it was not started after a normal shutdown (after a normal shutdown, wasShutdown is true in xlogrecovery.c). This behaviour was changed only by my PR #6357, and I am also going to handle the added GUC in that PR.

Even with this flag, this can still happen:

  • There is a primary running
  • You start a read-only replica
  • Primary crashes / is killed before it writes a RUNNING_XACTS record.

The primary has to be restarted in any case. We do not support replacing the primary with a replica, so Neon should detect a crash of the primary node and restart it. The replica will reconnect to the new primary and receive a RUNNING_XACTS record.

Actually, the replica has no choice other than to wait for the primary once it starts recovery, receiving WAL from the primary.
As a result it can receive tuples containing XIDs of running transactions. Those transactions can be committed before the primary crashes, so the replica may not set the XMIN_INVALID hint bit for such tuples.

So just checking that a primary is running when the standby starts isn't necessarily enough.

I do not know a better solution.

A GUC doesn't feel like a very good mechanism to deliver this information. Whether a primary is running or not is very ephemeral. Perhaps the neon extension should ask the control plane for that directly with an API call?

The code that makes the decision whether or not to set wasShutdown is not in the extension, but in Postgres core (xlogrecovery.c).
Certainly it is possible to add one more hook here, but we would have to change the Postgres sources in any case.

I do not understand why sending a request to the control plane is better than sending this information through the compute spec and then a GUC. Also, the extra round-trip doesn't speed up startup.

@lubennikovaav
Contributor Author

Even with this flag, this can still happen:
  • There is a primary running
  • You start a read-only replica
  • Primary crashes / is killed before it writes a RUNNING_XACTS record.

The primary has to be restarted in any case. We do not support replacing the primary with a replica, so Neon should detect a crash of the primary node and restart it. The replica will reconnect to the new primary and receive a RUNNING_XACTS record.

Hmm, looks like we can get into a deadlock here. Imagine this situation:

  • primary is running
  • replica is starting and waiting for running_xacts from primary
  • primary is crashing
  • The StartCompute(replica) operation is still in progress, and the control plane cannot run a concurrent StartCompute(primary) operation, so it will wait in a queue.
  • Eventually this situation will resolve itself, when the replica's StartCompute deadline is exceeded.

Are we OK with this?
I'd say yes, because a crashing primary is not normal and should be fixed ASAP anyway.
@hlinnaka, @knizhnik, WDYT?

@hlinnaka
Contributor

hlinnaka commented Feb 12, 2024

Even with this flag, this can still happen:
  • There is a primary running
  • You start a read-only replica
  • Primary crashes / is killed before it writes a RUNNING_XACTS record.

The primary has to be restarted in any case. We do not support replacing the primary with a replica, so Neon should detect a crash of the primary node and restart it. The replica will reconnect to the new primary and receive a RUNNING_XACTS record.

Hmm, looks like we can get into a deadlock here. Imagine this situation:

  • primary is running

  • replica is starting and waiting for running_xacts from primary

  • primary is crashing

  • The StartCompute(replica) operation is still in progress, and the control plane cannot run a concurrent StartCompute(primary) operation, so it will wait in a queue.

  • Eventually this situation will resolve itself, when the replica's StartCompute deadline is exceeded.

Are we OK with this? I'd say yes, because a crashing primary is not normal and should be fixed ASAP anyway. @hlinnaka, @knizhnik, WDYT?

Until #6712 is merged, we don't actually write the shutdown checkpoint record even on a clean shutdown, so you'll get the above with a clean shutdown too.

@hlinnaka
Contributor

This PR adds a GUC to tell the standby that there is no primary running. It doesn't actually do anything with it. Will that be a followup PR?

A hot-standby replica waits for the primary only if it was not started after a normal shutdown (after a normal shutdown, wasShutdown is true in xlogrecovery.c). This behaviour was changed only by my PR #6357, and I am also going to handle the added GUC in that PR.

I see. Can you include those changes here, so that this can be reviewed, tested, and committed as one unit, please?

Needs tests.

@hlinnaka
Contributor

So we currently have this bug:

  1. Primary is running.
  2. Start a transaction on the primary. Insert a row.
  3. Start a replica. Because we always set 'wasShutdown==true' in xlogrecovery.c, the replica considers all in-progress transactions as aborted.
  4. On the replica: BEGIN REPEATABLE READ; and run a query. The new row inserted on the primary is not visible.
  5. Commit the transaction on the primary.
  6. On the replica: run another query in the same transaction. Incorrectly, the new row is now visible.

Please create a python test case for that. This PR should then fix it.
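
A minimal sketch of such a test, in the style of the repo's test_runner suite (the fixture and helper names neon_simple_env, create_start, and new_replica_start are assumptions, not code from this PR):

```python
from fixtures.neon_fixtures import NeonEnv


def test_replica_snapshot_isolation(neon_simple_env: NeonEnv):
    env = neon_simple_env

    # Step 1: a primary is running.
    primary = env.endpoints.create_start("main")
    setup_conn = primary.connect()
    setup_conn.autocommit = True
    setup_conn.cursor().execute("CREATE TABLE t (x int)")

    # Step 2: start a transaction on the primary and insert a row,
    # leaving the transaction open (psycopg2 opens it implicitly).
    writer_conn = primary.connect()
    writer_conn.cursor().execute("INSERT INTO t VALUES (1)")

    # Step 3: start a replica while that transaction is still in progress.
    # (A real test would wait here for the replica to catch up to the
    # primary's flush LSN before querying.)
    replica = env.endpoints.new_replica_start(origin=primary, endpoint_id="replica")

    # Step 4: in REPEATABLE READ, the uncommitted row must not be visible.
    reader_conn = replica.connect()
    reader_conn.set_session(isolation_level="REPEATABLE READ")
    reader_cur = reader_conn.cursor()
    reader_cur.execute("SELECT count(*) FROM t")  # first query fixes the snapshot
    assert reader_cur.fetchone()[0] == 0

    # Step 5: commit the transaction on the primary.
    writer_conn.commit()

    # Step 6: the same replica transaction must still see the old snapshot;
    # before the fix, the row incorrectly became visible here.
    reader_cur.execute("SELECT count(*) FROM t")
    assert reader_cur.fetchone()[0] == 0
```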

@knizhnik
Contributor

So we currently have this bug:

  1. Primary is running.
  2. Start a transaction on the primary. Insert a row.
  3. Start a replica. Because we always set 'wasShutdown==true' in xlogrecovery.c, the replica considers all in-progress transactions as aborted.
  4. On the replica: BEGIN REPEATABLE READ; and run a query. The new row inserted on the primary is not visible.
  5. Commit the transaction on the primary.
  6. On the replica: run another query in the same transaction. Incorrectly, the new row is now visible.

Please create a python test case for that. This PR should then fix it.

There is such a test, test_runner/regress/test_replication_start.py, in PR #6357:
https://github.com/neondatabase/neon/pull/6357/files#diff-7edc60d2be535ca5bbd3cce1b149594fd1994e755bd3b1f362a7e010a5cd3b16

@knizhnik
Contributor

This PR adds a GUC to tell the standby that there is no primary running. It doesn't actually do anything with it. Will that be a followup PR?

A hot-standby replica waits for the primary only if it was not started after a normal shutdown (after a normal shutdown, wasShutdown is true in xlogrecovery.c). This behaviour was changed only by my PR #6357, and I am also going to handle the added GUC in that PR.

I see. Can you include those changes here, so that this can be reviewed, tested, and committed as one unit, please?

Needs tests.

I can shuffle changes between PRs as you prefer.
But actually this problem and its fix have no relation to this PR.
So, what we have now: a replica started with wasShutdown=true doesn't receive information about running xacts, constructs an incorrect snapshot, and as a result sees incorrect data.
The problem was fixed in #6357: now a hot-standby replica waits for WAL from the primary.
It caused a problem with the e2e tests (which spawn a read-only replica on a branch without spawning a master). But certainly it is not only related to e2e. So Star has proposed this mechanism of the control plane informing the replica whether a primary is running (wasShutdown=false) or not (wasShutdown=true).

So #6357 solves the problem of spawning a replica with a live master, and this PR (in conjunction with the corresponding control plane PR) solves the problem of spawning a replica without a live master.
Please notice that:

  1. To make this PR work we first need to merge its control plane part.
  2. Regression tests do not use the control plane, so to test it we do not need this PR. It is possible to propagate this flag through neon_local, but I am not sure that it makes sense.

@hlinnaka
Contributor

hlinnaka commented Feb 12, 2024

I can shuffle changes between PRs as you prefer.

Thank you. The general rule should be: one bug, one PR. A PR includes all the code changes required to fix the bug, and a test to reproduce it, but nothing else.

But actually this problem and its fix have no relation to this PR. So, what we have now: a replica started with wasShutdown=true doesn't receive information about running xacts, constructs an incorrect snapshot, and as a result sees incorrect data.

The point of this PR is to fix that bug, right?

The problem was fixed in #6357: now a hot-standby replica waits for WAL from the primary.

According to the description of #6357, it is about propagating apply_lsn from safekeepers to pageservers, to postpone GC. That seems completely unrelated to this.

It caused a problem with the e2e tests (which spawn a read-only replica on a branch without spawning a master). But certainly it is not only related to e2e. So Star has proposed this mechanism of the control plane informing the replica whether a primary is running (wasShutdown=false) or not (wasShutdown=true).

So #6357 solves the problem of spawning a replica with a live master, and this PR (in conjunction with the corresponding control plane PR) solves the problem of spawning a replica without a live master. Please notice that:

  1. To make this PR work we first need to merge its control plane part.

Gotcha.

  2. Regression tests do not use the control plane, so to test it we do not need this PR. It is possible to propagate this flag through neon_local, but I am not sure that it makes sense.

Yes, it does. We need regression tests for these things. If we go with a GUC, it's easy to set the GUC in a test to simulate how the control plane would set it.
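
For instance, a test could start the replica with the GUC preset, simulating the control plane (a sketch extending the test above; the config_lines parameter name follows the fixtures' usual pattern but is an assumption):

```python
# Sketch: simulate the control plane by presetting the GUC for the replica.
replica = env.endpoints.new_replica_start(
    origin=primary,
    endpoint_id="replica",
    config_lines=["neon.primary_is_running = off"],
)
```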

To summarize, for this PR:

  • Let's agree that the point of this PR is to fix the bug that a replica constructs incorrect snapshots, which leads to incorrect query results, if the primary is running.
  • Please include all the code changes needed to fix that bug, but nothing else.
  • Please include a regression test case that demonstrates the bug. Without the code changes from the PR, the test fails; with the changes, it passes.

@knizhnik
Contributor

I can shuffle changes between PRs as you prefer.

Thank you. The general rule should be: one bug, one PR. A PR includes all the code changes required to fix the bug, and a test to reproduce it, but nothing else.

But actually this problem and its fix have no relation to this PR. So, what we have now: a replica started with wasShutdown=true doesn't receive information about running xacts, constructs an incorrect snapshot, and as a result sees incorrect data.

The point of this PR is to fix that bug, right?

The problem was fixed in #6357: now a hot-standby replica waits for WAL from the primary.

According to the description of #6357, it is about propagating apply_lsn from safekeepers to pageservers, to postpone GC. That seems completely unrelated to this.

It caused a problem with the e2e tests (which spawn a read-only replica on a branch without spawning a master). But certainly it is not only related to e2e. So Star has proposed this mechanism of the control plane informing the replica whether a primary is running (wasShutdown=false) or not (wasShutdown=true).
So #6357 solves the problem of spawning a replica with a live master, and this PR (in conjunction with the corresponding control plane PR) solves the problem of spawning a replica without a live master. Please notice that:

  1. To make this PR work we first need to merge its control plane part.

Gotcha.

  2. Regression tests do not use the control plane, so to test it we do not need this PR. It is possible to propagate this flag through neon_local, but I am not sure that it makes sense.

Yes, it does. We need regression tests for these things. If we go with a GUC, it's easy to set the GUC in a test to simulate how the control plane would set it.

To summarize, for this PR:

  • Let's agree that the point of this PR is to fix the bug that a replica constructs incorrect snapshots, which leads to incorrect query results, if the primary is running.
  • Please include all the code changes needed to fix that bug, but nothing else.
  • Please include a regression test case that demonstrates the bug. Without the code changes from the PR, the test fails; with the changes, it passes.

Done

@knizhnik
Contributor

The e2e tests will not pass before https://github.com/neondatabase/cloud/pull/10183 is merged.

lubennikovaav pushed a commit to neondatabase/postgres that referenced this pull request Feb 22, 2024
Set wasShutdown=true during hot-standby replica startup only when primary is not alive (#364)

* Set wasShutdown=true during hot-standby replica startup only when primary is not alive
* Report fatal error if hot standby replica is started with oldestActiveXid=0

Postgres part of neondatabase/neon#6705
---------

Co-authored-by: Konstantin Knizhnik <[email protected]>
lubennikovaav pushed a commit to neondatabase/postgres that referenced this pull request Feb 22, 2024
Set wasShutdown=true during hot-standby replica startup only when primary is not alive (#363)

* Set wasShutdown=true during hot-standby replica startup only when primary is not alive
* Report fatal error if hot standby replica is started with oldestActiveXid=0

Postgres part of neondatabase/neon#6705
---------

Co-authored-by: Konstantin Knizhnik <[email protected]>
lubennikovaav pushed a commit to neondatabase/postgres that referenced this pull request Feb 22, 2024
Set wasShutdown=true during hot-standby replica startup only when primary is not alive (#365)

* Set wasShutdown=true during hot-standby replica startup only when primary is not alive
* Report fatal error if hot standby replica is started with oldestActiveXid=0

Postgres part of neondatabase/neon#6705
---------

Co-authored-by: Konstantin Knizhnik <[email protected]>
Co-authored-by: Heikki Linnakangas <[email protected]>
@knizhnik knizhnik force-pushed the compute_primary_is_running branch from 140a02a to 5bc9c53 on February 22, 2024 20:03
@knizhnik knizhnik force-pushed the compute_primary_is_running branch from 795d162 to 05aa7f3 on February 23, 2024 12:28
@lubennikovaav lubennikovaav force-pushed the compute_primary_is_running branch from 05aa7f3 to 1908692 on February 23, 2024 12:45
@lubennikovaav lubennikovaav merged commit a12e426 into main Feb 23, 2024
50 checks passed
@lubennikovaav lubennikovaav deleted the compute_primary_is_running branch February 23, 2024 13:56
@skyzh
Member

skyzh commented Mar 22, 2024

ref #7204

tristan957 pushed a commit to neondatabase/postgres that referenced this pull request May 10, 2024
Set wasShutdown=true during hot-standby replica startup only when primary is not alive (#365)

* Set wasShutdown=true during hot-standby replica startup only when primary is not alive
* Report fatal error if hot standby replica is started with oldestActiveXid=0

Postgres part of neondatabase/neon#6705
---------

Co-authored-by: Konstantin Knizhnik <[email protected]>
Co-authored-by: Heikki Linnakangas <[email protected]>
tristan957 pushed a commit to neondatabase/postgres that referenced this pull request May 10, 2024
Set wasShutdown=true during hot-standby replica startup only when primary is not alive (#364)

* Set wasShutdown=true during hot-standby replica startup only when primary is not alive
* Report fatal error if hot standby replica is started with oldestActiveXid=0

Postgres part of neondatabase/neon#6705
---------

Co-authored-by: Konstantin Knizhnik <[email protected]>
tristan957 pushed a commit to neondatabase/postgres that referenced this pull request May 10, 2024
Set wasShutdown=true during hot-standby replica startup only when primary is not alive (#363)

* Set wasShutdown=true during hot-standby replica startup only when primary is not alive
* Report fatal error if hot standby replica is started with oldestActiveXid=0

Postgres part of neondatabase/neon#6705
---------

Co-authored-by: Konstantin Knizhnik <[email protected]>
tristan957 pushed a commit to neondatabase/postgres that referenced this pull request May 20, 2024
Set wasShutdown=true during hot-standby replica startup only when primary is not alive (#364)

* Set wasShutdown=true during hot-standby replica startup only when primary is not alive
* Report fatal error if hot standby replica is started with oldestActiveXid=0

Postgres part of neondatabase/neon#6705
---------

Co-authored-by: Konstantin Knizhnik <[email protected]>
tristan957 pushed a commit to neondatabase/postgres that referenced this pull request May 20, 2024
Set wasShutdown=true during hot-standby replica startup only when primary is not alive (#365)

* Set wasShutdown=true during hot-standby replica startup only when primary is not alive
* Report fatal error if hot standby replica is started with oldestActiveXid=0

Postgres part of neondatabase/neon#6705
---------

Co-authored-by: Konstantin Knizhnik <[email protected]>
Co-authored-by: Heikki Linnakangas <[email protected]>
problame added a commit that referenced this pull request Jun 19, 2024