pg_regress sometimes hangs in CI #413
Comments
Pageserver catches a CRC mismatch:
pg_waldump on the corresponding WAL file:
WAL hexdump around 0/198E4AD0:
Jumping over 0/198E4AD0 in pg_waldump does not help -- it seems that 0/198E4AD0 is actually the last valid record in the file, so it is not clear where pageserver's 0/198E6525 is coming from. Postgres log around the time of this issue:
It seems that our FPW / eviction_of_non_wal_logged_pages (43e2ed429) causes an incorrect CRC in some cases. Output of pg_waldump with the CRC error-out disabled:
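For context, this is roughly what that check looks like in PostgreSQL's xlogreader.c (paraphrased from ValidXLogRecord(), not an exact quote): the record CRC covers the payload after the header first, then the header itself up to xl_crc. The exact "disabled CRC error-out" change used here isn't shown in the thread; one plausible way to get that behavior is to report the mismatch but keep reading, as noted in the comment.

```c
/* In access/transam/xlogreader.c (paraphrased sketch, not an exact quote) */
static bool
ValidXLogRecord(XLogReaderState *state, XLogRecord *record, XLogRecPtr recptr)
{
	pg_crc32c	crc;

	/* The CRC covers the record payload first ... */
	INIT_CRC32C(crc);
	COMP_CRC32C(crc, ((char *) record) + SizeOfXLogRecord,
				record->xl_tot_len - SizeOfXLogRecord);
	/* ... and the record header last, up to (but not including) xl_crc */
	COMP_CRC32C(crc, (char *) record, offsetof(XLogRecord, xl_crc));
	FIN_CRC32C(crc);

	if (!EQ_CRC32C(record->xl_crc, crc))
	{
		report_invalid_record(state,
							  "incorrect resource manager data checksum in record at %X/%X",
							  (uint32) (recptr >> 32), (uint32) recptr);
		return false;			/* returning true here instead would make
								 * pg_waldump keep dumping past the bad CRC */
	}
	return true;
}
```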
Looking at XLogRegisterBuffer() and XLogRegisterBlock(), it's quite bogus that they clear the …
I tried to flip every bit in bkpb->bkp_image of the record with the wrong CRC, to check whether I would hit the expected CRC. Interestingly enough, it actually worked and resulted in the right CRC. In one WAL sample that is:
In another:
So in both cases it is a single bit flip in the middle of the page.
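A minimal sketch of that brute-force experiment (the function and variable names here are mine, not the actual debugging code used): flip each bit of the backup image in turn, recompute the record CRC over the same byte ranges as in the sketch above, and report any single-bit flip that reproduces the CRC stored in the record header. It assumes the image pointer (e.g. bkpb->bkp_image) points into the raw record bytes that the CRC covers.

```c
#include "postgres.h"
#include "access/xlogrecord.h"
#include "port/pg_crc32c.h"

/* Recompute the record CRC the same way XLogInsertRecord() does:
 * payload after the header first, then the header up to xl_crc. */
static pg_crc32c
record_crc(XLogRecord *record)
{
	pg_crc32c	crc;

	INIT_CRC32C(crc);
	COMP_CRC32C(crc, ((char *) record) + SizeOfXLogRecord,
				record->xl_tot_len - SizeOfXLogRecord);
	COMP_CRC32C(crc, (char *) record, offsetof(XLogRecord, xl_crc));
	FIN_CRC32C(crc);
	return crc;
}

/* Flip each bit of the backup image and log every single-bit flip that
 * makes the recomputed CRC match the stored one. */
static void
find_single_bitflip(XLogRecord *record, char *image, Size image_len)
{
	for (Size off = 0; off < image_len; off++)
	{
		for (int bit = 0; bit < 8; bit++)
		{
			image[off] ^= (1 << bit);	/* flip one bit */
			if (EQ_CRC32C(record_crc(record), record->xl_crc))
				elog(LOG, "CRC matches with bit %d flipped at image offset %lu",
					 bit, (unsigned long) off);
			image[off] ^= (1 << bit);	/* restore */
		}
	}
}
```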
With this patch:
I can reproduce this very quickly on my laptop: run …
I was able to get an 'rr' trace of this on my laptop. What happens is that one backend is in the process of writing out an FPI of the page, and between computing the CRC and memcpying the page to the WAL buffer, another backend modifies the page. The modification happens in fsmpage.c, in the fp_next_slot update quoted in the commit message below.
Long story short, zenith_wallog_page() should not be using log_newpage() when it is not holding an exclusive lock on the page. XLogSaveBufferForHint() has to jump through extra hoops and make a temporary copy of the page precisely because of this.
…o CRC errors.

zenith_wallog_page() would call log_newpage() on a buffer, while holding merely a shared lock on the page. That's not cool, because another backend could modify the page concurrently. We allow changing hint bits while holding only a shared lock, and changes on FSM pages, at least. See comments in XLogSaveBufferForHint() for discussion of this problem.

One instance of the race condition that I was able to capture on my laptop happened like this:

1. Backend A: needs to evict an FSM page from the buffer cache to make room for a new page, and calls zenith_wallog_page() on it. That is done while holding a share lock on the page.
2. Backend A: XLogInsertRecord() computes the CRC of the FPI WAL record including the FSM page.
3. Backend B: updates the same FSM page while holding only a share lock.
4. Backend A: allocates space in the WAL buffers, and copies the WAL record header and the page to the buffers.

At this point, the CRC that backend A computed earlier doesn't match the contents that were written out to the WAL buffers.

The update of the FSM page in backend B happened from there (fsmpage.c):

    /*
     * Update the next-target pointer. Note that we do this even if we're only
     * holding a shared lock, on the grounds that it's better to use a shared
     * lock and get a garbled next pointer every now and then, than take the
     * concurrency hit of an exclusive lock.
     *
     * Wrap-around is handled at the beginning of this function.
     */
    fsmpage->fp_next_slot = slot + (advancenext ? 1 : 0);

To fix, make a temporary copy of the page in zenith_wallog_page(), and WAL-log that. Just like XLogSaveBufferForHint() does.

Fixes neondatabase/neon#413
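A minimal sketch of the fix described in that commit message, modeled on what XLogSaveBufferForHint() does (the wrapper function name and surrounding details are assumptions, not the actual Neon patch): copy the page into backend-local memory before WAL-logging it, so that a concurrent hint-bit or fp_next_slot update can no longer change the bytes between the CRC computation and the copy into the WAL buffers.

```c
#include "postgres.h"
#include "access/xloginsert.h"
#include "storage/bufmgr.h"
#include "storage/relfilenode.h"

/*
 * Sketch: inside zenith_wallog_page(), instead of calling log_newpage()
 * directly on the shared buffer, take a backend-local copy first, as
 * XLogSaveBufferForHint() does.
 */
static XLogRecPtr
wallog_page_copy(RelFileNode *rnode, ForkNumber forknum,
				 BlockNumber blkno, Buffer buffer)
{
	PGAlignedBlock copied_buffer;	/* backend-local, suitably aligned */
	Page		page = BufferGetPage(buffer);

	/*
	 * We only hold a share lock, so other backends may still change hint
	 * bits or the FSM next-slot pointer underneath us. Copy the page so
	 * that the bytes the CRC is computed over are the same bytes that end
	 * up in the WAL buffers.
	 */
	memcpy(copied_buffer.data, page, BLCKSZ);

	return log_newpage(rnode, forknum, blkno, copied_buffer.data, false);
}
```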