Close WAL gap in CheckpointScope::All to prevent clone failures after WAL cleanup#5
Open
Close WAL gap in CheckpointScope::All to prevent clone failures after WAL cleanup#5
Conversation
Author
|
This change is part of the following stack: Change managed by git-spice. |
afe882e to
2a6aff6
Compare
2a6aff6 to
ad443c2
Compare
7 tasks
|
Could we switch to upstream with this change? https://github.com/slatedb/slatedb/pull/1473/changes |
8bc2754 to
a251444
Compare
a251444 to
5b91f1b
Compare
…would fail with the same WalStoreReconfigurationError — because the clone's manifest would inherit the parent's wal_object_store_uri - db_state.rs — init_clone_db() clears wal_object_store_uri so the clone doesn't inherit the parent's WAL config - builder.rs (Db + DbReader) — relaxed the WAL URI check to allow None → Some (so a clone can be opened with a new WAL store), and the Db builder persists the new URI to the manifest on first open - admin.rs — same relaxation for create_detached_checkpoint - clone.rs — new test verifying data integrity after cloning a split-WAL DB Adding the ability to read from a parent WAL and write to a new WAL
before cloning, `copy_wal_ssts` would fail with `NotFound` trying to copy WAL SSTs that no longer exist (but whose data is safely in L0). `copy_wal_ssts` now skips `NotFound` WAL SSTs at the start of the copy range (these are WALs already flushed to L0) and returns an adjusted `replay_after_wal_id` so the clone manifest doesn't reference them. Missing WALs after a successfully copied WAL still error, since that indicates actual data loss rather than post-flush cleanup.
f7dd204 to
538af16
Compare
…ed-up WAL files When a DB opens, fence_writers writes an empty WAL SST that advances next_wal_sst_id but not replay_after_wal_id, leaving a gap of 1. After a full flush the memtable is empty, so flush_memtables never freezes or flushes anything, and replay_after_wal_id stays behind. If the WAL store is later cleaned up externally (e.g. local disk wiped after a shard split), a clone created from this checkpoint tries to copy the missing fencing WAL and crashes with NotFound. Fix: after CheckpointScope::All flushes complete, advance replay_after_wal_id to next_wal_sst_id - 1. All data is already in L0 at that point, so the remaining WAL SSTs are redundant.
538af16 to
2ea9e40
Compare
9fd4e5f to
2f5a869
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Title: Close WAL gap in CheckpointScope::All to prevent clone failures after WAL cleanup
Summary
Problem
When a DB opens, fence_writers writes an empty WAL SST directly to the object store. This advances next_wal_sst_id (via replay_memtable) but not replay_after_wal_id, creating a gap of 1. After a CheckpointScope::All flush, if the memtable is already empty, flush_memtables has nothing to freeze or flush, so replay_after_wal_id never catches up.
This becomes a problem when the WAL store is cleaned up externally after a full flush — for example, a shard split where the parent is flushed to L0, the local WAL disk is wiped, and a clone is created from the checkpoint. copy_wal_ssts tries to copy the fencing WAL that no longer exists and crashes with NotFound.
Why slatedb#1458 was not sufficient
slatedb#1458 added the ability to copy WAL SSTs from a separate parent WAL store during cloning. This correctly handles the transport — reading WALs from one store and writing them to another. But it doesn't address the case where the WAL SSTs are gone entirely. The manifest still references them (the gap between replay_after_wal_id and next_wal_sst_id), so
copy_wal_ssts still tries to copy files that don't exist. The fix needs to be upstream of the copy: the checkpoint itself should not reference WAL SSTs that are redundant after a full flush.
Test plan