Skip to content

Close WAL gap in CheckpointScope::All to prevent clone failures after WAL cleanup#5

Open
udnay wants to merge 4 commits intogadgetfrom
yo/relax-wal-uri-check
Open

Close WAL gap in CheckpointScope::All to prevent clone failures after WAL cleanup#5
udnay wants to merge 4 commits intogadgetfrom
yo/relax-wal-uri-check

Conversation

@udnay
Copy link
Copy Markdown

@udnay udnay commented Apr 1, 2026

Title: Close WAL gap in CheckpointScope::All to prevent clone failures after WAL cleanup

Summary

  • After CheckpointScope::All flushes, advance replay_after_wal_id to next_wal_sst_id - 1 so the checkpoint doesn't reference WAL SSTs that may be cleaned up externally
  • Restore the explicit NotFound path assertion in clone_should_fail_when_wal_store_is_not_provided that was lost during Allow clone to copy WAL SSTs from a separate WAL store slatedb/slatedb#1458
  • Add tests for WAL gap closure, WAL replay on clones, and cloning after external WAL cleanup

Problem

When a DB opens, fence_writers writes an empty WAL SST directly to the object store. This advances next_wal_sst_id (via replay_memtable) but not replay_after_wal_id, creating a gap of 1. After a CheckpointScope::All flush, if the memtable is already empty, flush_memtables has nothing to freeze or flush, so replay_after_wal_id never catches up.

This becomes a problem when the WAL store is cleaned up externally after a full flush — for example, a shard split where the parent is flushed to L0, the local WAL disk is wiped, and a clone is created from the checkpoint. copy_wal_ssts tries to copy the fencing WAL that no longer exists and crashes with NotFound.

Why slatedb#1458 was not sufficient

slatedb#1458 added the ability to copy WAL SSTs from a separate parent WAL store during cloning. This correctly handles the transport — reading WALs from one store and writing them to another. But it doesn't address the case where the WAL SSTs are gone entirely. The manifest still references them (the gap between replay_after_wal_id and next_wal_sst_id), so
copy_wal_ssts still tries to copy files that don't exist. The fix needs to be upstream of the copy: the checkpoint itself should not reference WAL SSTs that are redundant after a full flush.

Test plan

  • test_checkpoint_scope_all_closes_wal_gap — verifies replay_after_wal_id == next_wal_sst_id - 1 after full flush
  • test_checkpoint_wal_replay_needed_without_full_flush — verifies Durable scope preserves the WAL gap
  • clone_replays_wal_and_continues_writing — clone replays parent WALs, writes new data, flushes, close/reopen, all data survives
  • clone_succeeds_after_parent_wal_cleanup — full flush, delete WAL files, clone, verify data
  • clone_should_fail_when_wal_store_is_not_provided — wrong WAL store still errors with specific NotFound path (restored from main)

@udnay
Copy link
Copy Markdown
Author

udnay commented Apr 1, 2026

This change is part of the following stack:

Change managed by git-spice.

@kirinrastogi
Copy link
Copy Markdown

Could we switch to upstream with this change? https://github.com/slatedb/slatedb/pull/1473/changes

@udnay udnay force-pushed the yo/relax-wal-uri-check branch from 8bc2754 to a251444 Compare April 7, 2026 01:34
@udnay udnay force-pushed the yo/relax-wal-uri-check branch from a251444 to 5b91f1b Compare April 7, 2026 13:09
udnay added 2 commits April 7, 2026 09:44
…would fail with the same WalStoreReconfigurationError — because the clone's manifest would inherit the parent's wal_object_store_uri

- db_state.rs — init_clone_db() clears wal_object_store_uri so the clone doesn't inherit the parent's WAL config
- builder.rs (Db + DbReader) — relaxed the WAL URI check to allow None → Some (so a clone can be opened with a new WAL store), and the Db builder persists the new URI to the manifest on first open
- admin.rs — same relaxation for create_detached_checkpoint
- clone.rs — new test verifying data integrity after cloning a split-WAL DB

Adding the ability to read from a parent WAL and write to a new WAL
before cloning, `copy_wal_ssts` would fail with `NotFound` trying to copy
WAL SSTs that no longer exist (but whose data is safely in L0).

`copy_wal_ssts` now skips `NotFound` WAL SSTs at the start of the copy
range (these are WALs already flushed to L0) and returns an adjusted
`replay_after_wal_id` so the clone manifest doesn't reference them.
Missing WALs after a successfully copied WAL still error, since that
indicates actual data loss rather than post-flush cleanup.
@udnay udnay force-pushed the yo/relax-wal-uri-check branch 2 times, most recently from f7dd204 to 538af16 Compare April 7, 2026 13:52
…ed-up WAL files

When a DB opens, fence_writers writes an empty WAL SST that advances
next_wal_sst_id but not replay_after_wal_id, leaving a gap of 1. After
a full flush the memtable is empty, so flush_memtables never freezes or
flushes anything, and replay_after_wal_id stays behind. If the WAL store
is later cleaned up externally (e.g. local disk wiped after a shard
split), a clone created from this checkpoint tries to copy the missing
fencing WAL and crashes with NotFound.

Fix: after CheckpointScope::All flushes complete, advance
replay_after_wal_id to next_wal_sst_id - 1. All data is already in L0
at that point, so the remaining WAL SSTs are redundant.
@udnay udnay force-pushed the yo/relax-wal-uri-check branch from 538af16 to 2ea9e40 Compare April 7, 2026 14:20
@udnay udnay force-pushed the yo/relax-wal-uri-check branch from 9fd4e5f to 2f5a869 Compare April 7, 2026 17:22
@udnay udnay changed the title Opening a clone of a split-WAL parent without specifying a WAL store would fail Close WAL gap in CheckpointScope::All to prevent clone failures after WAL cleanup Apr 7, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants