Kill local cluster runner child processes on abnormal test exit#4597
Open
tillrohrmann wants to merge 1 commit intorestatedev:mainfrom
Open
Kill local cluster runner child processes on abnormal test exit#4597tillrohrmann wants to merge 1 commit intorestatedev:mainfrom
tillrohrmann wants to merge 1 commit intorestatedev:mainfrom
Conversation
The #[restate_core::test] macro installs a panic hook that calls std::process::exit(1), which bypasses Rust Drop impls. This left spawned Restate server processes running as orphans when tests failed. Add an atexit handler that tracks child PIDs in a global registry and sends SIGKILL on process exit. Since std::process::exit() runs C atexit handlers, this ensures cleanup even when Drop impls are skipped. Set LOCAL_CLUSTER_RUNNER_RETAIN_CLUSTER=true to opt out and keep the cluster running for investigation (similar to the existing LOCAL_CLUSTER_RUNNER_RETAIN_TEMPDIR pattern). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
6236eec to
0136fd9
Compare
muhamadazmy
approved these changes
Apr 20, 2026
Contributor
muhamadazmy
left a comment
There was a problem hiding this comment.
LGTM! The only question that comes to mind is why we start all child processes in different groups and not just one group. This is normally supported by libc and I think is possible in rust via the https://doc.rust-lang.org/std/os/unix/process/trait.CommandExt.html#tymethod.pre_exec . This means that we don't need to track all group ids of the child processes but just one.
Anyway, the changes looks good to me and is good to merge as is.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The #[restate_core::test] macro installs a panic hook that calls std::process::exit(1), which bypasses Rust Drop impls. This left spawned Restate server processes running as orphans when tests failed.
Add an atexit handler that tracks child PGIDs in a global registry and sends SIGKILL on process exit. Since std::process::exit() runs C atexit handlers, this ensures cleanup even when Drop impls are skipped.
Set LOCAL_CLUSTER_RUNNER_RETAIN_CLUSTER=true to opt out and keep the cluster running for investigation (similar to the existing LOCAL_CLUSTER_RUNNER_RETAIN_TEMPDIR pattern).