Skip to content

Centralize workspace runtime lifecycle#228

Draft
mariusvniekerk wants to merge 4 commits intomainfrom
architecture-workspace-runtime
Draft

Centralize workspace runtime lifecycle#228
mariusvniekerk wants to merge 4 commits intomainfrom
architecture-workspace-runtime

Conversation

@mariusvniekerk
Copy link
Copy Markdown
Collaborator

Summary

  • add a runtime lifecycle coordinator for workspace launch, stop, cleanup, and delete flows
  • move process-plus-persistence choreography out of HTTP handlers
  • document the workspace runtime ownership boundary

How this makes the code better

  • gives handlers one uniform API for runtime transitions instead of repeated tmux/session cleanup branches
  • keeps durable tmux ownership cleanup in the same place as process stop behavior
  • makes future workspace runtime changes smaller because lifecycle rules live in one module

Runtime launch, tmux ownership recording, missing-session cleanup, stop fallback, and delete-time stopping were split across server handlers and workspace internals. This creates a small lifecycle coordinator so callers ask for runtime transitions instead of reimplementing process-plus-persistence choreography.

The result gives launch, stop, missing-session reconciliation, and workspace delete the same durable cleanup rules from one module, reducing handler branching and keeping future workspace runtime changes behind a single API.

Validation: go test ./internal/workspace -run TestRuntimeLifecycle -shuffle=on; go test ./internal/server -run 'TestWorkspaceRuntime|TestWorkspaceDelete' -shuffle=on; go test ./internal/workspace -shuffle=on; git diff --cached --check. Full go test ./internal/workspace ./internal/server -shuffle=on was attempted earlier and hit unrelated host process pressure in tmux process-leak stress coverage.

Co-authored-by: OpenAI Codex <noreply@openai.com>
@roborev-ci
Copy link
Copy Markdown

roborev-ci Bot commented May 1, 2026

roborev: Combined Review (5cd3626)

High severity regression found in workspace deletion when runtime lifecycle is disabled.

High

  • internal/server/huma_routes.go:3295 - deleteWorkspace now returns 503 when s.runtimeLifecycle == nil, which breaks workspace deletion for users without runtime configured. Previously, deletion still proceeded by skipping runtime shutdown and calling s.workspaces.Delete.

    Suggested fix: remove the unconditional nil rejection. If s.runtimeLifecycle is present, use s.runtimeLifecycle.DeleteWorkspace; otherwise fall back to s.workspaces.Delete directly with no runtime stop step.


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

Roborev caught that the lifecycle refactor made workspace deletion return 503 when runtime lifecycle support was absent. Preserve the previous behavior by falling back to workspace deletion without a runtime stop step when no lifecycle coordinator is configured.

Validation: go test ./internal/server -run 'TestWorkspaceDelete(FallsBackWhenRuntimeLifecycleNilE2E|StopsRuntimeSessionsE2E|DirtyKeepsRuntimeSessionsE2E)' -shuffle=on; git diff --check.

Co-authored-by: OpenAI Codex <noreply@openai.com>
@roborev-ci
Copy link
Copy Markdown

roborev-ci Bot commented May 1, 2026

roborev: Combined Review (c18024d)

Medium: workspace deletion fallback can skip runtime cleanup when runtimeLifecycle is nil.

Medium

  • internal/server/huma_routes.go:3320
    When runtimeLifecycle is nil but s.runtime is configured, workspace deletion falls back to s.workspaces.Delete(..., nil) and skips stopping runtime sessions. This can leave launched shells or agents alive after the worktree and DB row are removed.
    Fix: In the fallback path, preserve the legacy runtime cleanup when s.runtime != nil: wrap deletion with BeginStopping / EndStopping and pass a beforeDestructive callback that calls StopWorkspace.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

The delete fallback for servers without RuntimeLifecycle now keeps the legacy stopping marker and StopWorkspace hook when a runtime manager exists. This avoids deleting workspace records while shell or agent sessions keep running.
@roborev-ci
Copy link
Copy Markdown

roborev-ci Bot commented May 1, 2026

roborev: Combined Review (afe1f3f)

Summary: One medium-risk issue needs attention before merge.

Medium

  • internal/server/huma_routes.go:3330 - The fallback delete path passes beforeDestructive as nil when s.runtime is nil. The previous code always passed a non-nil callback, so if s.workspaces.Delete unconditionally invokes the hook, deleting a workspace with runtime disabled can panic.

    Fix: Always pass a no-op callback when there is no runtime manager, or update workspaces.Delete to explicitly tolerate a nil hook.


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

The retry-ready test now checks that no second setup attempt is recorded instead of racing on total setup-event count, and runtime delete tests wait for launched sessions to be observable before deleting.
@roborev-ci
Copy link
Copy Markdown

roborev-ci Bot commented May 1, 2026

roborev: Combined Review (127d890)

Clean review: no Medium, High, or Critical findings were reported by any agent.


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

@mariusvniekerk mariusvniekerk marked this pull request as draft May 1, 2026 23:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant