Summary
Build a cross-repo feedback operating system that ingests PR review feedback, normalizes it into a shared ledger, clusters recurring issue classes, converts high-confidence classes into repo-local guardrails, and reports whether EvalOps repos are getting safer over time.
Current spine already shipped
- review-feedback-ledger.json
- pr_limit
Phases
- Data spine
  - Capture review threads, top-level comments, review bodies, and CI/app feedback into one schema.
  - Preserve source links, repo/PR metadata, severity, author/app, state, path, and normalized class.
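A minimal sketch of what one ledger record could look like, assuming a flat one-JSON-object-per-line layout. The field names, the `kind` values, and the severity scale are illustrative assumptions, not the shipped schema.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class LedgerEntry:
    """One normalized piece of review feedback (hypothetical field names)."""
    source_url: str          # link back to the originating comment
    repo: str
    pr_number: int
    kind: str                # assumed: review_thread | top_level_comment | review_body | ci_feedback
    severity: str            # assumed scale: low | medium | high | critical
    author: str              # human login or app name
    state: str               # assumed: resolved | unresolved
    path: Optional[str]      # file path, when the feedback is anchored to one
    normalized_class: str    # e.g. "docs-drift"


def to_ledger_line(entry: LedgerEntry) -> str:
    """Serialize with sorted keys so repeated runs produce identical output."""
    return json.dumps(asdict(entry), sort_keys=True)


example = LedgerEntry(
    source_url="https://example.invalid/comment/1",  # placeholder URL
    repo="example-repo",
    pr_number=1,
    kind="review_thread",
    severity="high",
    author="example-reviewer",
    state="unresolved",
    path="src/app.py",
    normalized_class="docs-drift",
)
print(to_ledger_line(example))
```

Sorting keys at serialization time is one cheap way to keep ledger artifacts diffable between runs.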
- Backfill and taxonomy
  - Run 30-60 day backfills across EvalOps.
  - Cluster recurring classes such as runtime smoke gaps, workflow shell footguns, generated contract drift, release train drift, configuration safety, auth/security gaps, docs drift, and missing regression coverage.
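The clustering step can be sketched as a group-and-rank pass over ledger entries. This assumes each entry is a dict that already carries a `normalized_class`, and it treats a class as recurring once it appears on at least `min_occurrences` distinct PRs. Both the threshold and the output shape are assumptions made for illustration.

```python
from collections import defaultdict


def rank_classes(entries, min_occurrences=2):
    """Group entries by normalized class and rank classes by distinct PR count."""
    by_class = defaultdict(list)
    for entry in entries:
        by_class[entry["normalized_class"]].append(entry)

    ranked = []
    for issue_class, items in by_class.items():
        prs = {(item["repo"], item["pr_number"]) for item in items}
        if len(prs) < min_occurrences:
            continue  # seen on too few PRs to count as recurring
        ranked.append({
            "class": issue_class,
            "pr_count": len(prs),
            "repos": sorted({item["repo"] for item in items}),
            "sources": sorted(item["source_url"] for item in items),
        })

    # Sort by count descending, then by name, so ties never reorder between runs.
    ranked.sort(key=lambda row: (-row["pr_count"], row["class"]))
    return ranked
```

The explicit tie-break on class name is what keeps the ranked backlog stable when two classes have the same count.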
- Guardrail adapters
  - Platform: generated contract drift, SDK/changelog coverage, runtime evidence and replay/idempotency regressions.
  - Deploy: workflow shell safety, release-train desired-state drift, k8s/Terraform render invariants.
  - Maestro: upstream parity, MCP/prompt/tool contracts, replay/fuzz coverage.
  - .github: org-level reporting, duplicate avoidance, and backlog routing.
- Automation loop
  - Open or update repo-scoped issues for recurring classes.
  - Link all originating review comments.
  - Generate acceptance criteria and guardrail suggestions.
  - Track whether the prevention PR landed.
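One way to make "open or update" idempotent is to embed a deterministic marker in the issue body and look for it before creating anything. The sketch below only plans the action and makes no API calls. The marker format, the title prefix, and the shape of `open_issues` (dicts with `number` and `body`) are assumptions.

```python
import hashlib

MARKER_PREFIX = "<!-- guardrail-class:"  # hypothetical hidden marker


def dedupe_marker(repo, issue_class):
    """Derive a stable marker from the repo and class name."""
    digest = hashlib.sha256(f"{repo}:{issue_class}".encode()).hexdigest()[:12]
    return f"{MARKER_PREFIX}{digest} -->"


def plan_issue_action(repo, issue_class, source_urls, open_issues):
    """Return a create or update plan for the repo-scoped issue of one class."""
    marker = dedupe_marker(repo, issue_class)
    body_lines = [
        marker,
        f"Recurring class: {issue_class}",
        "",
        "Originating review comments:",
    ]
    body_lines += [f"- {url}" for url in sorted(set(source_urls))]
    body = "\n".join(body_lines)

    for issue in open_issues:
        if marker in issue["body"]:
            # An issue for this repo/class pair already exists: refresh it.
            return {"action": "update", "number": issue["number"], "body": body}
    return {"action": "create", "title": f"[guardrail] {issue_class}", "body": body}
```

Because the body is rebuilt from the full set of source links each time, rerunning the loop refreshes the links instead of opening a second issue.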
- Operator surface
  - Produce a weekly Slack or GitHub summary with top recurring classes, repos with rising feedback, newly prevented classes, stale unresolved feedback, and next guardrail candidates.
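A rough sketch of two of the summary sections, top recurring classes and repos with rising feedback, rendered as plain text that could be posted to Slack or GitHub. It assumes entries are dicts with `repo` and `normalized_class`, and that "rising" means more entries this week than last. The remaining sections are left out for brevity.

```python
from collections import Counter


def weekly_summary(this_week, last_week, top_n=3):
    """Render top classes and rising repos from two weekly entry lists."""
    class_counts = Counter(e["normalized_class"] for e in this_week)
    # Deterministic order: count descending, then class name.
    top = sorted(class_counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]

    now = Counter(e["repo"] for e in this_week)
    before = Counter(e["repo"] for e in last_week)
    rising = sorted(repo for repo in now if now[repo] > before[repo])

    lines = ["Weekly review-feedback summary", "", "Top recurring classes:"]
    lines += [f"- {name}: {count}" for name, count in top]
    lines += ["", "Repos with rising feedback:"]
    lines += [f"- {repo}: {before[repo]} -> {now[repo]}" for repo in rising] or ["- none"]
    return "\n".join(lines)
```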
Acceptance criteria
- Every merged EvalOps PR that still has unresolved, meaningful feedback at high severity or above is discoverable from a ledger artifact.
- The ranked backlog produces stable JSON and markdown outputs from at least a 30-day backfill.
- At least 10 recurring classes have repo-local guardrails with tests and CI wiring.
- Platform, Deploy, Maestro, and .github consume either the ledger or derived backlog/report.
- Weekly reporting identifies the next guardrail candidates automatically.
- The top recurring issue classes show a measurable repeat-rate drop over a two-week window.
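The last criterion needs a concrete definition of repeat rate before it can be measured. One possible definition, offered here only as an assumption, is the share of merged PRs in a window that drew at least one finding of the class. The drop is then the difference between two consecutive windows.

```python
def repeat_rate(entries, merged_pr_count, issue_class):
    """Share of merged PRs in a window with at least one finding of issue_class."""
    if merged_pr_count <= 0:
        raise ValueError("merged_pr_count must be positive")
    affected = {
        (e["repo"], e["pr_number"])
        for e in entries
        if e["normalized_class"] == issue_class
    }
    return len(affected) / merged_pr_count


def repeat_rate_drop(previous_rate, current_rate):
    """Positive when the class recurs less often than in the previous window."""
    return previous_rate - current_rate
```

Normalizing by merged PR count keeps a quiet two-week window from looking like an improvement just because fewer PRs landed.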
Immediate next slices
- Add scheduled 30-day backfill artifact publishing separate from the six-hour sentinel.
- Turn the current runtime-smoke-coverage finding from Platform #1676 into a repo-local regression or preflight guard.
- Turn the current workflow-shell-footgun finding from Deploy #2382 into a workflow guardrail or actionlint extension.
- Add duplicate-aware issue routing from guardrail backlog classes into repo-specific issues.