Build the EvalOps Feedback Operating System #45

@haasonsaas

Summary

Build a cross-repo feedback operating system that ingests PR review feedback, normalizes it into a shared ledger, clusters recurring issue classes, converts high-confidence classes into repo-local guardrails, and reports whether EvalOps repos are getting safer over time.

Current spine already shipped

Phases

  1. Data spine

    • Capture review threads, top-level comments, review bodies, and CI/app feedback into one schema.
    • Preserve source links, repo/PR metadata, severity, author/app, state, path, and normalized class.
  2. Backfill and taxonomy

    • Run 30- to 60-day backfills across EvalOps repos.
    • Cluster recurring classes such as runtime smoke gaps, workflow shell footguns, generated contract drift, release train drift, configuration safety, auth/security gaps, docs drift, and missing regression coverage.
  3. Guardrail adapters

    • Platform: generated contract drift, SDK/changelog coverage, runtime evidence, and replay/idempotency regressions.
    • Deploy: workflow shell safety, release-train desired-state drift, k8s/Terraform render invariants.
    • Maestro: upstream parity, MCP/prompt/tool contracts, replay/fuzz coverage.
    • .github: org-level reporting, duplicate avoidance, and backlog routing.
  4. Automation loop

    • Open or update repo-scoped issues for recurring classes.
    • Link all originating review comments.
    • Generate acceptance criteria and guardrail suggestions.
    • Track whether the prevention PR landed.
  5. Operator surface

    • Produce a weekly Slack or GitHub summary with top recurring classes, repos with rising feedback, newly prevented classes, stale unresolved feedback, and next guardrail candidates.
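
As a rough illustration of phases 1 and 2, the ledger record and a first-pass recurrence ranking might look like the sketch below. The field names follow the metadata listed under "Data spine", but `FeedbackRecord`, `rank_recurring_classes`, the `kind` and `severity` vocabularies, and the `min_prs` threshold are all assumptions made for this example, not the shipped schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FeedbackRecord:
    """One normalized piece of review feedback (hypothetical schema)."""
    source_url: str        # link back to the originating comment
    repo: str              # repository name, e.g. "platform"
    pr_number: int
    kind: str              # assumed values: "review_thread", "comment", "review_body", "ci"
    severity: str          # assumed values: "low", "medium", "high", "critical"
    author: str            # human login or app name
    state: str             # assumed values: "resolved", "unresolved"
    path: Optional[str]    # file path, when the feedback is anchored to one
    normalized_class: str  # e.g. "workflow-shell-footgun"


def rank_recurring_classes(records, min_prs=2):
    """Return (class, distinct PR count) pairs for classes seen on >= min_prs PRs."""
    prs_by_class = defaultdict(set)
    for record in records:
        # Count distinct PRs, so several comments on one PR count once.
        prs_by_class[record.normalized_class].add((record.repo, record.pr_number))
    ranked = [
        (cls, len(prs)) for cls, prs in prs_by_class.items() if len(prs) >= min_prs
    ]
    # Sort by count descending, then class name, so the order is deterministic.
    return sorted(ranked, key=lambda item: (-item[1], item[0]))
```

Counting distinct PRs rather than raw comments is one possible design choice: it keeps a single noisy review from inflating a class.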

Acceptance criteria

  • Every merged EvalOps PR with unresolved, meaningful feedback at severity high or above is discoverable from a ledger artifact.
  • The ranked backlog produces stable JSON and markdown outputs from at least a 30-day backfill.
  • At least 10 recurring classes have repo-local guardrails with tests and CI wiring.
  • Platform, Deploy, Maestro, and .github each consume either the ledger or a derived backlog or report.
  • Weekly reporting identifies the next guardrail candidates automatically.
  • The top recurring issue classes show a measurable repeat-rate drop over a two-week window.
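
Two of these criteria lend themselves to a small sketch: "stable JSON" output and a "measurable repeat-rate drop". The issue does not define either precisely, so the code below assumes that stable means byte-identical output for the same entries regardless of input order, and that repeat rate means occurrences of a class per merged PR in a window. The function names and the entry keys (`class`, `count`) are hypothetical.

```python
import json


def render_backlog_json(entries):
    """Serialize backlog entries deterministically (sorted rows, sorted keys)."""
    ordered = sorted(entries, key=lambda e: (-e["count"], e["class"]))
    return json.dumps(ordered, sort_keys=True, indent=2) + "\n"


def repeat_rate(occurrences, merged_prs):
    """Occurrences of a class per merged PR in a window (assumed definition)."""
    return occurrences / merged_prs if merged_prs else 0.0


def repeat_rate_drop(previous, current):
    """Relative drop between two windows; positive values mean improvement."""
    if previous == 0:
        return 0.0
    return (previous - current) / previous
```

For example, a class that appeared 6 times across 30 merged PRs in one two-week window and 3 times across 30 in the next would show a relative drop of one half under this definition.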

Immediate next slices

  • Add scheduled 30-day backfill artifact publishing separate from the six-hour sentinel.
  • Turn the current runtime-smoke-coverage finding from Platform #1676 into a repo-local regression or preflight guard.
  • Turn the current workflow-shell-footgun finding from Deploy #2382 into a workflow guardrail or actionlint extension.
  • Add duplicate-aware issue routing from guardrail backlog classes into repo-specific issues.
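
One way the duplicate-aware routing in the last slice could work is to embed a deterministic marker in each generated issue body and look for it before opening a new issue. This is only a sketch under that assumption: the marker format, `routing_marker`, `plan_issue_action`, and the shape of the `open_issues` dictionaries are invented for illustration, and no GitHub API call is made here.

```python
import hashlib


def routing_marker(repo, normalized_class):
    """Build a hidden, deterministic marker for one (repo, class) pair."""
    key = f"{repo}:{normalized_class}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()[:12]
    return f"<!-- feedback-os:{digest} -->"


def plan_issue_action(open_issues, repo, normalized_class):
    """Decide whether to update an existing issue or create a new one.

    open_issues is assumed to be a list of dicts with "number" and "body" keys,
    already fetched for the target repo by some other component.
    """
    marker = routing_marker(repo, normalized_class)
    for issue in open_issues:
        if marker in (issue.get("body") or ""):
            return ("update", issue["number"])
    return ("create", None)
```

Because the marker depends only on the repo and class, repeated runs of the automation loop would converge on the same issue instead of opening a new one each time.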
