feat: add churn-guard hook and evidence-gate workflow#921
Closed
harsh-batheja wants to merge 16 commits into
Closed
feat: add churn-guard hook and evidence-gate workflow#921harsh-batheja wants to merge 16 commits into
harsh-batheja wants to merge 16 commits into
Conversation
43c4bd9 to
db179c8
Compare
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 573c89a. Configure here.
illegalcall
approved these changes
Apr 6, 2026
885e916 to
8ec2ccf
Compare
Port two quality-enforcement features from jleechanorg/agent-orchestrator fork: **Churn Guard** (`scripts/hooks/churn-guard.sh`): - PreToolUse hook that intercepts `gh pr create` commands - Blocks PR creation when open PRs already touch the same files - Prevents duplicate churn (8 duplicate PRs for metadata-updater.sh on 2026-04-04) - Fail-open design: allows PR if checks fail (no git, no gh, parse error, timeout) - Override: include "Supersedes #N" in PR body to bypass - Install to `.claude/hooks/` + register in `.claude/settings.json` to activate **Evidence Gate** (`.github/workflows/evidence-gate.yml`): - CI check that runs on PR open/sync/edit - Requires `## Evidence` section with a `**Claim class**` and `**Verdict**` in PR body - For `integration`/`pipeline-e2e` claims: enforces test output in fenced code block - Rejects fabricated/placeholder evidence (simulated, TODO, example.com URLs) - Fail-closed: PRs without Evidence section fail the check Also adds `.github/pull_request_template.md` with the evidence format pre-filled, and documents both features in `CONTRIBUTING.md`.
Two security fixes flagged by Cursor Bugbot:
1. Script injection via unsanitized step output interpolation (High)
${{ steps.claim.outputs.claim }} was interpolated directly in a run: block,
where GitHub Actions expands it via text substitution before bash parses.
A crafted Claim class value with backticks could execute arbitrary commands.
Fix: pass the value through an env: variable instead of ${{ }} interpolation.
2. Verdict check matches HTML comment in unfilled template (High)
The PR template had '**Verdict**: <!-- PASS | INSUFFICIENT -->' which contains
the word PASS inside an HTML comment. The verdict grep matched this comment,
so an unfilled template would pass the evidence gate entirely.
Fix: strip HTML comments in the verdict validation step (using awk, consistent
with the strong artifact evidence step), and change the template placeholder to
'_fill in: PASS or INSUFFICIENT_' which does not match any verdict keyword.
… bypass The previous placeholder '_fill in: PASS or INSUFFICIENT_' contained the literal word 'PASS', which the evidence-gate verdict regex matched even on an unfilled template. The workflow already strips HTML comments (via awk) before running the verdict grep, so replacing the placeholder with an HTML comment is the safe pattern: the comment is stripped to an empty string, the grep finds no verdict, and the gate correctly rejects unfilled templates.
… close
The awk closing rule '/^[[:space:]]*-->/' only matched '-->' at the start of a
line. The PR template's instructional comment closes with '-->' at the end of
a line ('...command output (e.g. pnpm test, vitest, jest). -->'), so in_comment
was never reset to 0. All subsequent lines — including **Verdict** — were
silently dropped, causing the verdict check to always fail for PRs that kept
the template's instructional comment.
Fix: change to '/-->/' to match '-->' anywhere in the line. Applied to all
three awk invocations in the workflow (strong artifact check + both verdict
check paths).
Adds jscpd (JavaScript Copy/Paste Detector) to catch duplicate code:
- .jscpd.json: config targeting packages/**/*.{ts,tsx}, excluding dist,
node_modules, and test files. min 8 lines / 50 tokens to avoid false
positives on short boilerplate. threshold: 3% — current codebase is
at 1.84% (49 clones), giving headroom while blocking regressions.
- package.json: new 'check:duplicates' script (jscpd packages)
- .github/workflows/duplicate-check.yml: CI job running on push/PR to main
Run locally: pnpm check:duplicates
The default fenced code block '$ pnpm test' matched the strong artifact evidence grep pattern in evidence-gate.yml. An integration claim PR with an unfilled template would pass the strong evidence check simply because the placeholder command matched the pattern. Replace with an HTML comment containing the example — the awk stripping removes it before the grep runs, so an unfilled template correctly fails the integration evidence check.
…churn-guard evidence-gate.yml — claim extraction skipped HTML comment stripping: A Claim class line with a trailing HTML comment produced a corrupted CLAIM value (e.g. integration-<!---foo-->) that failed equality checks, silently bypassing strong evidence enforcement for integration claims. Fix: add sed to strip inline HTML comments in both primary and fallback paths. scripts/hooks/churn-guard.sh — double-escaped backslash in body regex: Four backslashes in the raw string alternation matched two literal backslashes plus any char instead of one backslash plus any char. This broke Supersedes override detection when --body used double-quoted values with escape sequences. Fix: reduce to two backslashes in the raw string for both affected patterns.
…erns The gsub pattern <!--[^>]*--> failed to match HTML comments containing > (e.g. <!-- expected -->). When it failed, the multi-line fallback triggered: /^[[:space:]]*<!--/ matched, the !~ condition was true, so in_comment=1 was set and next was called — skipping the closing --> on the same line and leaving in_comment stuck at 1. All subsequent lines including **Verdict** were swallowed, causing the gate to always reject with 'No verdict found'. Two fixes applied to all five pattern sites (3 awk gsub, 2 sed): 1. Improved pattern: <!--[^>]*--> -> <!--[^-]*(-[^-][^-]*)*--> Matches any valid single-line HTML comment including those containing >. ([^-]* matches non-dash chars; (-[^-][^-]*)* handles single dashes that aren't part of the --> closing sequence.) 2. Simplified multi-line trigger: /bin/bash !~ /<!--[^>]*-->/ -> !/-->/ Enter multi-line comment mode only when --> is absent from the line. If --> is present, the comment closes on this line regardless of content.
…asing The detection grep used -qi (fully case-insensitive) while the sed extraction used [Ee]vidence (only two variants). A heading like '## EVIDENCE' passed detection but failed sed extraction, leaving EVIDENCE_SECTION empty and falling back to the full PR body — allowing any verdict keyword anywhere in the body to satisfy the scoped check. Fix: - grep: change 'Evidence' to 'evidence' (already case-insensitive via -qi, now consistent with the sed pattern) - sed: change /[Ee]vidence/ to /evidence/I using GNU sed's I flag for full case-insensitive matching, consistent with the grep behavior
jscpd output was only visible in collapsed CI logs. Contributors wouldn't notice new clones unless they dug into the run, even when under the 3% threshold. Now writes the full jscpd output to GITHUB_STEP_SUMMARY so clone details surface directly on the PR checks page regardless of pass/fail. The check still exits non-zero (blocking the PR) when duplication exceeds 3%.
Adds churn_guard() bash function to the ~/.ao/bin/gh PATH wrapper so that duplicate-PR prevention works for every agent type (Codex, Aider, OpenCode) — not just Claude Code, which previously had the guard only via a PreToolUse hook. - churn_guard() extracts --body/-b from parsed argv, checks "Supersedes #N" override, queries open PRs for file overlap via python3 heredoc, and blocks creation with a helpful message - Fail-open: any error (missing python3, git failure, API timeout) lets the PR through - WRAPPER_VERSION bumped 0.2.0 → 0.3.0 to force wrapper refresh - Updated version assertion in agent-workspace-hooks.test.ts
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
8ec2ccf to
5a33fe3
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.

Summary
## Evidencesection with Claim class + Verdict in every PR bodyWhat changed and why
Churn Guard (
scripts/hooks/churn-guard.sh):The fork identified a real problem: 8 duplicate PRs were created for
metadata-updater.shin one hour on 2026-04-04. The root cause was no file-level coordination gate at PR creation time. This hook interceptsgh pr createcommands and checks if any open PRs already touch the same files as the current branch. If overlap is found, it blocks with a list of overlapping PRs and actionable steps. Fail-open (allows PR if any check fails). Override withSupersedes #Nin PR body.The hook lives in
scripts/hooks/(tracked by git) and is installed to.claude/hooks/locally (gitignored per.gitignore). Installation instructions are in CONTRIBUTING.md.Evidence Gate (
.github/workflows/evidence-gate.yml):Enforces that every PR includes a
## Evidencesection documenting what was tested. Adapted from the fork — removed fork-specific runner configuration (self-hosted/ARM64), merge-gate checks (CodeRabbit, skeptic-agent), and pr-lifecycle-e2e proof requirements. Kept the core validation logic: Evidence section detection, claim class extraction, strong artifact checks for integration claims, and verdict enforcement.Test plan
## Evidencesection → evidence-gate check should fail## Evidence, Claim class, and**Verdict**: PASS→ check should passgh pr createwhen an open PR touches same files → should blockSupersedes #Nto PR body → should allowEvidence
Claim class: feat
Terminal test output:
The hook and workflow are static shell/YAML files with no build step — verified by inspection and manual testing of the churn-guard logic.
Verdict: PASS