feat: add churn-guard hook and evidence-gate workflow by harsh-batheja · Pull Request #921 · ComposioHQ/agent-orchestrator

harsh-batheja · 2026-04-05T04:28:18Z

Summary

Ports churn-guard PreToolUse hook from jleechanorg/agent-orchestrator fork to prevent duplicate PRs touching the same files
Adds evidence-gate CI workflow requiring ## Evidence section with Claim class + Verdict in every PR body
Adds PR template pre-filled with the evidence format
Documents both features in CONTRIBUTING.md

What changed and why

Churn Guard (scripts/hooks/churn-guard.sh):
The fork identified a real problem: 8 duplicate PRs were created for metadata-updater.sh in one hour on 2026-04-04. The root cause was no file-level coordination gate at PR creation time. This hook intercepts gh pr create commands and checks if any open PRs already touch the same files as the current branch. If overlap is found, it blocks with a list of overlapping PRs and actionable steps. Fail-open (allows PR if any check fails). Override with Supersedes #N in PR body.

The hook lives in scripts/hooks/ (tracked by git) and is installed to .claude/hooks/ locally (gitignored per .gitignore). Installation instructions are in CONTRIBUTING.md.

Evidence Gate (.github/workflows/evidence-gate.yml):
Enforces that every PR includes a ## Evidence section documenting what was tested. Adapted from the fork — removed fork-specific runner configuration (self-hosted/ARM64), merge-gate checks (CodeRabbit, skeptic-agent), and pr-lifecycle-e2e proof requirements. Kept the core validation logic: Evidence section detection, claim class extraction, strong artifact checks for integration claims, and verdict enforcement.

Test plan

Create a PR without ## Evidence section → evidence-gate check should fail
Create a PR with ## Evidence, Claim class, and **Verdict**: PASS → check should pass
Install churn-guard locally and try gh pr create when an open PR touches same files → should block
Test churn-guard bypass: add Supersedes #N to PR body → should allow

Evidence

Claim class: feat

Terminal test output:

$ git log --oneline -3
b64851a7 feat: add churn-guard hook and evidence-gate workflow
86ac556e Merge pull request #864 from yyovil/emdash/fix-ao-version-p0h
34bc5bb2 feat: support multiple concurrent orchestrators with isolated worktrees (#870)

$ ls -la .github/workflows/evidence-gate.yml scripts/hooks/churn-guard.sh .github/pull_request_template.md
-rw-r--r-- 1 harsh harsh  5412 Apr  5 .github/workflows/evidence-gate.yml
-rw-r--r-- 1 harsh harsh   676 Apr  5 .github/pull_request_template.md
-rwxr-xr-x 1 harsh harsh  8423 Apr  5 scripts/hooks/churn-guard.sh

The hook and workflow are static shell/YAML files with no build step — verified by inspection and manual testing of the churn-guard logic.

Verdict: PASS

cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

^{Reviewed by Cursor Bugbot for commit 573c89a. Configure here.}

Port two quality-enforcement features from jleechanorg/agent-orchestrator fork: **Churn Guard** (`scripts/hooks/churn-guard.sh`): - PreToolUse hook that intercepts `gh pr create` commands - Blocks PR creation when open PRs already touch the same files - Prevents duplicate churn (8 duplicate PRs for metadata-updater.sh on 2026-04-04) - Fail-open design: allows PR if checks fail (no git, no gh, parse error, timeout) - Override: include "Supersedes #N" in PR body to bypass - Install to `.claude/hooks/` + register in `.claude/settings.json` to activate **Evidence Gate** (`.github/workflows/evidence-gate.yml`): - CI check that runs on PR open/sync/edit - Requires `## Evidence` section with a `**Claim class**` and `**Verdict**` in PR body - For `integration`/`pipeline-e2e` claims: enforces test output in fenced code block - Rejects fabricated/placeholder evidence (simulated, TODO, example.com URLs) - Fail-closed: PRs without Evidence section fail the check Also adds `.github/pull_request_template.md` with the evidence format pre-filled, and documents both features in `CONTRIBUTING.md`.

Two security fixes flagged by Cursor Bugbot: 1. Script injection via unsanitized step output interpolation (High) ${{ steps.claim.outputs.claim }} was interpolated directly in a run: block, where GitHub Actions expands it via text substitution before bash parses. A crafted Claim class value with backticks could execute arbitrary commands. Fix: pass the value through an env: variable instead of ${{ }} interpolation. 2. Verdict check matches HTML comment in unfilled template (High) The PR template had '**Verdict**: ' which contains the word PASS inside an HTML comment. The verdict grep matched this comment, so an unfilled template would pass the evidence gate entirely. Fix: strip HTML comments in the verdict validation step (using awk, consistent with the strong artifact evidence step), and change the template placeholder to '_fill in: PASS or INSUFFICIENT_' which does not match any verdict keyword.

… bypass The previous placeholder '_fill in: PASS or INSUFFICIENT_' contained the literal word 'PASS', which the evidence-gate verdict regex matched even on an unfilled template. The workflow already strips HTML comments (via awk) before running the verdict grep, so replacing the placeholder with an HTML comment is the safe pattern: the comment is stripped to an empty string, the grep finds no verdict, and the gate correctly rejects unfilled templates.

… close The awk closing rule '/^[[:space:]]*-->/' only matched '-->' at the start of a line. The PR template's instructional comment closes with '-->' at the end of a line ('...command output (e.g. pnpm test, vitest, jest). -->'), so in_comment was never reset to 0. All subsequent lines — including **Verdict** — were silently dropped, causing the verdict check to always fail for PRs that kept the template's instructional comment. Fix: change to '/-->/' to match '-->' anywhere in the line. Applied to all three awk invocations in the workflow (strong artifact check + both verdict check paths).

Adds jscpd (JavaScript Copy/Paste Detector) to catch duplicate code: - .jscpd.json: config targeting packages/**/*.{ts,tsx}, excluding dist, node_modules, and test files. min 8 lines / 50 tokens to avoid false positives on short boilerplate. threshold: 3% — current codebase is at 1.84% (49 clones), giving headroom while blocking regressions. - package.json: new 'check:duplicates' script (jscpd packages) - .github/workflows/duplicate-check.yml: CI job running on push/PR to main Run locally: pnpm check:duplicates

The default fenced code block '$ pnpm test' matched the strong artifact evidence grep pattern in evidence-gate.yml. An integration claim PR with an unfilled template would pass the strong evidence check simply because the placeholder command matched the pattern. Replace with an HTML comment containing the example — the awk stripping removes it before the grep runs, so an unfilled template correctly fails the integration evidence check.

…churn-guard evidence-gate.yml — claim extraction skipped HTML comment stripping: A Claim class line with a trailing HTML comment produced a corrupted CLAIM value (e.g. integration-) that failed equality checks, silently bypassing strong evidence enforcement for integration claims. Fix: add sed to strip inline HTML comments in both primary and fallback paths. scripts/hooks/churn-guard.sh — double-escaped backslash in body regex: Four backslashes in the raw string alternation matched two literal backslashes plus any char instead of one backslash plus any char. This broke Supersedes override detection when --body used double-quoted values with escape sequences. Fix: reduce to two backslashes in the raw string for both affected patterns.

…erns The gsub pattern  failed to match HTML comments containing > (e.g. ). When it failed, the multi-line fallback triggered: /^[[:space:]]* on the same line and leaving in_comment stuck at 1. All subsequent lines including **Verdict** were swallowed, causing the gate to always reject with 'No verdict found'. Two fixes applied to all five pattern sites (3 awk gsub, 2 sed): 1. Improved pattern:  ->  Matches any valid single-line HTML comment including those containing >. ([^-]* matches non-dash chars; (-[^-][^-]*)* handles single dashes that aren't part of the --> closing sequence.) 2. Simplified multi-line trigger: /bin/bash !~ // -> !/-->/ Enter multi-line comment mode only when --> is absent from the line. If --> is present, the comment closes on this line regardless of content.

…asing The detection grep used -qi (fully case-insensitive) while the sed extraction used [Ee]vidence (only two variants). A heading like '## EVIDENCE' passed detection but failed sed extraction, leaving EVIDENCE_SECTION empty and falling back to the full PR body — allowing any verdict keyword anywhere in the body to satisfy the scoped check. Fix: - grep: change 'Evidence' to 'evidence' (already case-insensitive via -qi, now consistent with the sed pattern) - sed: change /[Ee]vidence/ to /evidence/I using GNU sed's I flag for full case-insensitive matching, consistent with the grep behavior

jscpd output was only visible in collapsed CI logs. Contributors wouldn't notice new clones unless they dug into the run, even when under the 3% threshold. Now writes the full jscpd output to GITHUB_STEP_SUMMARY so clone details surface directly on the PR checks page regardless of pass/fail. The check still exits non-zero (blocking the PR) when duplication exceeds 3%.

Adds churn_guard() bash function to the ~/.ao/bin/gh PATH wrapper so that duplicate-PR prevention works for every agent type (Codex, Aider, OpenCode) — not just Claude Code, which previously had the guard only via a PreToolUse hook. - churn_guard() extracts --body/-b from parsed argv, checks "Supersedes #N" override, queries open PRs for file overlap via python3 heredoc, and blocks creation with a helpful message - Fail-open: any error (missing python3, git failure, API timeout) lets the PR through - WRAPPER_VERSION bumped 0.2.0 → 0.3.0 to force wrapper refresh - Updated version assertion in agent-workspace-hooks.test.ts

…adictory output

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>