Skip to content

feat: add churn-guard hook and evidence-gate workflow#921

Closed
harsh-batheja wants to merge 16 commits into
mainfrom
session/ao-133
Closed

feat: add churn-guard hook and evidence-gate workflow#921
harsh-batheja wants to merge 16 commits into
mainfrom
session/ao-133

Conversation

@harsh-batheja
Copy link
Copy Markdown
Collaborator

Summary

  • Ports churn-guard PreToolUse hook from jleechanorg/agent-orchestrator fork to prevent duplicate PRs touching the same files
  • Adds evidence-gate CI workflow requiring ## Evidence section with Claim class + Verdict in every PR body
  • Adds PR template pre-filled with the evidence format
  • Documents both features in CONTRIBUTING.md

What changed and why

Churn Guard (scripts/hooks/churn-guard.sh):
The fork identified a real problem: 8 duplicate PRs were created for metadata-updater.sh in one hour on 2026-04-04. The root cause was no file-level coordination gate at PR creation time. This hook intercepts gh pr create commands and checks if any open PRs already touch the same files as the current branch. If overlap is found, it blocks with a list of overlapping PRs and actionable steps. Fail-open (allows PR if any check fails). Override with Supersedes #N in PR body.

The hook lives in scripts/hooks/ (tracked by git) and is installed to .claude/hooks/ locally (gitignored per .gitignore). Installation instructions are in CONTRIBUTING.md.

Evidence Gate (.github/workflows/evidence-gate.yml):
Enforces that every PR includes a ## Evidence section documenting what was tested. Adapted from the fork — removed fork-specific runner configuration (self-hosted/ARM64), merge-gate checks (CodeRabbit, skeptic-agent), and pr-lifecycle-e2e proof requirements. Kept the core validation logic: Evidence section detection, claim class extraction, strong artifact checks for integration claims, and verdict enforcement.

Test plan

  • Create a PR without ## Evidence section → evidence-gate check should fail
  • Create a PR with ## Evidence, Claim class, and **Verdict**: PASS → check should pass
  • Install churn-guard locally and try gh pr create when an open PR touches same files → should block
  • Test churn-guard bypass: add Supersedes #N to PR body → should allow

Evidence

Claim class: feat

Terminal test output:

$ git log --oneline -3
b64851a7 feat: add churn-guard hook and evidence-gate workflow
86ac556e Merge pull request #864 from yyovil/emdash/fix-ao-version-p0h
34bc5bb2 feat: support multiple concurrent orchestrators with isolated worktrees (#870)

$ ls -la .github/workflows/evidence-gate.yml scripts/hooks/churn-guard.sh .github/pull_request_template.md
-rw-r--r-- 1 harsh harsh  5412 Apr  5 .github/workflows/evidence-gate.yml
-rw-r--r-- 1 harsh harsh   676 Apr  5 .github/pull_request_template.md
-rwxr-xr-x 1 harsh harsh  8423 Apr  5 scripts/hooks/churn-guard.sh

The hook and workflow are static shell/YAML files with no build step — verified by inspection and manual testing of the churn-guard logic.

Verdict: PASS

Comment thread .github/workflows/evidence-gate.yml Outdated
Comment thread .github/workflows/evidence-gate.yml
Comment thread .github/pull_request_template.md Outdated
Comment thread .github/workflows/evidence-gate.yml Outdated
Comment thread .github/pull_request_template.md
Comment thread .github/workflows/evidence-gate.yml
Comment thread scripts/hooks/churn-guard.sh Outdated
Comment thread .github/workflows/evidence-gate.yml Outdated
Comment thread .github/workflows/evidence-gate.yml
@harsh-batheja harsh-batheja force-pushed the session/ao-133 branch 2 times, most recently from 43c4bd9 to db179c8 Compare April 6, 2026 09:29
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 573c89a. Configure here.

Comment thread .github/workflows/evidence-gate.yml
Comment thread .github/workflows/evidence-gate.yml Outdated
@harsh-batheja harsh-batheja force-pushed the session/ao-133 branch 2 times, most recently from 885e916 to 8ec2ccf Compare April 7, 2026 09:16
harsh-batheja and others added 16 commits April 7, 2026 16:14
Port two quality-enforcement features from jleechanorg/agent-orchestrator fork:

**Churn Guard** (`scripts/hooks/churn-guard.sh`):
- PreToolUse hook that intercepts `gh pr create` commands
- Blocks PR creation when open PRs already touch the same files
- Prevents duplicate churn (8 duplicate PRs for metadata-updater.sh on 2026-04-04)
- Fail-open design: allows PR if checks fail (no git, no gh, parse error, timeout)
- Override: include "Supersedes #N" in PR body to bypass
- Install to `.claude/hooks/` + register in `.claude/settings.json` to activate

**Evidence Gate** (`.github/workflows/evidence-gate.yml`):
- CI check that runs on PR open/sync/edit
- Requires `## Evidence` section with a `**Claim class**` and `**Verdict**` in PR body
- For `integration`/`pipeline-e2e` claims: enforces test output in fenced code block
- Rejects fabricated/placeholder evidence (simulated, TODO, example.com URLs)
- Fail-closed: PRs without Evidence section fail the check

Also adds `.github/pull_request_template.md` with the evidence format pre-filled,
and documents both features in `CONTRIBUTING.md`.
Two security fixes flagged by Cursor Bugbot:

1. Script injection via unsanitized step output interpolation (High)
   ${{ steps.claim.outputs.claim }} was interpolated directly in a run: block,
   where GitHub Actions expands it via text substitution before bash parses.
   A crafted Claim class value with backticks could execute arbitrary commands.
   Fix: pass the value through an env: variable instead of ${{ }} interpolation.

2. Verdict check matches HTML comment in unfilled template (High)
   The PR template had '**Verdict**: <!-- PASS | INSUFFICIENT -->' which contains
   the word PASS inside an HTML comment. The verdict grep matched this comment,
   so an unfilled template would pass the evidence gate entirely.
   Fix: strip HTML comments in the verdict validation step (using awk, consistent
   with the strong artifact evidence step), and change the template placeholder to
   '_fill in: PASS or INSUFFICIENT_' which does not match any verdict keyword.
… bypass

The previous placeholder '_fill in: PASS or INSUFFICIENT_' contained the literal
word 'PASS', which the evidence-gate verdict regex matched even on an unfilled
template. The workflow already strips HTML comments (via awk) before running the
verdict grep, so replacing the placeholder with an HTML comment is the safe
pattern: the comment is stripped to an empty string, the grep finds no verdict,
and the gate correctly rejects unfilled templates.
… close

The awk closing rule '/^[[:space:]]*-->/' only matched '-->' at the start of a
line. The PR template's instructional comment closes with '-->' at the end of
a line ('...command output (e.g. pnpm test, vitest, jest). -->'), so in_comment
was never reset to 0. All subsequent lines — including **Verdict** — were
silently dropped, causing the verdict check to always fail for PRs that kept
the template's instructional comment.

Fix: change to '/-->/' to match '-->' anywhere in the line. Applied to all
three awk invocations in the workflow (strong artifact check + both verdict
check paths).
Adds jscpd (JavaScript Copy/Paste Detector) to catch duplicate code:

- .jscpd.json: config targeting packages/**/*.{ts,tsx}, excluding dist,
  node_modules, and test files. min 8 lines / 50 tokens to avoid false
  positives on short boilerplate. threshold: 3% — current codebase is
  at 1.84% (49 clones), giving headroom while blocking regressions.
- package.json: new 'check:duplicates' script (jscpd packages)
- .github/workflows/duplicate-check.yml: CI job running on push/PR to main

Run locally: pnpm check:duplicates
The default fenced code block '$ pnpm test' matched the strong artifact
evidence grep pattern in evidence-gate.yml. An integration claim PR with
an unfilled template would pass the strong evidence check simply because
the placeholder command matched the pattern.

Replace with an HTML comment containing the example — the awk stripping
removes it before the grep runs, so an unfilled template correctly fails
the integration evidence check.
…churn-guard

evidence-gate.yml — claim extraction skipped HTML comment stripping:
A Claim class line with a trailing HTML comment produced a corrupted CLAIM
value (e.g. integration-<!---foo-->) that failed equality checks, silently
bypassing strong evidence enforcement for integration claims.
Fix: add sed to strip inline HTML comments in both primary and fallback paths.

scripts/hooks/churn-guard.sh — double-escaped backslash in body regex:
Four backslashes in the raw string alternation matched two literal backslashes
plus any char instead of one backslash plus any char. This broke Supersedes
override detection when --body used double-quoted values with escape sequences.
Fix: reduce to two backslashes in the raw string for both affected patterns.
…erns

The gsub pattern <!--[^>]*--> failed to match HTML comments containing >
(e.g. <!-- expected -->). When it failed, the multi-line fallback triggered:
/^[[:space:]]*<!--/ matched, the !~ condition was true, so in_comment=1 was
set and next was called — skipping the closing --> on the same line and
leaving in_comment stuck at 1. All subsequent lines including **Verdict**
were swallowed, causing the gate to always reject with 'No verdict found'.

Two fixes applied to all five pattern sites (3 awk gsub, 2 sed):

1. Improved pattern: <!--[^>]*--> -> <!--[^-]*(-[^-][^-]*)*-->
   Matches any valid single-line HTML comment including those containing >.
   ([^-]* matches non-dash chars; (-[^-][^-]*)* handles single dashes that
   aren't part of the --> closing sequence.)

2. Simplified multi-line trigger: /bin/bash !~ /<!--[^>]*-->/ -> !/-->/
   Enter multi-line comment mode only when --> is absent from the line.
   If --> is present, the comment closes on this line regardless of content.
…asing

The detection grep used -qi (fully case-insensitive) while the sed extraction
used [Ee]vidence (only two variants). A heading like '## EVIDENCE' passed
detection but failed sed extraction, leaving EVIDENCE_SECTION empty and
falling back to the full PR body — allowing any verdict keyword anywhere in
the body to satisfy the scoped check.

Fix:
- grep: change 'Evidence' to 'evidence' (already case-insensitive via -qi,
  now consistent with the sed pattern)
- sed: change /[Ee]vidence/ to /evidence/I using GNU sed's I flag for full
  case-insensitive matching, consistent with the grep behavior
jscpd output was only visible in collapsed CI logs. Contributors wouldn't
notice new clones unless they dug into the run, even when under the 3%
threshold.

Now writes the full jscpd output to GITHUB_STEP_SUMMARY so clone details
surface directly on the PR checks page regardless of pass/fail. The check
still exits non-zero (blocking the PR) when duplication exceeds 3%.
Adds churn_guard() bash function to the ~/.ao/bin/gh PATH wrapper so
that duplicate-PR prevention works for every agent type (Codex, Aider,
OpenCode) — not just Claude Code, which previously had the guard only
via a PreToolUse hook.

- churn_guard() extracts --body/-b from parsed argv, checks
  "Supersedes #N" override, queries open PRs for file overlap via
  python3 heredoc, and blocks creation with a helpful message
- Fail-open: any error (missing python3, git failure, API timeout)
  lets the PR through
- WRAPPER_VERSION bumped 0.2.0 → 0.3.0 to force wrapper refresh
- Updated version assertion in agent-workspace-hooks.test.ts
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
@harsh-batheja harsh-batheja deleted the session/ao-133 branch May 1, 2026 08:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants