feat: add churn-guard hook and evidence-gate workflow #921
Closed
16 commits
- a9b2bef feat: add churn-guard hook and evidence-gate workflow (harsh-batheja)
- 958d917 fix(evidence-gate): prevent script injection and template verdict bypass (harsh-batheja)
- 7f8d13b fix(pr-template): use HTML comment for verdict placeholder to prevent… (harsh-batheja)
- 0790edc fix(evidence-gate): match --> anywhere in line for multi-line comment… (harsh-batheja)
- 5aaf838 feat: add duplicate code check with jscpd (harsh-batheja)
- e11f6b4 fix(pr-template): move terminal test output example into HTML comment (harsh-batheja)
- 509990d fix: strip HTML comments in claim extraction; fix backslash regex in … (harsh-batheja)
- fe78cd1 fix(evidence-gate): handle > inside HTML comments in all awk/sed patt… (harsh-batheja)
- 4a17e38 fix(evidence-gate): align evidence heading detection and extraction c… (harsh-batheja)
- 7ca3163 feat(duplicate-check): write clone report to job summary (harsh-batheja)
- 69d5a63 feat(core): embed churn-guard in gh PATH wrapper for all agents (harsh-batheja)
- f6ad6cb fix(core): make churn-guard non-blocking (warn instead of block) (harsh-batheja)
- 4d44bd1 test(agent-codex): update wrapper version and case assertions for 0.3.0 (harsh-batheja)
- f10dbd7 fix(evidence-gate): make all checks non-blocking (warn instead of fail) (harsh-batheja)
- 4e19c31 fix(evidence-gate): move PASS message into else branch to avoid contr… (harsh-batheja)
- 5a33fe3 fix(evidence-gate): enforce evidence gate failures consistently (harsh-batheja)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,34 @@ | ||
| ## Summary | ||
|
|
||
| <!-- What changed and why? 1–3 bullet points. --> | ||
|
|
||
| - | ||
| - | ||
|
|
||
| Closes #<!-- issue number --> | ||
|
|
||
| ## Test plan | ||
|
|
||
| <!-- How was this tested? What should reviewers check? --> | ||
|
|
||
| - [ ] | ||
|
|
||
| ## Evidence | ||
|
|
||
| **Claim class**: <!-- unit | integration | pipeline-e2e | feat | fix | refactor | docs | chore --> | ||
|
|
||
| <!-- Describe what was tested and how. For `integration` and `pipeline-e2e` claims, | ||
| include a **Terminal test output** section with a fenced code block showing test | ||
| command output (e.g. pnpm test, vitest, jest). --> | ||
|
|
||
| **Terminal test output**: | ||
|
|
||
| <!-- | ||
| Paste your actual test output in a fenced code block, e.g.: | ||
| ``` | ||
| $ pnpm test --filter my-package | ||
| ✓ all tests passed | ||
| ``` | ||
| --> | ||
|
|
||
| **Verdict**: <!-- replace this comment with PASS or INSUFFICIENT --> | ||
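
To illustrate how the `**Claim class**` field in this template is meant to be filled in and read back, here is a minimal Python sketch. It is not the workflow's implementation (which uses a `grep`/`sed`/`tr` pipeline); the function names and the `ALIASES` table are assumptions for illustration, with the alias values taken from the `case` statement in the Evidence Gate workflow.

```python
import re

# Long-form spellings mapped to the short forms used by the gate's `case` statement.
ALIASES = {
    "unit-test-coverage": "unit",
    "unit-test": "unit",
    "integration-test": "integration",
    "bug-fix": "fix",
    "feature": "feat",
}


def strip_html_comments(text: str) -> str:
    """Remove <!-- ... --> spans, including ones that cover several lines."""
    return re.sub(r"<!--.*?-->", "", text, flags=re.S)


def extract_claim_class(pr_body: str) -> str:
    """Return the normalized claim class, or "" when the field is missing or unfilled."""
    for line in strip_html_comments(pr_body).splitlines():
        # Accept both "**Claim class**: value" and "**Claim class:** value".
        match = re.search(r"\*\*Claim class(?:\*\*)?:(?:\*\*)?(.*)", line, flags=re.I)
        if not match:
            continue
        value = match.group(1).split("(")[0]           # drop a trailing "(...)" note
        value = value.replace("*", "").strip().lower()
        value = re.sub(r"\s+", "-", value).strip("-")  # "bug fix" -> "bug-fix"
        return ALIASES.get(value, value)
    return ""
```

An unfilled template yields an empty string because the placeholder lives inside an HTML comment, which is stripped before the field is read.
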
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,55 @@ | ||
| name: Duplicate Code Check | ||
|
|
||
| on: | ||
| push: | ||
| branches: [main] | ||
| pull_request: | ||
| branches: [main] | ||
|
|
||
| concurrency: | ||
| group: ${{ github.workflow }}-${{ github.ref }} | ||
| cancel-in-progress: true | ||
|
|
||
| jobs: | ||
| duplicate-check: | ||
| name: Duplicate Code Check | ||
| runs-on: ubuntu-latest | ||
| steps: | ||
| - uses: actions/checkout@v4 | ||
| - uses: pnpm/action-setup@v4 | ||
| - uses: actions/setup-node@v4 | ||
| with: | ||
| node-version: 20 | ||
| cache: pnpm | ||
| - run: pnpm install --frozen-lockfile | ||
|
|
||
| - name: Run jscpd and capture output | ||
| id: jscpd | ||
| # Always run (don't stop on non-zero exit) so we can write the summary first. | ||
| # The exit code is preserved in steps.jscpd.outputs.exit_code and re-applied after. | ||
| run: | | ||
| set +e | ||
| OUTPUT=$(pnpm check:duplicates 2>&1) | ||
| EXIT_CODE=$? | ||
| set -e | ||
|
|
||
| echo "exit_code=$EXIT_CODE" >> "$GITHUB_OUTPUT" | ||
|
|
||
| # Write job summary so clone details surface on the PR checks page | ||
| { | ||
| if [ "$EXIT_CODE" -ne 0 ]; then | ||
| echo "## ❌ Duplicate Code Check — threshold exceeded" | ||
| else | ||
| echo "## ✅ Duplicate Code Check — within threshold" | ||
| fi | ||
| echo "" | ||
| echo '```' | ||
| echo "$OUTPUT" | ||
| echo '```' | ||
| echo "" | ||
| echo "_Threshold: 3% of lines. Clones are listed above even when the check passes._" | ||
| echo "_To suppress a false positive, add the files to the \`ignore\` list in \`.jscpd.json\`._" | ||
| } >> "$GITHUB_STEP_SUMMARY" | ||
|
|
||
| # Re-apply the original exit code so the step fails when threshold is crossed | ||
| exit "$EXIT_CODE" |
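
The capture, summarize, then re-apply pattern used by this step can be sketched in Python as follows. This is an illustrative equivalent, not part of the PR: `build_summary` and `run_and_summarize` are hypothetical names, and the summary text is simplified from the shell version.

```python
import subprocess


def build_summary(output: str, exit_code: int, threshold_pct: int = 3) -> str:
    """Render the Markdown that the step appends to the job summary."""
    status = "threshold exceeded" if exit_code != 0 else "within threshold"
    fence = "`" * 3
    return "\n".join([
        f"## Duplicate Code Check: {status}",
        "",
        fence,
        output,
        fence,
        "",
        f"_Threshold: {threshold_pct}% of lines._",
        "",
    ])


def run_and_summarize(command: list[str], summary_path: str) -> int:
    """Run the command without raising on failure, write the summary, return the exit code."""
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # same effect as 2>&1
        text=True,
        check=False,               # same intent as `set +e`
    )
    with open(summary_path, "a", encoding="utf-8") as handle:
        handle.write(build_summary(result.stdout, result.returncode))
    return result.returncode
```

A caller would finish with `sys.exit(run_and_summarize(...))`, passing the path from the `GITHUB_STEP_SUMMARY` environment variable, so the summary is always written before the original exit code decides whether the step fails.
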
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,210 @@ | ||
| name: Evidence Gate | ||
|
|
||
| on: | ||
| pull_request: | ||
| types: [opened, synchronize, edited, reopened] | ||
|
|
||
| # Do not cancel in-progress runs: a second event (e.g. push + bot PR-body edit) | ||
| # would cancel the first job; GitHub surfaces that as a failed check on the same SHA. | ||
| concurrency: | ||
| group: evidence-gate-${{ github.event.pull_request.number }} | ||
| cancel-in-progress: false | ||
|
|
||
| permissions: | ||
| pull-requests: read | ||
|
|
||
| jobs: | ||
| evidence-gate: | ||
| name: Evidence Gate | ||
| runs-on: ubuntu-latest | ||
| # Skip entirely when PR is merged or closed — a merged PR stops receiving | ||
| # pull_request events so a stale failed check run cannot be overwritten. | ||
| # Evidence gate is a pre-merge gate; post-merge it has no function. | ||
| if: github.event.pull_request.merged == false && github.event.action != 'closed' | ||
| steps: | ||
| - uses: actions/checkout@v4.1.1 | ||
|
|
||
| - name: Write PR body to temp file | ||
| env: | ||
| PR_BODY: ${{ github.event.pull_request.body }} | ||
| run: | | ||
| printf '%s' "$PR_BODY" > "$RUNNER_TEMP/pr_body.txt" | ||
| echo "Body fetched: ${#PR_BODY} chars" | ||
| if [ ${#PR_BODY} -eq 0 ]; then | ||
| echo "PR body is empty — treating as no evidence bundle" | ||
| echo "skip=true" >> "$GITHUB_OUTPUT" | ||
| fi | ||
| id: write_body | ||
|
|
||
| - name: Check for evidence bundle in PR body | ||
| id: check | ||
| run: | | ||
| if [ "${{ steps.write_body.outputs.skip }}" = "true" ]; then | ||
| echo "found=false" >> "$GITHUB_OUTPUT" | ||
| exit 0 | ||
| fi | ||
|
|
||
| BODY=$(cat "$RUNNER_TEMP/pr_body.txt") | ||
| STRIPPED_BODY=$(printf '%s' "$BODY" | python3 -c 'import re, sys; sys.stdout.write(re.sub(r"<!--.*?-->", "", sys.stdin.read(), flags=re.S))') | ||
|
|
||
| if printf '%s' "$STRIPPED_BODY" | grep -qiE '^[[:space:]]*##[[:space:]]+evidence([[:space:]]|$)'; then | ||
| echo "found=true" >> "$GITHUB_OUTPUT" | ||
| else | ||
| echo "found=false" >> "$GITHUB_OUTPUT" | ||
| fi | ||
|
|
||
| - name: Fail when no Evidence section found | ||
| if: steps.check.outputs.found == 'false' | ||
| run: | | ||
| echo "ERROR: No ## Evidence section found in PR body." | ||
| echo "" | ||
| echo "Add an Evidence section to your PR body:" | ||
| echo "" | ||
| echo " ## Evidence" | ||
| echo " **Claim class**: unit | integration | pipeline-e2e | feat | fix | refactor | docs | chore" | ||
| echo " **Verdict**: PASS | INSUFFICIENT" | ||
| echo "" | ||
| echo " Describe what was tested and how." | ||
| exit 1 | ||
|
|
||
| - name: Extract and normalize claim class | ||
| id: claim | ||
| if: steps.check.outputs.found == 'true' | ||
| run: | | ||
| BODY=$(cat "$RUNNER_TEMP/pr_body.txt") | ||
| STRIPPED_BODY=$(printf '%s' "$BODY" | python3 -c 'import re, sys; sys.stdout.write(re.sub(r"<!--.*?-->", "", sys.stdin.read(), flags=re.S))') | ||
|
|
||
| # Extract claim class and normalize it to the lowercase short form. | ||
| # Supports "**Claim class**: value" and "**Claim class:** value" (asterisks are | ||
| # stripped before matching), with list-bullet lines ("- **Claim class**: ...") as a fallback. | ||
| CLAIM=$(printf '%s' "$STRIPPED_BODY" \ | ||
| | grep -i '\*\*Claim class' | grep -v '^- ' | head -1 \ | ||
| | tr -d '*' \ | ||
| | sed 's/.*Claim class: *//I' \ | ||
| | sed 's/(.*//' \ | ||
| | tr '[:upper:]' '[:lower:]' \ | ||
| | tr ' ' '-' | tr -d '\t' \ | ||
| | sed 's/^[ \t-]*//;s/[ \t-]*$//') | ||
|
|
||
| # Fallback: list-bullet format | ||
| if [ -z "$CLAIM" ]; then | ||
| CLAIM=$(printf '%s' "$STRIPPED_BODY" \ | ||
| | grep -i '^-.*\*\*Claim class' | head -1 \ | ||
| | tr -d '*' \ | ||
| | sed 's/.*Claim class: *//I' \ | ||
| | sed 's/(.*//' \ | ||
| | tr '[:upper:]' '[:lower:]' \ | ||
| | tr ' ' '-' | tr -d '\t' \ | ||
| | sed 's/^[ \t-]*//;s/[ \t-]*$//') | ||
| fi | ||
|
|
||
| # Normalize long forms to canonical short forms | ||
| case "$CLAIM" in | ||
| unit-test-coverage|unit-test) CLAIM="unit" ;; | ||
| integration-test) CLAIM="integration" ;; | ||
| bug-fix) CLAIM="fix" ;; | ||
| feature) CLAIM="feat" ;; | ||
| esac | ||
|
|
||
| echo "Claim class: $CLAIM" | ||
| echo "claim=$CLAIM" >> "$GITHUB_OUTPUT" | ||
|
|
||
| - name: Enforce strong artifact evidence for integration claims | ||
| if: steps.check.outputs.found == 'true' | ||
| env: | ||
| CLAIM: ${{ steps.claim.outputs.claim }} | ||
| run: | | ||
| BODY=$(cat "$RUNNER_TEMP/pr_body.txt") | ||
| STRIPPED_BODY=$(printf '%s' "$BODY" | python3 -c 'import re, sys; sys.stdout.write(re.sub(r"<!--.*?-->", "", sys.stdin.read(), flags=re.S))') | ||
|
|
||
| # integration and pipeline-e2e claims must include terminal test output or a link to it; all other claim classes skip this check. | ||
| if [ "$CLAIM" = "integration" ] || [ "$CLAIM" = "pipeline-e2e" ]; then | ||
| # HTML comments were already stripped above; extract the body of the Evidence section. | ||
| EVIDENCE=$(printf '%s\n' "$STRIPPED_BODY" | awk ' | ||
| tolower($0) ~ /^[[:space:]]*##[[:space:]]+evidence([[:space:]]|$)/ { in_section=1; next } | ||
| in_section && /^[[:space:]]*##[[:space:]]/ { exit } | ||
| in_section { print } | ||
| ') | ||
|
|
||
| # Warn on fabricated/placeholder evidence | ||
| if printf '%s' "$EVIDENCE" | grep -qiE '\bsimulated\b'; then | ||
| echo "WARNING: Evidence contains 'simulated' — fabricated output is not valid." | ||
| fi | ||
| if printf '%s' "$EVIDENCE" | grep -qiE 'https?://(www\.)?example\.com'; then | ||
| echo "WARNING: Evidence contains example.com placeholder URL — use a real screenshot URL." | ||
| fi | ||
| if printf '%s' "$EVIDENCE" | grep -qiE '<screenshot[[:space:]]path>|<path>|<value>|\bTODO\b|\bTBD\b'; then | ||
| echo "WARNING: Evidence contains placeholder template text — fill in real values." | ||
| fi | ||
|
|
||
| # Terminal/test output: fenced code block with a concrete test-command keyword. | ||
| HAS_OUTPUT=false | ||
| TTO_BLOCK=$(printf '%s' "$EVIDENCE" | awk ' | ||
| /\*\*Terminal test output(\*\*)?:/ { show=1 } | ||
| show && /\*\*UI media(\*\*)?:/ { exit } | ||
| show { print } | ||
| ') | ||
| if printf '%s' "$TTO_BLOCK" | grep -q '```'; then | ||
| BLOCK=$(printf '%s' "$TTO_BLOCK" | sed -n '/```/,/```/p' | tail -n +2 | sed '$d') | ||
| if printf '%s' "$BLOCK" | grep -qiE '\$[[:space:]]*(pnpm|npm|pytest|vitest|jest|go[[:space:]]+test)([[:space:]]|$)'; then | ||
| HAS_OUTPUT=true | ||
| fi | ||
| fi | ||
| if [ "$HAS_OUTPUT" = "false" ] && printf '%s' "$TTO_BLOCK" | grep -qE '\*\*Terminal test output(\*\*)?:' && printf '%s' "$TTO_BLOCK" | grep -qE 'https://[^[:space:]]+'; then | ||
| HAS_OUTPUT=true | ||
| fi | ||
|
|
||
| if [ "$HAS_OUTPUT" != "true" ]; then | ||
| echo "ERROR: Strong evidence standard not met for claim class '$CLAIM'." | ||
| echo "Required for integration/pipeline-e2e claims:" | ||
| echo " - **Terminal test output**: fenced code block with a concrete test command" | ||
| echo " (e.g. \`\`\` block containing a line such as: \$ pnpm test; an https:// link to the output is also accepted)" | ||
| exit 1 | ||
| else | ||
| echo "Strong evidence standard PASS for $CLAIM" | ||
| fi | ||
| else | ||
| echo "Strong artifact check not required for claim class: $CLAIM" | ||
| fi | ||
|
|
||
| - name: Validate verdict is present and consistent | ||
| if: steps.check.outputs.found == 'true' | ||
| run: | | ||
| BODY=$(cat "$RUNNER_TEMP/pr_body.txt") | ||
| STRIPPED_BODY=$(printf '%s' "$BODY" | python3 -c 'import re, sys; sys.stdout.write(re.sub(r"<!--.*?-->", "", sys.stdin.read(), flags=re.S))') | ||
|
|
||
| # HTML comments are stripped above so template placeholders (<!-- PASS | INSUFFICIENT -->) | ||
| # cannot satisfy the verdict check on an unfilled template. Extract only the body of | ||
| # "## Evidence" (up to the next "## " heading) to avoid false matches elsewhere in the PR body. | ||
| EVIDENCE_SECTION=$(printf '%s\n' "$STRIPPED_BODY" | awk ' | ||
| tolower($0) ~ /^[[:space:]]*##[[:space:]]+evidence([[:space:]]|$)/ { in_section=1; next } | ||
| in_section && /^[[:space:]]*##[[:space:]]/ { exit } | ||
| in_section { print } | ||
| ') | ||
|
|
||
| # Empty extraction means there was no real Evidence section after comment stripping. | ||
| if [ -z "$EVIDENCE_SECTION" ]; then | ||
| echo "ERROR: No ## Evidence section found in PR body after stripping HTML comments." | ||
| exit 1 | ||
| fi | ||
|
|
||
| # Check for PASS verdict — scoped to Evidence section | ||
| if printf '%s' "$EVIDENCE_SECTION" | grep -qi '[Vv]erdict.*:.*[Pp][Aa][Ss][Ss]'; then | ||
| echo "Verdict: PASS — evidence gate passes" | ||
|
|
||
| # Check for INSUFFICIENT verdict — gate passes; used when evidence is partial | ||
| elif printf '%s' "$EVIDENCE_SECTION" | grep -qi '[Vv]erdict.*:.*[Ii][Nn][Ss][Uu][Ff][Ff][Ii][Cc][Ii][Ee][Nn][Tt]'; then | ||
| echo "Verdict: INSUFFICIENT — gate passes (marks work-in-progress evidence)" | ||
|
|
||
| # Check for FAIL verdict — gate passes with a warning | ||
| elif printf '%s' "$EVIDENCE_SECTION" | grep -qi '[Vv]erdict.*:.*[Ff][Aa][Ii][Ll]'; then | ||
| echo "Verdict: FAIL with present bundle — this bundle should be re-examined" | ||
|
|
||
| # No verdict found: fail the gate | ||
| else | ||
| echo "ERROR: No verdict found in evidence bundle." | ||
| echo "Add one of the following to your ## Evidence section:" | ||
| echo " **Verdict**: PASS" | ||
| echo " **Verdict**: INSUFFICIENT" | ||
| exit 1 | ||
| fi | ||
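
The gate's core logic (strip HTML comments, isolate the `## Evidence` section, then look for a verdict) can be sketched in Python as shown here. This is a simplified illustration under the assumption that the `awk` and `grep` patterns above behave as written; the function names and the `"MISSING"` return value are hypothetical.

```python
import re

EVIDENCE_HEADING = re.compile(r"^\s*##\s+evidence(\s|$)", re.I)
ANY_H2 = re.compile(r"^\s*##\s")


def strip_html_comments(text: str) -> str:
    """Remove <!-- ... --> spans so template placeholders cannot count as content."""
    return re.sub(r"<!--.*?-->", "", text, flags=re.S)


def evidence_section(pr_body: str) -> str:
    """Return the lines between '## Evidence' and the next '## ' heading."""
    collected = []
    in_section = False
    for line in strip_html_comments(pr_body).splitlines():
        if EVIDENCE_HEADING.match(line):
            in_section = True
            continue
        if in_section and ANY_H2.match(line):
            break
        if in_section:
            collected.append(line)
    return "\n".join(collected)


def classify_verdict(section: str) -> str:
    """Check verdicts in the same order as the workflow: PASS, INSUFFICIENT, FAIL."""
    for verdict in ("PASS", "INSUFFICIENT", "FAIL"):
        # '.' does not cross newlines, so the match stays on one line, like grep.
        if re.search(rf"verdict.*:.*{verdict}", section, flags=re.I):
            return verdict
    return "MISSING"
```

Scoping the search to the extracted section means a `Verdict: PASS` line elsewhere in the PR body does not satisfy the gate, and stripping comments first means the unfilled template placeholder does not either.
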
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,17 @@ | ||
| { | ||
| "minLines": 8, | ||
| "minTokens": 50, | ||
| "threshold": 3, | ||
| "pattern": "**/*.{ts,tsx}", | ||
| "ignore": [ | ||
| "**/dist/**", | ||
| "**/node_modules/**", | ||
| "**/__tests__/**", | ||
| "**/*.test.ts", | ||
| "**/*.test.tsx", | ||
| "**/*.spec.ts", | ||
| "**/*.spec.tsx" | ||
| ], | ||
| "reporters": ["console"], | ||
| "gitignore": true | ||
| } |
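
As a rough illustration of what `"threshold": 3` means, the sketch below checks a duplication percentage against the configured value. It assumes, based on the "3% of lines" wording in the workflow summary, that the check fails when duplicated lines exceed the threshold percentage of all scanned lines; the function name and the sample numbers are hypothetical, and jscpd's exact boundary behavior is not confirmed here.

```python
import json

# Values copied from the config above; only `threshold` is used in this sketch.
CONFIG = json.loads('{"minLines": 8, "minTokens": 50, "threshold": 3}')


def exceeds_threshold(duplicated_lines: int, total_lines: int, threshold_pct: int) -> bool:
    """True when duplicated lines are more than threshold_pct percent of all lines."""
    # Integer arithmetic avoids floating-point rounding at the boundary.
    return duplicated_lines * 100 > threshold_pct * total_lines
```

For example, 31 duplicated lines out of 1000 (3.1%) would exceed the threshold, while exactly 30 out of 1000 (3.0%) would not under this assumption.
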