📊 Current CI/CD Pipeline Status
The repository has a comprehensive CI/CD setup with 44+ workflows, including 13 static GitHub Actions workflows that run on every PR and 27+ agentic workflows. Core static PR checks are all healthy and passing. However, several scheduled and agentic workflows are currently failing, and notable gaps exist in coverage completeness, security tooling, and test mapping.
Recent run health (last 30 completed runs across all events):
✅ Success: 19 runs (~63%)
❌ Failure: 11 runs (~37%)
The failures are concentrated in agentic/scheduled workflows; static PR quality gates are passing.
✅ Existing Quality Gates
Static Workflows (run on every PR to main)
| Workflow | Purpose | Scope |
|---|---|---|
| lint.yml | ESLint + MarkdownLint | Source code + docs |
| build.yml | TypeScript build + api-proxy/cli-proxy unit tests | Node 20 & 22 matrix |
| test-integration.yml | TypeScript type check (tsc --noEmit) | Node 22 |
| codeql.yml | SAST scanning (javascript-typescript + actions) | Security |
| dependency-audit.yml | npm audit on main + docs-site; SARIF upload to Security tab | Supply chain |
| pr-title.yml | Semantic PR title validation (Conventional Commits) | Process |
| test-coverage.yml | Jest unit tests with coverage diff vs. base branch + PR comment | Quality |
| test-integration-suite.yml | 5 parallel integration test jobs (domain, network, protocol, container ops, API proxy) | Functional |
| test-chroot.yml | 4 parallel chroot integration test jobs (languages, pkgs, procfs, edge cases) | Functional |
| test-examples.yml | Runs all example shell scripts end-to-end | Functional |
| test-action.yml | Tests action.yml setup action (latest, specific version, image pull, invalid) | Functional |
| docs-preview.yml | Builds docs site and posts artifact link comment | Documentation |
| link-check.yml | Validates all Markdown links (triggered on .md changes only) | |
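Agentic Workflows (run on PRs)
build-test.md, security-guard.md, smoke-claude.md, smoke-copilot.md, smoke-codex.md, smoke-chroot.md, smoke-services.md
Scheduled / Ongoing Security
Weekly: Dependency Vulnerability Audit, CodeQL, CLI Flag Consistency Checker, Test Coverage Improver
Release: SBOM generation (SPDX) for all container images, signed SLSA provenance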
🔍 Identified Gaps
🔴 High Priority
1. Seven Integration Test Files Not Mapped to Any CI Job
The following test files exist in tests/integration/ but are not matched by any pattern in test-integration-suite.yml or test-chroot.yml:
| Test File | Description |
|---|---|
| api-target-allowlist.test.ts | API target domain allowlist enforcement |
| chroot-capsh-chain.test.ts | Capability chain in chroot (security-critical) |
| cli-proxy.test.ts | DIFC CLI proxy integration |
| gh-host-injection.test.ts | GitHub host injection attack resistance |
| ghes-auto-populate.test.ts | GitHub Enterprise Server auto-population |
| host-tcp-services.test.ts | Host TCP service access control |
| workdir-tmpfs-hiding.test.ts | Workdir tmpfs visibility from agent |
Of these, chroot-capsh-chain.test.ts, gh-host-injection.test.ts, and workdir-tmpfs-hiding.test.ts are particularly concerning since they test security-critical isolation properties that the firewall depends on. A regression in these could be silently merged.
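A check like this could run in CI to stop new test files from going unmapped. The sketch below is only an illustration: the job names, the regular expressions, and two of the file names (`blocked-domains.test.ts`, `chroot-languages.test.ts`) are hypothetical and are not taken from `test-integration-suite.yml` or `test-chroot.yml`.

```typescript
// Report test files that no CI job pattern selects.
// All job names and patterns below are hypothetical placeholders.
function findUnmappedTests(
  testFiles: string[],
  jobPatterns: { [job: string]: RegExp[] }
): string[] {
  // Flatten every job's patterns into a single list.
  const allPatterns: RegExp[] = [];
  for (const job of Object.keys(jobPatterns)) {
    for (const pattern of jobPatterns[job]) {
      allPatterns.push(pattern);
    }
  }
  // A file is unmapped when no pattern matches its name.
  return testFiles
    .filter((file) => !allPatterns.some((pattern) => pattern.test(file)))
    .sort();
}

const jobPatterns: { [job: string]: RegExp[] } = {
  "test-domain": [/domain/],
  "test-chroot-languages": [/^chroot-languages/],
};

const testFiles = [
  "blocked-domains.test.ts",      // hypothetical file name
  "chroot-languages.test.ts",     // hypothetical file name
  "gh-host-injection.test.ts",    // listed as unmapped in this assessment
  "workdir-tmpfs-hiding.test.ts", // listed as unmapped in this assessment
];

const unmapped = findUnmappedTests(testFiles, jobPatterns);
console.log("Unmapped test files:", unmapped);
```

In a real workflow the file list would come from reading `tests/integration/` and the patterns from the workflow definitions, and the job would fail when the result is non-empty.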
2. Critically Low Unit Test Coverage Thresholds
Current Jest thresholds in jest.config.js: statements 38%, branches 30%, functions 35%, lines 38%. The two most important source files have alarming coverage:
| File | Statements | Functions | Lines | Risk |
|---|---|---|---|---|
| cli.ts | 0% (0/69) | 0% (0/10) | 0% (0/69) | Entry point untested |
| docker-manager.ts | 18% (45/250) | 4% (1/25) | 17% (41/239) | Core orchestration untested |
These thresholds allow a PR to pass coverage checks while introducing completely untested code paths in the most critical files. A PR adding 50 lines to cli.ts could merge with 0% coverage and the gate won't catch it.
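To see why an aggregate threshold cannot protect individual files, the arithmetic can be worked through directly. The sketch below uses the statement counts from the table above for `cli.ts` and `docker-manager.ts`; the "(all other files)" row is a hypothetical figure chosen only so the aggregate lands near the reported ~38%.

```typescript
interface FileCoverage {
  file: string;
  covered: number; // covered statements
  total: number;   // total statements
}

// Percentage covered; an empty file counts as fully covered.
function pct(covered: number, total: number): number {
  return total === 0 ? 100 : (covered / total) * 100;
}

// Aggregate coverage across all files, which is what a global threshold checks.
function aggregatePct(files: FileCoverage[]): number {
  let covered = 0;
  let total = 0;
  for (const f of files) {
    covered += f.covered;
    total += f.total;
  }
  return pct(covered, total);
}

// Files whose own coverage falls below a per-file minimum.
function filesBelow(files: FileCoverage[], minPct: number): string[] {
  return files.filter((f) => pct(f.covered, f.total) < minPct).map((f) => f.file);
}

const files: FileCoverage[] = [
  { file: "cli.ts", covered: 0, total: 69 },               // from the table above
  { file: "docker-manager.ts", covered: 45, total: 250 },  // from the table above
  { file: "(all other files)", covered: 500, total: 1100 }, // hypothetical
];

const globalThreshold = 38; // current statements threshold
const aggregate = aggregatePct(files);
console.log("Aggregate passes:", aggregate >= globalThreshold);
console.log("Files below the same bar:", filesBelow(files, globalThreshold));
```

With these inputs the aggregate clears 38% even though both critical files sit far below it, which is the failure mode described above.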
3. All Agentic Smoke Tests Currently Failing
All AI-agent smoke tests (smoke-claude, smoke-copilot, smoke-codex, smoke-services) are currently failing. While these may require real credentials to function, their sustained failure state means:
No end-to-end validation that the firewall correctly sandboxes real AI agents
The security-guard.md Claude review is also failing, leaving PRs without AI security review
4. Dependency Vulnerability Audit Currently Failing
The dependency-audit.yml run recently failed. Until this is resolved, PRs could be merging with known high/critical vulnerabilities.
🟡 Medium Priority
5. No Dockerfile or Shell Script Linting
The project has multiple Dockerfiles (containers/squid/, containers/agent/, containers/api-proxy/, containers/cli-proxy/) and numerous shell scripts (setup-iptables.sh, entrypoint.sh, cleanup.sh, etc.). None are linted in CI:
No hadolint for Dockerfile best practices (e.g., pinning base image digests, layer hygiene, non-root users)
No shellcheck for shell script correctness (e.g., unquoted variables, set -e missing, portability issues)
Shell script bugs in setup-iptables.sh or entrypoint.sh would not be caught until integration tests fail.
6. Performance Benchmarks Not PR-Gated
performance-monitor.yml runs daily on a schedule only. A PR introducing a startup time regression (e.g., adding a slow synchronous operation to cli.ts) would not be caught until the next day's run, after the PR is merged.
The benchmark infrastructure already exists with regression detection logic; it just needs to be triggered on PRs (or at least on push to main).
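For illustration, a PR-level regression gate can be as simple as comparing medians with a tolerance. This is a minimal sketch under stated assumptions: it is not the logic in `performance-monitor.yml`, and the 10% tolerance and the sample timings are hypothetical.

```typescript
// Median of a list of timings in milliseconds.
function median(samples: number[]): number {
  if (samples.length === 0) {
    throw new Error("median requires at least one sample");
  }
  const sorted = samples.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Flags a regression when the candidate median exceeds the baseline
// median by more than the tolerance (0.1 means 10% slower).
function isRegression(
  baselineMs: number[],
  candidateMs: number[],
  tolerance: number = 0.1
): boolean {
  return median(candidateMs) > median(baselineMs) * (1 + tolerance);
}

// Hypothetical startup timings, 5 iterations each.
const baseline = [100, 102, 98, 101, 99];
const slowPr = [120, 118, 125, 119, 121];
const okPr = [104, 103, 105, 102, 106];

console.log("slow PR regresses:", isRegression(baseline, slowPr));
console.log("ok PR regresses:", isRegression(baseline, okPr));
```

Using the median rather than the mean keeps a single noisy iteration from tripping the gate, which matters when a PR job runs only a handful of iterations.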
7. Custom ESLint Rules Not Tested in CI
package.json defines a test:lint-rules script (node eslint-rules/no-unsafe-execa.test.js) for testing custom ESLint rules. This script is not invoked from any CI workflow. A broken custom security rule (e.g., the no-unsafe-execa rule that guards against shell injection) would go undetected unless someone happens to run the rule tests locally.
8. No Container Image Vulnerability Scanning on PRs
SBOMs are generated at release time (using anchore/sbom-action), but there is no pre-merge container image vulnerability scan. A PR that upgrades a base image to one with known CVEs (e.g., updating ubuntu/squid:latest) would pass all CI checks and merge. Tools like Grype or Trivy can scan built container images and fail the CI if critical CVEs are introduced.
9. No External Coverage Reporting Integration
The LCOV report is generated during test-coverage.yml but not uploaded to any external service (Codecov, Coveralls). This means:
No coverage trend tracking across time
No coverage badge in README
No per-file coverage diff comments in PR UI (only the aggregated percentage is commented)
10. Documentation Build Not a Blocking PR Check
docs-preview.yml uses continue-on-error: true for the build step, so the workflow reports success even if the docs site fails to build. A PR that breaks the docs site would still pass CI: it would post the PR comment "Documentation build failed" but would not block the merge.
🟢 Low Priority
11. No Mutation Testing
The unit test suite passes but doesn't validate test quality. Mutation testing (e.g., Stryker Mutator) would detect if tests are actually verifying behavior or just achieving coverage by calling functions without assertions. With docker-manager.ts at 18% coverage, this is less urgent now but will matter as coverage improves.
12. Integration Test Containers Not Cached
Every integration test job runs docker build from scratch (no build cache, no layer cache). This adds 2–5 minutes per job and makes the parallel integration test suite slower than necessary. Using Docker BuildKit cache or pre-built images as a base would speed up CI feedback.
13. test:lint-rules and test:all Scripts Exist but Are Not Used in CI
package.json has both test:lint-rules and test:all (test:unit && test:integration) scripts that aren't referenced from any GitHub Actions workflow. The test:all script provides a unified entry point that would be useful for a pre-commit hook or a single combined CI job.
14. No Enforced Minimum Coverage Per File
While global coverage thresholds exist, there are no per-file minimums. A single file could drop to 0% coverage without triggering the global threshold (as demonstrated by cli.ts being at 0% while the suite still passes).
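Jest's `coverageThreshold` option accepts path or glob keys alongside `global`, so per-file floors can sit next to the existing global ones. The following is a sketch, not the repository's configuration: the `./src/` paths and the `cli.ts` numbers are assumptions, the global values are the current thresholds quoted earlier, and the `docker-manager.ts` floors simply pin the currently reported coverage so it cannot drop further.

```typescript
// Sketch of a Jest config with per-file coverage floors.
// File paths and the cli.ts values are assumptions, not read from the repository.
const config = {
  coverageThreshold: {
    // Current global thresholds as reported in this assessment.
    global: { statements: 38, branches: 30, functions: 35, lines: 38 },

    // Ratchet: pin docker-manager.ts at its currently reported coverage.
    "./src/docker-manager.ts": { statements: 18, functions: 4, lines: 17 },

    // Placeholder floor for cli.ts. Any non-zero value will fail until
    // tests for the entry point exist, so introduce it together with them.
    "./src/cli.ts": { statements: 20, functions: 20, lines: 20 },
  },
};

export default config;
```

Jest reports an error when a path key does not match any file, so the keys need to be adjusted to the real source layout before this is adopted.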
📋 Actionable Recommendations
High Priority
| Gap | Recommendation | Complexity | Impact |
|---|---|---|---|
| 7 uncovered integration tests | Add new CI jobs to test-integration-suite.yml for the 7 uncovered test files, grouped logically (e.g., test-security-isolation for capsh-chain, gh-host-injection, workdir-tmpfs-hiding) | Low | High |
| Low coverage thresholds | Raise thresholds to 60%/50%/60%/60% and add per-file minimums in jest.config.js for cli.ts and docker-manager.ts | Low | High |
| Failing smoke/security-guard | Investigate root cause; at minimum add health monitoring to create issues when these fail for more than N consecutive runs | Medium | High |
| Failing dependency audit | Fix the dependency audit failure (likely a high/critical vuln that needs patching or exception) | Low | High |
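The health-monitoring recommendation for the failing smoke and security-guard workflows could be sketched as a consecutive-failure counter. The `WorkflowRun` shape and the threshold of 3 are hypothetical; a real implementation would fetch run history from the GitHub API and open an issue instead of logging.

```typescript
type Conclusion = "success" | "failure" | "cancelled" | "skipped";

// Hypothetical minimal shape of a completed workflow run.
interface WorkflowRun {
  workflow: string;
  conclusion: Conclusion;
}

// Counts failures at the head of the list; runs must be ordered newest first.
function consecutiveFailures(runs: WorkflowRun[]): number {
  let count = 0;
  for (const run of runs) {
    if (run.conclusion !== "failure") {
      break;
    }
    count++;
  }
  return count;
}

// True when the workflow has failed for more than `threshold` runs in a row.
function needsIssue(runs: WorkflowRun[], threshold: number): boolean {
  return consecutiveFailures(runs) > threshold;
}

// Hypothetical history, newest first.
const smokeClaudeRuns: WorkflowRun[] = [
  { workflow: "smoke-claude", conclusion: "failure" },
  { workflow: "smoke-claude", conclusion: "failure" },
  { workflow: "smoke-claude", conclusion: "failure" },
  { workflow: "smoke-claude", conclusion: "failure" },
  { workflow: "smoke-claude", conclusion: "success" },
];

console.log("open issue:", needsIssue(smokeClaudeRuns, 3));
```

Counting only the unbroken streak at the head of the history means a single flaky failure does not raise an issue, while a sustained outage does.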
Medium Priority
| Gap | Recommendation | Complexity | Impact |
|---|---|---|---|
| No Dockerfile linting | Add hadolint step to build.yml scanning all containers/*/Dockerfile | Low | Medium |
| No shell script linting | Add shellcheck step to build.yml for containers/**/*.sh and scripts/**/*.sh | Low | Medium |
| Performance not PR-gated | Add a lightweight benchmark job (fewer iterations, e.g., 5 vs. 30) to build.yml triggered on push to main | Medium | Medium |
| Custom ESLint rules untested | Add npm run test:lint-rules step to lint.yml | Low | Medium |
| No container vuln scanning | Add a container-security-scan.yml using anchore/scan-action or aquasecurity/trivy-action on PRs that touch containers/** | Medium | Medium |
| Coverage not uploaded externally | Add codecov/codecov-action to test-coverage.yml with LCOV upload | Low | Medium |
| Docs build not blocking | Remove continue-on-error: true from docs build step in docs-preview.yml | Low | Medium |
Low Priority
| Gap | Recommendation | Complexity | Impact |
|---|---|---|---|
| No mutation testing | Introduce Stryker Mutator in a weekly scheduled workflow | High | Low |
| Container build caching | Add --cache-from / Docker BuildKit layer caching in integration test jobs | Medium | Low |
| Per-file coverage minimums | Add coverageThreshold per-file overrides in jest.config.js for critical files | Low | Low |
📈 Metrics Summary
| Metric | Value |
|---|---|
| Total workflows | 44+ (14 static + 27 agentic + 3 infra) |
| Workflows triggered on PR | 13 static + 7 agentic = 20 total |
| Recent run success rate (all events, last 30) | 63% (19/30) |
| Core PR static workflow success rate | ~100% (Lint, Build, Type Check, Coverage, Integration Tests all passing) |
| Agentic smoke test success rate | 0% (all 4 smoke types currently failing) |
| Unit test coverage – statements | ~38% (threshold: 38%) |
| Unit test coverage – branches | ~32% (threshold: 30%) |
| cli.ts coverage | 0% |
| docker-manager.ts coverage | 18% |
| Integration test files | 35 total, 7 (~20%) not mapped to any CI job |
| Container vuln scanning on PR | ❌ None (only at release) |
| Dockerfile linting | ❌ None |
| Shell script linting | ❌ None |
Assessment generated by automated CI/CD gap analysis on 2026-04-10. Workflow configurations analyzed from .github/workflows/. Data reflects the state of the main branch at the time of analysis.