[CI/CD Assessment] CI/CD Pipelines and Integration Tests Gap Assessment #1880

2026-04-10T12:57:56Z

github-actions[bot]
bot Apr 10, 2026

📊 Current CI/CD Pipeline Status

The repository has a comprehensive CI/CD setup with 44+ workflows, including 13 static GitHub Actions workflows that run on PRs and 27+ agentic workflows. The core static PR checks are all passing. However, several scheduled and agentic workflows are currently failing, and there are notable gaps in coverage thresholds, security tooling, and the mapping of test files to CI jobs.

Recent run health (last 30 completed runs across all events):

✅ Success: 19 runs (~63%)
❌ Failure: 11 runs (~37%)

The failures are concentrated in agentic/scheduled workflows; static PR quality gates are passing.

✅ Existing Quality Gates

Static Workflows (run on every PR to `main`)

| Workflow | Purpose | Scope |
| --- | --- | --- |
| `lint.yml` | ESLint + MarkdownLint | Source code + docs |
| `build.yml` | TypeScript build + api-proxy/cli-proxy unit tests | Node 20 & 22 matrix |
| `test-integration.yml` | TypeScript type check (`tsc --noEmit`) | Node 22 |
| `codeql.yml` | SAST scanning (javascript-typescript + actions) | Security |
| `dependency-audit.yml` | `npm audit` on main + docs-site; SARIF upload to Security tab | Supply chain |
| `pr-title.yml` | Semantic PR title validation (Conventional Commits) | Process |
| `test-coverage.yml` | Jest unit tests with coverage diff vs. base branch + PR comment | Quality |
| `test-integration-suite.yml` | 5 parallel integration test jobs (domain, network, protocol, container ops, API proxy) | Functional |
| `test-chroot.yml` | 4 parallel chroot integration test jobs (languages, pkgs, procfs, edge cases) | Functional |
| `test-examples.yml` | Runs all example shell scripts end-to-end | Functional |
| `test-action.yml` | Tests `action.yml` setup action (latest, specific version, image pull, invalid) | Functional |
| `docs-preview.yml` | Builds docs site and posts artifact link comment | Documentation |
| `link-check.yml` | Validates all Markdown links (triggered on `.md` changes only) | Documentation |

Agentic Workflows (run on PRs)

| Workflow | Engine | Purpose |
| --- | --- | --- |
| `build-test.md` | Copilot | Multi-runtime build tests (Node, Go, Rust, Java, .NET) |
| `security-guard.md` | Claude | Security posture review on PR diffs |
| `smoke-claude.md` | Claude | End-to-end firewall test with real Claude agent |
| `smoke-copilot.md` | Copilot | End-to-end firewall test with real Copilot agent |
| `smoke-codex.md` | Codex | End-to-end firewall test with real Codex agent |
| `smoke-chroot.md` | Copilot | Chroot-specific smoke test (path-filtered) |
| `smoke-services.md` | Copilot | Services/sidecar smoke test |

Scheduled / Ongoing Security

Daily: Dependency Security Monitor, Security Review (Claude), Token Usage analyzers, Performance Benchmarks
Weekly: Dependency Vulnerability Audit, CodeQL, CLI Flag Consistency Checker, Test Coverage Improver
Release: SBOM generation (SPDX) for all container images, signed SLSA provenance

🔍 Identified Gaps

🔴 High Priority

1. Seven Integration Test Files Not Mapped to Any CI Job

The following test files exist in tests/integration/ but are not matched by any pattern in test-integration-suite.yml or test-chroot.yml:

| Test File | Description |
| --- | --- |
| `api-target-allowlist.test.ts` | API target domain allowlist enforcement |
| `chroot-capsh-chain.test.ts` | Capability chain in chroot (security-critical) |
| `cli-proxy.test.ts` | DIFC CLI proxy integration |
| `gh-host-injection.test.ts` | GitHub host injection attack resistance |
| `ghes-auto-populate.test.ts` | GitHub Enterprise Server auto-population |
| `host-tcp-services.test.ts` | Host TCP service access control |
| `workdir-tmpfs-hiding.test.ts` | Workdir tmpfs visibility from agent |

Of these, chroot-capsh-chain.test.ts, gh-host-injection.test.ts, and workdir-tmpfs-hiding.test.ts are particularly concerning since they test security-critical isolation properties that the firewall depends on. A regression in these could be silently merged.
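
One way to keep this gap from reopening is a small check that fails when a test file is selected by none of the CI job patterns. The sketch below assumes the workflow jobs select tests with regular-expression path patterns; the pattern strings and the helper name `findUnmappedTests` are illustrative and not taken from the repository.

```typescript
// Hypothetical helper: report integration test files that no CI job pattern selects.
// Assumes each job passes a regular expression to the test runner to pick its files.
function findUnmappedTests(testFiles: string[], jobPatterns: string[]): string[] {
  const regexes = jobPatterns.map((p) => new RegExp(p));
  return testFiles.filter((file) => !regexes.some((re) => re.test(file)));
}

// File names come from this assessment; the two patterns are made up for the example.
const files = [
  "tests/integration/chroot-capsh-chain.test.ts",
  "tests/integration/gh-host-injection.test.ts",
  "tests/integration/workdir-tmpfs-hiding.test.ts",
];
const patterns = ["chroot-(languages|procfs)", "gh-host-injection"];

// With these inputs, the capsh-chain and workdir-tmpfs-hiding files are reported.
const unmapped = findUnmappedTests(files, patterns);
console.log(unmapped);
```

In a real job, the file list would come from reading `tests/integration/` and the patterns from the workflow definitions; a non-empty result would fail the check.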

2. Critically Low Unit Test Coverage Thresholds

Current Jest thresholds in jest.config.js: statements 38%, branches 30%, functions 35%, lines 38%. The two most important source files have alarming coverage:

| File | Statements | Functions | Lines | Risk |
| --- | --- | --- | --- | --- |
| `cli.ts` | 0% (0/69) | 0% (0/10) | 0% (0/69) | Entry point untested |
| `docker-manager.ts` | 18% (45/250) | 4% (1/25) | 17% (41/239) | Core orchestration untested |

These thresholds allow a PR to pass coverage checks while introducing completely untested code paths in the most critical files. A PR adding 50 lines to cli.ts could merge with 0% coverage and the gate won't catch it.
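
A possible shape for stricter thresholds is sketched below. Jest's `coverageThreshold` option accepts path keys alongside `global`; files matched by a path key are checked against their own floor and excluded from the global calculation. The `./src/` paths and the per-file percentages are assumptions for illustration; only the 60/50/60/60 global values come from the recommendations in this assessment.

```typescript
// Sketch of Jest coverage thresholds with per-file floors.
// The src/ paths and the per-file numbers are assumed, not read from the repository.
type Threshold = { statements: number; branches: number; functions: number; lines: number };

const coverageThreshold: Record<string, Threshold> = {
  // Applies to every file not matched by a more specific key below.
  global: { statements: 60, branches: 50, functions: 60, lines: 60 },
  // Per-file floors start low and are meant to be raised as tests are added.
  "./src/cli.ts": { statements: 20, branches: 10, functions: 20, lines: 20 },
  "./src/docker-manager.ts": { statements: 30, branches: 20, functions: 30, lines: 30 },
};

// In jest.config.js this object would be assigned to the `coverageThreshold` key.
const jestConfig = { collectCoverage: true, coverageThreshold };
console.log(Object.keys(jestConfig.coverageThreshold));
```

Starting the per-file floors below the global target lets the gate be switched on once initial tests exist, without blocking every PR until the files reach 60%.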

3. All Agentic Smoke Tests Currently Failing

All AI-agent smoke tests (smoke-claude, smoke-copilot, smoke-codex, smoke-services) are currently failing. While these may require real credentials to function, their sustained failure state means:

No end-to-end validation that the firewall correctly sandboxes real AI agents
The security-guard.md Claude review is also failing, leaving PRs without AI security review
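
The recommendation for this gap is to open an issue once a workflow has failed for more than N consecutive runs. A minimal sketch of that check follows; the `WorkflowRun` shape and the function names are hypothetical and do not mirror the GitHub API response format.

```typescript
// Hypothetical run record; field names are illustrative only.
type RunConclusion = "success" | "failure" | "cancelled" | "skipped";
interface WorkflowRun {
  workflow: string;
  conclusion: RunConclusion;
}

// Count how many of the most recent runs failed in a row.
// `runs` must be ordered newest first; any non-failure conclusion ends the streak.
function consecutiveFailures(runs: WorkflowRun[]): number {
  let count = 0;
  for (const run of runs) {
    if (run.conclusion !== "failure") break;
    count++;
  }
  return count;
}

// True when the failure streak is longer than the allowed maximum.
function needsAlert(runs: WorkflowRun[], maxConsecutive: number): boolean {
  return consecutiveFailures(runs) > maxConsecutive;
}

// Made-up history: four failures since the last success.
const history: WorkflowRun[] = [
  { workflow: "smoke-claude", conclusion: "failure" },
  { workflow: "smoke-claude", conclusion: "failure" },
  { workflow: "smoke-claude", conclusion: "failure" },
  { workflow: "smoke-claude", conclusion: "failure" },
  { workflow: "smoke-claude", conclusion: "success" },
];
console.log(needsAlert(history, 3));
```

Treating cancelled and skipped runs as streak breakers is a simplification; a stricter monitor might ignore them instead.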

4. Dependency Vulnerability Audit Currently Failing

The dependency-audit.yml run recently failed. Until this is resolved, PRs could be merging with known high/critical vulnerabilities.

🟡 Medium Priority

5. No Dockerfile or Shell Script Linting

The project has multiple Dockerfiles (containers/squid/, containers/agent/, containers/api-proxy/, containers/cli-proxy/) and numerous shell scripts (setup-iptables.sh, entrypoint.sh, cleanup.sh, etc.). None are linted in CI:

No hadolint for Dockerfile best practices (e.g., pinning base image digests, layer hygiene, non-root users)
No shellcheck for shell script correctness (e.g., unquoted variables, set -e missing, portability issues)

Shell script bugs in setup-iptables.sh or entrypoint.sh would not be caught until integration tests fail.

6. Performance Benchmarks Not PR-Gated

performance-monitor.yml runs daily on a schedule only. A PR introducing a startup time regression (e.g., adding a slow synchronous operation to cli.ts) would not be caught until the next day's run, after the PR is merged.

The benchmark infrastructure already exists with regression detection logic; it only needs to be triggered on PRs (or at least on push to main).
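
For illustration, a lightweight PR-time check could compare a handful of samples against a stored baseline. This sketch is not the repository's existing regression logic; the median comparison, the 10% tolerance, and all sample values are assumptions.

```typescript
// Median of a non-empty list of samples (milliseconds).
function median(samples: number[]): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// True when the candidate median exceeds the baseline median by more than
// `tolerance`, expressed as a fraction (0.1 means 10%).
function isRegression(baselineMs: number[], candidateMs: number[], tolerance: number): boolean {
  return median(candidateMs) > median(baselineMs) * (1 + tolerance);
}

// Made-up startup times: five iterations each, matching the reduced PR run size.
const baseline = [1200, 1180, 1220, 1210, 1190];
const candidate = [1400, 1380, 1420, 1390, 1410];
console.log(isRegression(baseline, candidate, 0.1)); // logs true (1400 ms vs. 1200 ms)
```

Using the median rather than the mean keeps a single slow iteration on a noisy runner from failing the check.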

7. Custom ESLint Rules Not Tested in CI

package.json defines a test:lint-rules script (node eslint-rules/no-unsafe-execa.test.js) for testing custom ESLint rules. This script is not invoked from any CI workflow. A broken custom security rule (e.g., the no-unsafe-execa rule that guards against shell injection) would go undetected unless someone happens to run the test locally.

8. No Container Image Vulnerability Scanning on PRs

SBOMs are generated at release time (using anchore/sbom-action), but there is no pre-merge container image vulnerability scan. A PR that upgrades a base image to one with known CVEs (e.g., updating ubuntu/squid:latest) would pass all CI checks and merge. Tools like Grype or Trivy can scan built container images and fail the CI if critical CVEs are introduced.

9. No External Coverage Reporting Integration

The LCOV report is generated during test-coverage.yml but not uploaded to any external service (Codecov, Coveralls). This means:

No coverage trend tracking across time
No coverage badge in README
No per-file coverage diff comments in PR UI (only the aggregated percentage is commented)

10. Documentation Build Not a Blocking PR Check

docs-preview.yml uses continue-on-error: true for the build step, so the workflow reports success even if the docs site fails to build. A PR that breaks the docs site would still pass CI: the PR comment would read "Documentation build failed", but the merge would not be blocked.

🟢 Low Priority

11. No Mutation Testing

The unit test suite passes but doesn't validate test quality. Mutation testing (e.g., Stryker Mutator) would detect if tests are actually verifying behavior or just achieving coverage by calling functions without assertions. With docker-manager.ts at 18% coverage, this is less urgent now but will matter as coverage improves.

12. Integration Test Containers Not Cached

Every integration test job runs docker build from scratch (no build cache, no layer cache). This adds 2–5 minutes per job and makes the parallel integration test suite slower than necessary. Using Docker BuildKit cache or pre-built images as a base would speed up CI feedback.

13. `test:lint-rules` and `test:all` Scripts Exist but Are Not Used in CI

package.json has both test:lint-rules and test:all (test:unit && test:integration) scripts that aren't referenced from any GitHub Actions workflow. The test:all script provides a unified entry point that would be useful for a pre-commit hook or a single combined CI job.

14. No Enforced Minimum Coverage Per File

While global coverage thresholds exist, there are no per-file minimums. A single file could drop to 0% coverage without triggering the global threshold (as demonstrated by cli.ts being at 0% while the suite still passes).
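
A per-file floor can also be enforced outside Jest by reading the per-file percentages that Istanbul's `json-summary` reporter writes to `coverage-summary.json`. The sketch below assumes that report shape, reduced to the line percentage; the `src/` paths, the `total` value, and the 90% entry are hypothetical.

```typescript
// Minimal slice of a json-summary entry: only the line percentage is needed here.
type LineCoverage = { lines: { pct: number } };

// Return the files whose line coverage is below `minLinePct`, sorted by path.
// The aggregate "total" entry is skipped because it is not a file.
function filesBelowMinimum(summary: Record<string, LineCoverage>, minLinePct: number): string[] {
  return Object.entries(summary)
    .filter(([path]) => path !== "total")
    .filter(([, file]) => file.lines.pct < minLinePct)
    .map(([path]) => path)
    .sort();
}

// Illustrative data: cli.ts and docker-manager.ts use the line percentages
// reported in this assessment; the other entries are invented.
const summary: Record<string, LineCoverage> = {
  total: { lines: { pct: 38 } },
  "src/cli.ts": { lines: { pct: 0 } },
  "src/docker-manager.ts": { lines: { pct: 17 } },
  "src/example.ts": { lines: { pct: 90 } },
};

const offenders = filesBelowMinimum(summary, 20);
console.log(offenders);
```

A CI step could exit non-zero whenever the returned list is non-empty, which would flag a file sitting at 0% even while the global threshold passes.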

📋 Actionable Recommendations

High Priority

| Gap | Recommendation | Complexity | Impact |
| --- | --- | --- | --- |
| 7 uncovered integration tests | Add new CI jobs to `test-integration-suite.yml` for the 7 uncovered test files, grouped logically (e.g., `test-security-isolation` for capsh-chain, gh-host-injection, workdir-tmpfs-hiding) | Low | High |
| Low coverage thresholds | Raise thresholds to 60%/50%/60%/60% and add per-file minimums in `jest.config.js` for `cli.ts` and `docker-manager.ts` | Low | High |
| Failing smoke/security-guard | Investigate root cause; at minimum add health monitoring to create issues when these fail for more than N consecutive runs | Medium | High |
| Failing dependency audit | Fix the dependency audit failure (likely a high/critical vuln that needs patching or exception) | Low | High |

Medium Priority

| Gap | Recommendation | Complexity | Impact |
| --- | --- | --- | --- |
| No Dockerfile linting | Add `hadolint` step to `build.yml` scanning all `containers/*/Dockerfile` | Low | Medium |
| No shell script linting | Add `shellcheck` step to `build.yml` for `containers/**/*.sh` and `scripts/**/*.sh` | Low | Medium |
| Performance not PR-gated | Add a lightweight benchmark job (fewer iterations, e.g., 5 vs. 30) to `build.yml` triggered on `push` to `main` | Medium | Medium |
| Custom ESLint rules untested | Add `npm run test:lint-rules` step to `lint.yml` | Low | Medium |
| No container vuln scanning | Add a `container-security-scan.yml` using `anchore/scan-action` or `aquasecurity/trivy-action` on PRs that touch `containers/**` | Medium | Medium |
| Coverage not uploaded externally | Add `codecov/codecov-action` to `test-coverage.yml` with LCOV upload | Low | Medium |
| Docs build not blocking | Remove `continue-on-error: true` from docs build step in `docs-preview.yml` | Low | Medium |

Low Priority

| Gap | Recommendation | Complexity | Impact |
| --- | --- | --- | --- |
| No mutation testing | Introduce Stryker Mutator in a weekly scheduled workflow | High | Low |
| Container build caching | Add `--cache-from` / Docker BuildKit layer caching in integration test jobs | Medium | Low |
| Per-file coverage minimums | Add `coverageThreshold` per-file overrides in `jest.config.js` for critical files | Low | Low |

📈 Metrics Summary

| Metric | Value |
| --- | --- |
| Total workflows | 44+ (14 static + 27 agentic + 3 infra) |
| Workflows triggered on PR | 13 static + 7 agentic = 20 total |
| Recent run success rate (all events, last 30) | 63% (19/30) |
| Core PR static workflow success rate | ~100% (Lint, Build, Type Check, Coverage, Integration Tests all passing) |
| Agentic smoke test success rate | 0% (all 4 smoke types currently failing) |
| Unit test coverage – statements | ~38% (threshold: 38%) |
| Unit test coverage – branches | ~32% (threshold: 30%) |
| `cli.ts` coverage | 0% |
| `docker-manager.ts` coverage | 18% |
| Integration test files | 35 total, 7 (~20%) not mapped to any CI job |
| Container vuln scanning on PR | ❌ None (only at release) |
| Dockerfile linting | ❌ None |
| Shell script linting | ❌ None |

Assessment generated by automated CI/CD gap analysis on 2026-04-10. Workflow configurations analyzed from .github/workflows/. Data reflects the state of the main branch at the time of analysis.

Generated by CI/CD Pipelines and Integration Tests Gap Assessment · ● 1.2M · ◷

expires on Apr 17, 2026, 12:57 PM UTC
