Commit ef82324

Merge pull request #237 from sanmak/feat/add-dep-gate-eval-hardening-plan-blocking-docs-refresh
feat: add dependency introduction gate, evaluation bias hardening, plan-mode blocking, and docs refresh
2 parents 7adb4e3 + 1b08a5d commit ef82324

81 files changed

Lines changed: 6107 additions & 575 deletions

.claude/commands/docs-sync.md

Lines changed: 1 addition & 0 deletions
@@ -48,6 +48,7 @@ For each changed file, check which documentation may be affected using this mapp
4848
| `core/custom-templates.md` | `docs/REFERENCE.md` (Spec Templates) |
4949
| `core/data-handling.md` | `CLAUDE.md` (Security-Sensitive Files) |
5050
| `core/dependency-safety.md` | `CLAUDE.md` (Security-Sensitive Files, core modules list), `docs/REFERENCE.md` (Configuration Options), `docs/STRUCTURE.md` |
51+
| `core/dependency-introduction.md` | `CLAUDE.md` (core modules list), `docs/STRUCTURE.md` |
5152
| `core/evaluation.md` | `CLAUDE.md` (core modules list), `docs/REFERENCE.md` (Configuration Options), `docs/STRUCTURE.md` |
5253
| `core/learnings.md` | `CLAUDE.md` (core modules list), `docs/REFERENCE.md` (Configuration Options), `docs/STRUCTURE.md`, `docs/COMMANDS.md` (Learn) |
5354
| `core/error-handling.md` | `docs/REFERENCE.md` |

.gitignore

Lines changed: 4 additions & 1 deletion
@@ -43,4 +43,7 @@ dist/
4343
!.claude/commands/
4444
!.claude/skills/
4545

46-
/internal*
46+
/internal*
47+
48+
# SpecOps ephemeral marker (plan-pending-conversion state machine)
49+
.plan-pending-conversion

.specops/adversarial-evaluation/spec.json

Lines changed: 5 additions & 2 deletions
@@ -8,7 +8,7 @@
88
"specopsCreatedWith": "1.7.0",
99
"specopsUpdatedWith": "1.7.0",
1010
"author": {
11-
"name": "sanketmakhija"
11+
"name": "sanmak"
1212
},
1313
"reviewers": [],
1414
"reviewRounds": 0,
@@ -24,5 +24,8 @@
2424
"specDurationMinutes": 30
2525
},
2626
"specDependencies": [],
27-
"relatedSpecs": ["workflow-enforcement-gates", "context-aware-dispatch"]
27+
"relatedSpecs": [
28+
"workflow-enforcement-gates",
29+
"context-aware-dispatch"
30+
]
2831
}
Lines changed: 267 additions & 0 deletions
@@ -0,0 +1,267 @@
1+
# Design: Dependency Introduction Gate
2+
3+
## Architecture Overview
4+
5+
The Dependency Introduction Gate adds a new core module (`core/dependency-introduction.md`) that governs *which* dependencies enter a project, complementing the existing `core/dependency-safety.md` which audits *whether* existing dependencies are safe. The gate operates at three workflow touchpoints: Phase 2 (decision-making), Phase 3 (enforcement), and audit mode (drift detection). It also extends adversarial evaluation scoring and the dependencies.md steering file.
6+
7+
## Technical Decisions
8+
9+
### Decision 1: Always-Active Gate with No Config Knobs
10+
11+
**Context:** The existing dependency-safety gate has `config.dependencySafety.enabled`, `severityThreshold`, and other config options. Should the introduction gate follow the same pattern?
12+
**Options Considered:**
13+
14+
1. Add `config.dependencyIntroduction` section with enabled/threshold/bypass options -- Pros: consistency with dependency-safety; Cons: config knobs become dead features when agents skip non-mandatory steps
15+
2. Always active, no bypass, deterministic -- Pros: no way to circumvent, simpler implementation, aligns with the "deterministic workflow execution" feedback pattern; Cons: less flexible
16+
17+
**Decision:** Option 2 -- always active, no config knobs
18+
**Rationale:** The feedback memory documents that config flags without deterministic workflow instructions are dead features. The gate should always run. If a spec has no new dependencies, it passes trivially. There is no legitimate reason to bypass dependency governance.
19+
20+
### Decision 2: Separate Core Module vs. Extending dependency-safety.md
21+
22+
**Context:** Should the introduction gate be added to the existing dependency-safety.md or be a new module?
23+
**Options Considered:**
24+
25+
1. Extend `core/dependency-safety.md` with introduction gate sections -- Pros: single dependency module; Cons: module becomes very large, mixes two concerns (auditing existing vs. governing new)
26+
2. New `core/dependency-introduction.md` module -- Pros: separation of concerns, each module has a clear purpose; Cons: one more module in the system
27+
28+
**Decision:** Option 2 -- new `core/dependency-introduction.md`
29+
**Rationale:** The two modules have different trigger points (safety gate runs once at Phase 2 step 6.5; introduction gate runs at step 5.8 and throughout Phase 3), different data flows, and different purposes. Combining them would create an oversized module that is harder to maintain.
30+
31+
### Decision 3: Phase 2 Step Numbering
32+
33+
**Context:** Where in Phase 2 does the dependency introduction gate run?
34+
**Options Considered:**
35+
36+
1. New step 5.8 -- between coherence verification (5.5) and vocabulary verification (5.6) -- Pros: early enough to inform design; Cons: requires renumbering
37+
2. New step 5.8 -- after code-grounded plan validation (5.7) -- Pros: has access to validated file paths; Cons: late in Phase 2
38+
3. New step 6.3 -- after external issue creation (6) -- Pros: no renumbering; Cons: too late, dependencies should be decided before issues are created
39+
40+
**Decision:** Option 2 -- step 5.8, after code-grounded plan validation
41+
**Rationale:** At step 5.8 the design.md is finalized and file paths are validated. This is the right moment to scan for dependency references and evaluate them before issue creation and the dependency safety gate (6.5). The numbering follows the existing fractional step pattern (5.5, 5.6, 5.7, 5.8).
42+
43+
### Decision 4: Install Command Pattern Detection
44+
45+
**Context:** How does the gate detect when Phase 3 attempts to install a dependency?
46+
**Options Considered:**
47+
48+
1. Pre-scan task implementation steps for install commands before executing -- Pros: catches before execution; Cons: tasks may not list exact commands
49+
2. Hook into RUN_COMMAND calls and intercept install patterns -- Pros: catches all installs; Cons: requires changes to the abstract operation layer
50+
3. Pattern-match in the core module instructions -- the agent is instructed to check before running any install command -- Pros: works within existing architecture; Cons: relies on agent compliance
51+
52+
**Decision:** Option 3 -- instruct the agent to verify before executing install commands
53+
**Rationale:** SpecOps operates at the instruction/prompt level, not at a code execution interception layer. The agent is told "WHEN you are about to run an install command, verify the package is in ### Dependency Decisions first." This is the same enforcement pattern used by all other SpecOps gates (task state machine, dependency gate, etc.). Adversarial evaluation catches violations that slip through.
54+
55+
### Decision 5: Maintenance Profile Intelligence Layers
56+
57+
**Context:** How does the gate assess whether a dependency is well-maintained?
58+
**Options Considered:**
59+
60+
1. Registry APIs only (npmjs.com, pypi.org) -- Pros: standardized, fast; Cons: limited data
61+
2. 3-layer approach: registry API, source repo (GitHub API), LLM fallback -- Pros: comprehensive; Cons: more network calls
62+
3. LLM-only assessment -- Pros: no network dependency; Cons: training data is stale
63+
64+
**Decision:** Option 2 -- 3-layer approach matching the dependency-safety.md pattern
65+
**Rationale:** Consistent with the existing 3-layer verification pattern in dependency-safety.md. Each layer compensates for the previous layer's failures: registry APIs provide download stats and last-publish dates; GitHub API provides star count, last commit, and open issue count; LLM fallback provides knowledge when APIs are unavailable.
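A minimal sketch of the layered fallback described above, under one possible reading: the two network layers are both tried and their results merged, and the LLM layer is consulted only when neither returns data. The names `assess_maintenance`, `Layer`, and the stub layers are hypothetical and are not taken from `core/dependency-introduction.md`; the stub return values are placeholders, not real registry data.

```python
from typing import Callable, Optional

# Hypothetical layer signature: package name in, profile dict (or None) out.
# A layer that raises (timeout, network error) is treated as having failed.
Layer = Callable[[str], Optional[dict]]


def assess_maintenance(
    package: str,
    network_layers: list[tuple[str, Layer]],
    fallback: Layer,
) -> dict:
    """Merge data from the network layers; use the fallback only if all fail."""
    profile: dict = {}
    sources: list[str] = []
    for name, layer in network_layers:
        try:
            data = layer(package)
        except Exception:
            continue  # this layer failed; try the next one
        if data:
            profile.update(data)
            sources.append(name)
    if not sources:
        # Neither network layer produced data: fall back to the last layer.
        profile = fallback(package) or {}
        sources = ["llm"]
    return {"package": package, "sources": sources, **profile}


# Stub layers for illustration only (placeholder values).
def failing_registry(package: str) -> Optional[dict]:
    raise TimeoutError("registry unreachable")


def repo_stub(package: str) -> Optional[dict]:
    return {"open_issues": 0}


def llm_stub(package: str) -> Optional[dict]:
    return {"note": "assessed from training data; may be stale"}


partial = assess_maintenance(
    "express", [("registry", failing_registry), ("repo", repo_stub)], llm_stub
)
offline = assess_maintenance("express", [("registry", failing_registry)], llm_stub)
```

Recording which layers contributed lets the evaluation summary shown to the user say how much of the profile rests on live data versus the fallback.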
66+
67+
## Component Design
68+
69+
### Component 1: Install Command Patterns Table
70+
71+
**Responsibility:** Define ecosystem-specific install command patterns that the gate recognizes
72+
**Interface:** Markdown table mapping ecosystems to command patterns (npm install, pip install, cargo add, etc.)
73+
**Dependencies:** Ecosystem list from `core/dependency-safety.md` Dependency Detection Protocol
74+
75+
### Component 2: Build-vs-Install Evaluation Framework
76+
77+
**Responsibility:** Provide 5 criteria for evaluating whether to install a package or build the functionality in-house
78+
**Interface:** 5-criteria scoring table: scope match, maintenance health, size proportionality, security surface, license compatibility
79+
**Dependencies:** Maintenance Profile Intelligence (Component 3)
80+
81+
### Component 3: Maintenance Profile Intelligence
82+
83+
**Responsibility:** Assess dependency maintenance health using a 3-layer approach
84+
**Interface:** 3 layers: (1) registry APIs for download stats and publish dates, (2) source repo APIs for activity metrics, (3) LLM fallback for training data knowledge
85+
**Dependencies:** Network access for layers 1-2; graceful fallback to layer 3
86+
87+
### Component 4: Phase 2 Step 5.8 Gate Procedure
88+
89+
**Responsibility:** Scan design.md for dependency references, evaluate new dependencies, surface to user, record decisions
90+
**Interface:** Integrated into Phase 2 workflow after step 5.7
91+
**Dependencies:** Components 1-3, dependencies.md steering file
92+
93+
### Component 5: Phase 3 Spec Adherence Enforcement
94+
95+
**Responsibility:** Verify all install commands target approved dependencies; flag unapproved installs as protocol breach
96+
**Interface:** Enforcement rule in Phase 3 implementation gates section
97+
**Dependencies:** design.md ### Dependency Decisions section
98+
99+
### Component 6: Auto-Intelligence Policy Generation
100+
101+
**Responsibility:** Create and maintain ## Dependency Introduction Policy in dependencies.md
102+
**Interface:** Writes to dependencies.md steering file, preserving team-maintained sections
103+
**Dependencies:** Component 4 decisions, vertical detection
104+
105+
### Component 7: Adversarial Evaluation Updates
106+
107+
**Responsibility:** Add dependency-aware scoring guidance to Design Coherence and Design Fidelity dimensions
108+
**Interface:** Additional guidance text in existing evaluation dimension tables
109+
**Dependencies:** `core/evaluation.md` existing scoring tables
110+
111+
### Component 8: Audit Mode Dependency Drift Check
112+
113+
**Responsibility:** 7th drift check comparing installed packages against approved dependencies
114+
**Interface:** New check in the Six Drift Checks section of `core/reconciliation.md`
115+
**Dependencies:** Ecosystem detection from `core/dependency-safety.md`, lock file parsing
116+
117+
### Component 9: Generator Pipeline Integration
118+
119+
**Responsibility:** Wire the new module into build_common_context(), Jinja2 templates, mode-manifest.json, validate.py, and test_platform_consistency.py
120+
**Interface:** Standard generator pipeline integration pattern
121+
**Dependencies:** Existing generator infrastructure
122+
123+
### Dependency Decisions
124+
125+
No external dependencies required. This feature is pure markdown/workflow logic with optional network calls to public registry and source-repository APIs (npmjs.com, pypi.org, registry.npmjs.org, api.github.com), all of which fall back gracefully when unavailable.
126+
127+
## Sequence Diagrams
128+
129+
### Flow 1: Phase 2 Dependency Introduction Gate (Step 5.8)
130+
131+
```text
132+
Agent -> design.md: READ_FILE to scan for install commands and package references
133+
Agent -> dependencies.md: READ_FILE to get Detected Dependencies
134+
Agent -> Agent: Compare design.md packages against Detected Dependencies
135+
Agent -> Agent: Identify net-new dependencies
136+
[For each new dependency:]
137+
Agent -> Registry API: Query download stats, last publish (Layer 1)
138+
Agent -> GitHub API: Query stars, last commit, issues (Layer 2)
139+
Agent -> Agent: LLM fallback if APIs fail (Layer 3)
140+
Agent -> Agent: Run Build-vs-Install 5-criteria evaluation
141+
Agent -> User: ASK_USER with evaluation summary and recommendation
142+
User -> Agent: Approve or reject
143+
Agent -> design.md: EDIT_FILE to add ### Dependency Decisions with outcomes
144+
Agent -> dependencies.md: EDIT_FILE to update ## Dependency Introduction Policy
145+
```
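The comparison step in Flow 1 amounts to a set difference. The sketch below assumes the package names have already been extracted from design.md and from the Detected Dependencies section, and that names are compared case-insensitively; both are assumptions, and `net_new_dependencies` is a hypothetical helper name.

```python
def net_new_dependencies(design_packages, detected_dependencies):
    """Return packages referenced in design.md that are not already detected.

    First-seen order is preserved and duplicates are dropped.
    """
    known = {name.lower() for name in detected_dependencies}
    seen = set()
    result = []
    for name in design_packages:
        key = name.lower()
        if key not in known and key not in seen:
            seen.add(key)
            result.append(name)
    return result


new_packages = net_new_dependencies(
    ["express", "lodash", "express", "zod"],  # referenced in design.md
    ["lodash"],                               # already in Detected Dependencies
)
```

Only the packages returned here would go through the maintenance profile and Build-vs-Install evaluation; a spec with an empty result passes the gate trivially.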
146+
147+
### Flow 2: Phase 3 Install Command Enforcement
148+
149+
```text
150+
Agent -> tasks.md: READ_FILE to get implementation steps
151+
[For each task with install commands:]
152+
Agent -> design.md: READ_FILE ### Dependency Decisions
153+
Agent -> Agent: Verify target package is in approved list
154+
[If approved:]
155+
Agent -> Shell: RUN_COMMAND install command
156+
[If NOT approved:]
157+
Agent -> User: NOTIFY_USER protocol breach -- unapproved dependency
158+
Agent -> Agent: HALT until user approves or removes the install
159+
```
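The approval check in Flow 2 could look roughly like the following. This is a sketch only: SpecOps enforces the rule through agent instructions rather than code, the three prefixes are just the examples named in Component 1, and `packages_in_install_command`, `bare_name`, and `unapproved_packages` are hypothetical names.

```python
import re
import shlex

# Hypothetical subset of the install command patterns table.
INSTALL_PREFIXES = [
    ("npm", "install"),
    ("pip", "install"),
    ("cargo", "add"),
]


def packages_in_install_command(command: str) -> list[str]:
    """Return the positional package arguments of a recognized install command."""
    tokens = shlex.split(command)
    for prefix in INSTALL_PREFIXES:
        if tuple(tokens[: len(prefix)]) == prefix:
            # Flags such as --save-dev are skipped. Flags that take a value
            # (for example `pip install -r file`) are not handled here.
            return [t for t in tokens[len(prefix):] if not t.startswith("-")]
    return []


def bare_name(spec: str) -> str:
    """Strip a version suffix: 'lodash@4.17' -> 'lodash', 'requests==2.0' -> 'requests'."""
    match = re.match(r"^(@?[^@=<>~!]+)", spec)
    return match.group(1) if match else spec


def unapproved_packages(command: str, approved: set[str]) -> list[str]:
    """Return the package specs in the command that are not in the approved set."""
    return [
        spec
        for spec in packages_in_install_command(command)
        if bare_name(spec) not in approved
    ]


breaches = unapproved_packages("npm install express lodash@4.17 --save", {"express"})
```

A non-empty result corresponds to the "NOT approved" branch: notify the user of the protocol breach and halt until the install is approved or removed.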
160+
161+
### Flow 3: Audit Mode Dependency Drift
162+
163+
```text
164+
Agent -> Lock files: READ_FILE to get installed packages
165+
Agent -> index.json: READ_FILE to enumerate completed specs
166+
[For each completed spec:]
167+
Agent -> design.md: READ_FILE ### Dependency Decisions to collect approved packages
168+
Agent -> Agent: Compare installed vs approved union
169+
Agent -> Agent: Flag unapproved packages as Warning
170+
Agent -> Report: Include in Dependency Drift check
171+
```
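As a rough illustration of Flow 3, the sketch below reads direct dependencies from an npm `package-lock.json` (lockfileVersion 2 or 3, where the root project is stored under the `""` key of `packages`) and compares them with the union of approved packages across specs. Limiting the check to direct dependencies is an assumption, and the function names and spec names are hypothetical.

```python
import json


def direct_dependencies_from_package_lock(lock_text: str) -> set[str]:
    """Collect direct (non-transitive) dependency names from the root entry."""
    data = json.loads(lock_text)
    root = data.get("packages", {}).get("", {})
    return set(root.get("dependencies", {})) | set(root.get("devDependencies", {}))


def dependency_drift(installed: set[str], approved_by_spec: dict[str, set[str]]) -> list[str]:
    """Return installed packages that no completed spec has approved."""
    approved = set().union(*approved_by_spec.values())
    return sorted(installed - approved)


lock_text = json.dumps(
    {
        "lockfileVersion": 3,
        "packages": {
            "": {
                "dependencies": {"express": "^4.18", "left-pad": "^1.3"},
                "devDependencies": {"zod": "^3.0"},
            },
            "node_modules/express": {"version": "4.18.2"},
        },
    }
)
warnings = dependency_drift(
    direct_dependencies_from_package_lock(lock_text),
    {"spec-a": {"express"}, "spec-b": {"zod"}},
)
```

Each name in the result would be reported as a Warning in the Dependency Drift check, since it may be a pre-existing dependency rather than a gate violation.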
172+
173+
## Data Model Changes
174+
175+
No database or data model changes. The feature uses existing spec artifact files (design.md, dependencies.md, spec.json) and adds new sections within them.
176+
177+
### New Sections in Existing Files
178+
179+
**design.md** -- new section added during Phase 2 step 5.8:
180+
181+
```markdown
182+
### Dependency Decisions
183+
184+
| Package | Version | Ecosystem | Decision | Rationale |
185+
| ------- | ------- | --------- | -------- | --------- |
186+
| express | ^4.18 | Node.js | Approved | Scope match: HTTP server framework needed; Health: 59M weekly downloads, active maintenance |
187+
| lodash | ^4.17 | Node.js | Rejected | Size proportionality: only need _.get -- use optional chaining instead |
188+
```
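For illustration, a table in this shape can be read back into a package-to-decision mapping with a few lines of Python. This is a sketch, not part of SpecOps: `parse_dependency_decisions` is a hypothetical helper, and it assumes the column order shown above and that no cell contains a literal `|`.

```python
def parse_dependency_decisions(markdown: str) -> dict[str, str]:
    """Map package name -> decision from the '### Dependency Decisions' table."""
    decisions: dict[str, str] = {}
    in_section = False
    for line in markdown.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            # Track whether we are inside the Dependency Decisions section.
            in_section = stripped.lstrip("#").strip() == "Dependency Decisions"
            continue
        if not in_section or not stripped.startswith("|"):
            continue
        cells = [cell.strip() for cell in stripped.strip("|").split("|")]
        # Skip the header row, the dashed separator row, and malformed rows.
        if len(cells) < 4 or cells[0] == "Package" or set(cells[0]) <= {"-"}:
            continue
        decisions[cells[0]] = cells[3]
    return decisions


sample = """### Dependency Decisions

| Package | Version | Ecosystem | Decision | Rationale |
| ------- | ------- | --------- | -------- | --------- |
| express | ^4.18 | Node.js | Approved | Scope match |
| lodash | ^4.17 | Node.js | Rejected | Size proportionality |

## API Changes
"""
decisions = parse_dependency_decisions(sample)
approved = {name for name, decision in decisions.items() if decision == "Approved"}
```

The resulting `approved` set is the kind of input the Phase 3 enforcement and audit drift checks would compare against.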
189+
190+
**dependencies.md** steering file -- new section:
191+
192+
```markdown
193+
## Dependency Introduction Policy
194+
195+
**Default stance:** conservative (builder vertical)
196+
**Ecosystem:** Node.js (detected from package-lock.json)
197+
198+
### Approved Patterns
199+
- [accumulated from dependency decisions across specs]
200+
201+
### Rejected Patterns
202+
- [accumulated from rejected dependencies with reasons]
203+
```
204+
205+
## API Changes
206+
207+
No API changes. This is a workflow/prompt-level feature.
208+
209+
## Security Considerations
210+
211+
- Authentication: N/A -- registry API calls are public, no auth needed
212+
- Authorization: N/A -- no new permissions required
213+
- Data protection: Registry API responses are ephemeral, not stored beyond the evaluation
214+
- Input validation: Package names from design.md are treated as data, not commands -- no shell injection risk because they are used for string comparison, not interpolated into commands
215+
216+
## Performance Considerations
217+
218+
- Registry API calls use 10-second timeout (matching dependency-safety.md pattern)
219+
- Maximum 10 dependencies evaluated per gate run (matching dependency-safety.md top-10 limit)
220+
- Layer fallback ensures the gate completes even without network access
221+
222+
## Testing Strategy
223+
224+
- Unit tests: Validation markers in test_platform_consistency.py (DEPENDENCY_INTRODUCTION_MARKERS)
225+
- Integration tests: `python3 generator/validate.py` verifies markers present in all 5 platform outputs
226+
- E2E tests: `bash scripts/run-tests.sh` runs full test suite
227+
228+
## Rollout Plan
229+
230+
1. Create `core/dependency-introduction.md` with all gate logic
231+
2. Update `core/workflow.md` with Phase 2 step 5.8 and Phase 3 enforcement
232+
3. Update `core/evaluation.md` with scoring guidance
233+
4. Update `core/reconciliation.md` with 7th drift check
234+
5. Update `core/steering.md` with Dependency Introduction Policy section in dependencies.md template
235+
6. Update generator pipeline (generate.py, templates, mode-manifest.json, validate.py, test)
236+
7. Regenerate all platform outputs
237+
8. Run full test suite
238+
239+
## Risks & Mitigations
240+
241+
- **Risk 1:** Registry API rate limiting or downtime could slow Phase 2 -- **Mitigation:** 3-layer fallback ensures the gate always completes; 10-second timeouts prevent hanging
242+
- **Risk 2:** Agents may not reliably self-enforce install command checking in Phase 3 -- **Mitigation:** Adversarial evaluation catches violations in Phase 4A; audit mode catches drift post-completion
243+
- **Risk 3:** Lock file parsing may miss edge cases (workspaces, monorepos) -- **Mitigation:** Audit drift check flags as Warning (not Drift), acknowledging pre-existing dependencies
244+
245+
## Dependencies & Blockers
246+
247+
### Spec Dependencies
248+
249+
| Dependent Spec | Reason | Required | Status |
250+
| -------------- | ------ | -------- | ------ |
251+
| dependency-safety-gate | Shares ecosystem detection and dependencies.md | No | Completed |
252+
| adversarial-evaluation | Scoring dimensions to extend | No | Completed |
253+
254+
### Cross-Spec Blockers
255+
256+
<!-- Resolution types: scope_cut, interface_defined, completed, escalated, deferred -->
257+
258+
| Blocker | Blocking Spec | Resolution Type | Resolution Detail | Status |
259+
| ------- | ------------- | --------------- | ----------------- | ------ |
260+
| -- | -- | -- | -- | -- |
261+
262+
## Future Enhancements
263+
264+
- Dependency version range policy enforcement (e.g., "must pin major versions")
265+
- Monorepo per-workspace dependency scoping
266+
- Dependency graph visualization in audit reports
267+
- Integration with Dependabot/Renovate for automated upgrade tracking
