Commit 83e0d3a

EricCogen and Copilot committed
docs: Add Phase 24.0 metrics analysis guide and decision gate procedures
Phase 24.0: Production Metrics Analysis (Week 1 post-Phase 23)

Purpose:
- Validate Phase 23 FP reduction targets (17-28%)
- Confirm cumulative reduction (39-60%)
- Make GO/NO-GO decision for Phase 24.1

Metrics to collect:
1. Daily FP counts (P4, P5, P6 separately)
2. Coordination activation frequency (per 1,000 findings)
3. Confidence score distribution (pre/post boost)
4. False negative rate change (target: <0.5% increase)

Decision Gate 1 (Day 7):
- GO criteria: All metrics within targets
- NO-GO criteria: FP reduction <12%, FN rate >1%, or stability issues
- If GO: Proceed to Phase 24.1 (P7 implementation)
- If NO-GO: Tune thresholds and re-evaluate

Includes:
- Daily metrics collection templates
- Weekly checkpoint procedures
- Tuning procedures if NO-GO
- Rollback decision criteria
- GCI0038 baseline validation (P7 prerequisite)

Reference Documents:
- DEPLOYMENT_CHECKLIST_v2.7.0.md
- coordination-runbook.md
- phase-24-plan.md

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1 parent c6f2a2f commit 83e0d3a

2 files changed

Lines changed: 363 additions & 0 deletions

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -105,6 +105,7 @@ graphify-out/
 !docs/architecture/coordination-platform-reference.md
 !docs/phase-23-plan.md
 !docs/phase-24-plan.md
+!docs/phase-24-0-metrics-analysis.md
 !docs/operations/phase-21-monitoring.md
 !docs/operations/phase-21-metrics.md
 !docs/operations/coordination-runbook.md

Lines changed: 362 additions & 0 deletions
@@ -0,0 +1,362 @@
# Phase 24.0: Production Metrics Analysis & Decision Gate

**Timeline:** Week 1 post-Phase 23 deployment (7-14 days)
**Goal:** Validate Phase 23 FP reduction targets before proceeding to Phase 24.1
**Decision Gate:** Week 1 checkpoint - GO/NO-GO to Phase 24.1

---

## Overview

Phase 24.0 is a passive metrics collection and analysis phase. No code changes occur. Instead:

1. Deploy Phase 23 to production
2. Collect production metrics for 1-2 weeks
3. Analyze Phase 23 impact vs targets
4. Make GO/NO-GO decision at Gate 1

**Success Criteria:**
- Phase 23 FP reduction: 17-28% (vs Phase 21 baseline)
- Cumulative reduction stable: 39-60%
- Coordination activation frequency: Within expected ranges
- False negative rate: <0.5 percentage-point increase

---

## Metrics to Collect

### 1. False Positive Counts (Daily)

Track false positive findings by category:

```
Date        P4_FP  P5_FP  P6_FP  Total_Phase23_FP  Cumulative_FP  Reduction_%
2026-05-06  12     18     22     52                150            45.3%
2026-05-07  11     17     23     51                148            44.8%
2026-05-08  13     19     21     53                152            46.2%
...
7-day avg:  12.3   18.1   22.0   52.4              150.2          45.5%
```

**Calculations:**
- Baseline (Phase 21): 100 FP per day (typical corpus)
- Target (Phase 23): 100 × (1 - 0.225) = 77.5 FP per day (22.5% reduction)
- Expected range: 72-83 FP per day (17-28% reduction)
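
The target arithmetic above can be scripted so every day's count is checked the same way. The following is a minimal bash sketch, assuming the Phase 21 baseline is a per-day FP count like the 100 used above. The helper names (`reduction_pct`, `fp_band`, `in_band`) are hypothetical and are not existing project tooling.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the Phase 23 FP-reduction arithmetic.

# reduction_pct BASELINE OBSERVED -> percent reduction, e.g. 100 77.5 -> 22.5
reduction_pct() {
  awk -v b="$1" -v o="$2" 'BEGIN { printf "%.1f\n", (b - o) / b * 100 }'
}

# fp_band BASELINE LOW_PCT HIGH_PCT -> "min max" FP/day that satisfy the reduction band
fp_band() {
  awk -v b="$1" -v lo="$2" -v hi="$3" \
    'BEGIN { printf "%.1f %.1f\n", b * (1 - hi / 100), b * (1 - lo / 100) }'
}

# in_band VALUE LOW HIGH -> exit status 0 when LOW <= VALUE <= HIGH
in_band() {
  awk -v v="$1" -v lo="$2" -v hi="$3" 'BEGIN { exit !(v >= lo && v <= hi) }'
}
```

For example, `fp_band 100 17 28` prints `72.0 83.0`, which is the expected range above. `in_band "$(reduction_pct 100 77.5)" 17 28` succeeds for the 22.5% midpoint target.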

**Data Collection:**
```bash
# Extract daily FP counts from logs/metrics.log (if available) as CSV.
# Field positions ($1, $3) depend on the log format; adjust as needed.
grep "false_positive_count" logs/metrics.log | \
  grep "2026-05-" | \
  awk 'BEGIN { OFS = "," } { print $1, $3 }' > phase23_fp_by_day.csv

# Manual tracking if logs unavailable
# Use dashboard or direct API query
```
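
The 7-day row does not have to be averaged by hand. The sketch below computes it, assuming the whitespace-separated layout shown in the table above. Only rows whose first field is a `YYYY-MM-DD` date are averaged, and every later column is treated as numeric. `seven_day_avg` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# seven_day_avg FILE -> mean of each column after the date, one decimal place.
# Rows whose first field is not a YYYY-MM-DD date (header, "...", avg lines) are skipped.
# A trailing % on a value is ignored by awk's string-to-number conversion.
seven_day_avg() {
  awk '$1 ~ /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]$/ {
         rows++
         if (NF > cols) cols = NF
         for (i = 2; i <= NF; i++) sum[i] += $i
       }
       END {
         if (rows == 0) exit 1
         out = "avg"
         for (i = 2; i <= cols; i++) out = out sprintf(" %.1f", sum[i] / rows)
         print out
       }' "$1"
}
```

Given only the header and the three sample rows above (May 6-8), it prints `avg 12.0 18.0 22.0 52.0 150.0 45.4`. Given a full week of rows, it produces the 7-day average line.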

### 2. Coordination Activation Frequency (Daily)

Track how often each coordination fires:

```
Date        P4_Activations  P5_Activations  P6_Activations  Notes
2026-05-06  22              31              38              Normal operation
2026-05-07  19              35              40              P5 spike
2026-05-08  25              28              36              Stable
...
7-day avg:  22.0            31.5            37.8            Within range
```

**Expected Ranges (per 1,000 findings):**
- P4 (Performance): 15-30 activations (we expect ~22)
- P5 (Serialization): 20-40 activations (we expect ~31)
- P6 (DI & Async): 25-50 activations (we expect ~38)

**Data Collection:**
```bash
# Count coordination tags in each day's labeling log (days zero-padded to match file names)
for day in 06 07 08 09 10 11 12; do
  log="logs/labeling-2026-05-${day}.log"
  p4=$(grep -c "coordination:P4-performance" "$log")
  p5=$(grep -c "coordination:P5-serialization" "$log")
  p6=$(grep -c "coordination:P6-di-async" "$log")
  echo "2026-05-${day} P4=${p4} P5=${p5} P6=${p6}"
done
# These are raw counts: divide by the day's total findings and multiply by 1,000
# before comparing against the expected ranges above.
```
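
The loop above yields raw counts, while the expected ranges are per 1,000 findings. The sketch below normalizes a count and checks it against those ranges. How the daily findings total is obtained is left open. The `SEVERE` label encodes one possible reading of the gate's "more than 50% outside range" rule: above 1.5× the upper bound, or below 0.5× the lower bound. That reading is an assumption, not a documented definition. All function names are hypothetical.

```bash
#!/usr/bin/env bash
# per_thousand COUNT TOTAL_FINDINGS -> activations per 1,000 findings
per_thousand() {
  awk -v n="$1" -v t="$2" 'BEGIN { if (t <= 0) exit 1; printf "%.1f\n", n / t * 1000 }'
}

# expected_range COORDINATION -> "low high" per 1,000 findings (from the ranges above)
expected_range() {
  case "$1" in
    P4) echo "15 30" ;;
    P5) echo "20 40" ;;
    P6) echo "25 50" ;;
    *)  return 1 ;;
  esac
}

# activation_status COORDINATION RATE -> OK, OUT_OF_RANGE, or SEVERE
# SEVERE assumes ">50% outside range" means above 1.5x the upper bound
# or below 0.5x the lower bound.
activation_status() {
  local range lo hi
  range=$(expected_range "$1") || return 1
  lo=${range% *}
  hi=${range#* }
  awk -v r="$2" -v lo="$lo" -v hi="$hi" 'BEGIN {
    if (r >= lo && r <= hi) print "OK"
    else if (r > hi * 1.5 || r < lo * 0.5) print "SEVERE"
    else print "OUT_OF_RANGE"
  }'
}
```

For instance, `activation_status P5 "$(per_thousand 62 2000)"` prints `OK`, since 62 activations in 2,000 findings is 31.0 per 1,000.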

### 3. Confidence Score Distribution

Analyze confidence scores before/after coordination:

```
Rule     Before_Avg  After_Avg  Boost_Delta  Distribution
GCI0044  0.60        0.75       +0.15        [0.70-0.80: 85%, >0.80: 15%]
GCI0035  0.55        0.70       +0.15        [0.65-0.75: 80%, >0.75: 20%]
GCI0039  0.55        0.85       +0.30        [0.80-0.90: 95%, >0.90: 5%]
GCI0048  0.60        0.80       +0.20        [0.75-0.85: 92%, >0.85: 8%]
GCI0045  0.55        0.75       +0.20        [0.70-0.80: 88%, >0.80: 12%]
GCI0016  0.65        0.80       +0.15        [0.75-0.85: 90%, >0.85: 10%]
```

**Analysis Goals:**
- Verify boost delta matches documented values (±5%)
- Check for unexpected clustering (indicates potential issues)
- Ensure no confidence > 0.95 (risk of over-confidence)

**Data Collection:**
```bash
# Extract confidence scores from labeling output as CSV.
# Field positions ($2, $3) depend on the log format; adjust as needed.
grep "ExpectedConfidence" logs/labeling.log | \
  awk 'BEGIN { OFS = "," } { print $2, $3 }' > confidence_distribution.csv
```
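
Once scores are extracted, the analysis goals above can be checked per rule. The sketch below assumes each input line carries a rule ID followed by a score, separated by commas or spaces; the real labeling-log layout may differ. `confidence_summary` and its band arguments are illustrative.

```bash
#!/usr/bin/env bash
# confidence_summary FILE LOW HIGH
# Per rule: number of scores, mean, share inside [LOW, HIGH], and count above 0.95.
# Lines whose second field is not a plain number (e.g. a header) are skipped.
confidence_summary() {
  awk -F '[ ,]+' -v lo="$2" -v hi="$3" '
    NF >= 2 && $2 ~ /^[0-9.]+$/ {
      n[$1]++
      sum[$1] += $2
      if ($2 >= lo && $2 <= hi) inband[$1]++
      if ($2 > 0.95) over[$1]++
    }
    END {
      for (r in n)
        printf "%s n=%d mean=%.2f in_range=%.0f%% over_0.95=%d\n",
               r, n[r], sum[r] / n[r], 100 * inband[r] / n[r], over[r]
    }' "$1" | sort
}
```

The band applies to every rule in the file, so run the helper once per band of interest. For GCI0039's band, `confidence_summary confidence_distribution.csv 0.80 0.90` gives an in-range share to compare with the 95% above. Any nonzero `over_0.95` count flags the over-confidence risk.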

### 4. False Negative Rate

Monitor for missed findings (false negatives):

```
Rule     Phase21_FN_Rate  Phase23_FN_Rate  Delta  Status
GCI0044  2.1%             2.3%             +0.2%  ✓ OK
GCI0035  1.8%             1.9%             +0.1%  ✓ OK
GCI0039  1.5%             2.2%             +0.7%  ⚠ Monitor
GCI0048  1.3%             1.8%             +0.5%  ⚠ Watch
GCI0045  2.4%             2.6%             +0.2%  ✓ OK
GCI0016  1.1%             1.4%             +0.3%  ✓ OK
```

**Target:** All deltas below +0.5 percentage points
**Alert:** If any delta exceeds 1 percentage point, investigate coordination thresholds

**Data Collection:**
```bash
# Compare finding counts against a known-good corpus
# If ground truth is available: recall = TP / (TP + FN)
# FN rate = FN / (TP + FN) = 1 - recall
```
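
The formulas in the comments above can be expressed as two small functions, together with the Target and Alert thresholds. This sketch assumes TP and FN counts come from a labeled corpus. The status names (OK, MONITOR, ALERT) are an illustrative mapping of the ✓/⚠ markers in the table, not project-defined values.

```bash
#!/usr/bin/env bash
# fn_rate TP FN -> FN / (TP + FN) as a percentage (equivalently 100 - recall%)
fn_rate() {
  awk -v tp="$1" -v fn="$2" 'BEGIN {
    if (tp + fn == 0) exit 1
    printf "%.1f\n", 100 * fn / (tp + fn)
  }'
}

# fn_status OLD_RATE NEW_RATE -> delta in percentage points and a status:
#   OK below +0.5 pp, MONITOR from +0.5 to +1.0 pp, ALERT above +1.0 pp
fn_status() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    d = b - a
    s = (d > 1) ? "ALERT" : ((d >= 0.5) ? "MONITOR" : "OK")
    printf "%+.1f %s\n", d, s
  }'
}
```

Applied to the GCI0039 row above, `fn_status 1.5 2.2` prints `+0.7 MONITOR`.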

---

## Daily Metrics Report Template

Create a daily report file (CSV) with this header:

```
date,p4_fp,p5_fp,p6_fp,total_phase23_fp,baseline_fp,reduction_pct,p4_activations,p5_activations,p6_activations,p4_avg_conf,p5_avg_conf,p6_avg_conf,notes
```

**Example Day 1:**
```
2026-05-06,12,18,22,52,100,48%,22,31,38,0.75,0.85,0.80,Initial deployment - all services stable
```
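
Filling the template by hand invites arithmetic slips in `total_phase23_fp` and `reduction_pct`. The sketch below derives both from the raw inputs and emits one row in the header's column order. The function name and argument order are assumptions. The notes field is CSV-quoted only when it contains a comma.

```bash
#!/usr/bin/env bash
# daily_row DATE P4_FP P5_FP P6_FP BASELINE_FP P4_ACT P5_ACT P6_ACT P4_CONF P5_CONF P6_CONF NOTES
# Prints one CSV row in the daily report column order; total and reduction are computed.
daily_row() {
  local total pct notes q='"'
  total=$(( $2 + $3 + $4 ))
  pct=$(awk -v b="$5" -v t="$total" 'BEGIN { printf "%.0f%%", (b - t) / b * 100 }')
  notes=${12}
  case $notes in
    *,*) notes=$q${notes//$q/$q$q}$q ;;   # CSV-quote notes that contain a comma
  esac
  printf '%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s\n' \
    "$1" "$2" "$3" "$4" "$total" "$5" "$pct" "$6" "$7" "$8" "$9" "${10}" "${11}" "$notes"
}
```

With the Day 1 inputs, `daily_row 2026-05-06 12 18 22 100 22 31 38 0.75 0.85 0.80 "Initial deployment - all services stable"` reproduces the example row above. Write the header once, then append one such row per day.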

---

## Weekly Analysis Checkpoints

### End of Day 3 (Mid-Week) - Preliminary Check

**Question:** Are we on track?

```
Metric                    Target           Actual             Status
FP Reduction (3-day avg)  17-28% from BL   16-24% (example)   ✓ Tracking
P4 Activation Freq        15-30 per 1k     18-25 (example)    ✓ OK
P5 Activation Freq        20-40 per 1k     28-35 (example)    ✓ OK
P6 Activation Freq        25-50 per 1k     35-42 (example)    ✓ OK
Confidence Distribution   Clustered 0.75+  85-90% range (ok)  ✓ OK
FN Rate Change            <0.5% increase   +0.2-0.3% (ok)     ✓ OK
```

**Actions:**
- If ✓ all metrics: Continue monitoring
- If ⚠ any metric: Investigate cause (logs, sample findings)
- If ✗ critical metric: Prepare rollback decision

### End of Week 1 (Day 7) - Decision Gate 1

**Question:** Do Phase 23 metrics validate targets? GO or NO-GO?

**GO Criteria (proceed to Phase 24.1):**
- [ ] FP reduction: 17-28% confirmed
- [ ] Cumulative reduction: 39-60% confirmed
- [ ] Coordination activation frequency within ranges
- [ ] Confidence distribution reasonable (85%+ in target range)
- [ ] FN rate increase < 0.5 percentage points
- [ ] No critical errors in production

**NO-GO Criteria (tune and re-evaluate):**
- [ ] FP reduction < 12% (significantly below target)
- [ ] Any coordination activation frequency more than 50% outside its expected range
- [ ] FN rate increase > 1 percentage point
- [ ] Service stability issues detected
- [ ] Unexpected pattern in confidence distribution
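
The two checklists can be evaluated mechanically once the individual metrics are in. This sketch rests on several assumptions:

- The activation argument uses OK/OUT_OF_RANGE/SEVERE labels from a range check, with SEVERE standing for the "more than 50% outside range" case.
- Stability is a plain yes/no judgment.
- The confidence-distribution and critical-error items are left to human review.
- Results matching neither list are reported as REVIEW, a label this guide does not define.

`gate1_decision` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# gate1_decision FP_RED CUM_RED FN_DELTA ACTIVATION STABLE
#   FP_RED, CUM_RED : Phase 23 and cumulative FP reduction, in percent
#   FN_DELTA        : FN rate increase, in percentage points
#   ACTIVATION      : worst status across P4-P6: OK, OUT_OF_RANGE, or SEVERE
#   STABLE          : yes or no
gate1_decision() {
  awk -v fp="$1" -v cum="$2" -v fn="$3" -v act="$4" -v st="$5" 'BEGIN {
    # Any NO-GO criterion wins outright.
    if (fp < 12 || fn > 1 || act == "SEVERE" || st != "yes") { print "NO-GO"; exit }
    # Every measurable GO criterion must hold.
    if (fp >= 17 && fp <= 28 && cum >= 39 && cum <= 60 && fn < 0.5 && act == "OK") print "GO"
    else print "REVIEW"   # between the two lists: not defined by this guide
  }'
}
```

With the values from the sample Gate 1 report (21.3, 45.8, +0.3, all activations OK, stable), it prints `GO`. An FP reduction of 14% with everything else on target falls between the two lists and prints `REVIEW`.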

**Gate 1 Decision Report Template:**

```
=== PHASE 24.0 DECISION GATE 1 ===
Date: 2026-05-13 (Day 7 post-deployment)

Metrics Summary:
├─ FP Reduction:            21.3%     (Target: 17-28%)  ✅ GO
├─ Cumulative:              45.8%     (Target: 39-60%)  ✅ GO
├─ P4 Activation:           22/1000   (Target: 15-30)   ✅ GO
├─ P5 Activation:           31/1000   (Target: 20-40)   ✅ GO
├─ P6 Activation:           38/1000   (Target: 25-50)   ✅ GO
├─ Confidence Distribution: 88% in range                ✅ GO
├─ FN Rate Change:          +0.3%     (Target: <0.5%)   ✅ GO
└─ Production Stability:    99.8% uptime                ✅ GO

Decision: ✅ GO TO PHASE 24.1

Next Phase:
- Start P7 (Concurrency & Lock Ordering) implementation
- Estimated timeline: 3-4 days
- Outstanding prerequisite: GCI0038 baseline validation

Recommendation:
Proceed with Phase 24.1 as planned. Phase 23 metrics validate targets.
No tuning needed at this time.
```

---

## If NO-GO: Tuning Procedure

If the Gate 1 decision is NO-GO, follow this tuning procedure:

### Step 1: Identify Root Cause

**Low FP Reduction (<12%):**
- Check coordination activation frequency: are the coordinations firing at all?
- Verify coordination code was deployed (check logs for "coordination:" tags)
- Review confidence boost values: were they set correctly?
- Sample 10 findings and manually verify the boost is applied

**High FN Rate Increase (>1 percentage point):**
- Review coordination thresholds: are they too aggressive?
- Check whether any legitimate findings were suppressed
- Sample 10 false negatives and analyze why each was missed
- Consider reducing the confidence boost

**Out-of-range Activation Frequency:**
- If too low: the coordination is not detecting the pattern; check scope filtering
- If too high: the coordination is over-detecting; check precision (are activations true positives?)

### Step 2: Tune Thresholds

Example tuning for low FP reduction:

```
Current:
- P4: GCI0044 0.60→0.75, GCI0035 0.55→0.70 (not working)
- Action: Increase boost → GCI0044 0.60→0.78, GCI0035 0.55→0.72

Or:

Current:
- P5: GCI0039 0.55→0.85 (too aggressive, high FN)
- Action: Decrease boost → GCI0039 0.55→0.75

Process:
1. Adjust ONE coordination at a time
2. Deploy to staging environment
3. Re-run metrics collection (3-5 days)
4. Analyze impact
5. If improved: Deploy to production; if not: revert and try different adjustment
```
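
The "if improved" check in Step 5 needs a concrete comparison. One possible rule, sketched below as an assumption rather than this guide's definition, compares only the metric the adjustment targeted. A boost increase should raise FP reduction, and a boost decrease should lower the FN delta. `tuning_verdict` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# tuning_verdict GOAL BEFORE AFTER -> PROMOTE or REVERT
#   GOAL=fp : BEFORE/AFTER are FP reduction in percent (higher is better)
#   GOAL=fn : BEFORE/AFTER are FN rate deltas in percentage points (lower is better)
# Any other GOAL prints nothing and returns status 2.
tuning_verdict() {
  awk -v g="$1" -v b="$2" -v a="$3" 'BEGIN {
    if (g == "fp") better = (a > b)
    else if (g == "fn") better = (a < b)
    else exit 2
    print (better ? "PROMOTE" : "REVERT")
  }'
}
```

This deliberately ignores the other metrics. Step 3 still re-checks every Gate 1 criterion before the final decision.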

### Step 3: Re-evaluate Decision

After tuning:
- Collect metrics for 3-5 more days
- Reassess Gate 1 criteria
- Document tuning decisions in ADR-0005 appendix
- Make final GO/NO-GO decision

---

## Phase 24.1 Prerequisites (If GO)

Before starting Phase 24.1 (P7 Concurrency), verify:

- [ ] Phase 23 metrics validated (Gate 1 passed)
- [ ] GCI0038 (lock ordering) exists and has reasonable baseline confidence
- [ ] GCI0016 (async violations) metrics stable post-Phase 23
- [ ] Production environment stable (no cascading issues)

**GCI0038 Baseline Validation:**
```bash
# Query: How often does GCI0038 fire in production?
# Expected: 5-15 per 1,000 findings (moderate frequency)
# If <2 per 1,000: GCI0038 rarely fires, P7 may be low impact
# If >30 per 1,000: GCI0038 very noisy, needs pre-tuning

gci0038=$(grep -c "GCI0038" logs/labeling.log)   # count detections
# Divide by total findings to get frequency; TOTAL_FINDINGS is a placeholder you must set
: "${TOTAL_FINDINGS:?set TOTAL_FINDINGS to the total number of findings}"
awk -v n="$gci0038" -v t="$TOTAL_FINDINGS" \
  'BEGIN { printf "GCI0038: %.1f per 1,000 findings\n", n / t * 1000 }'
```
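
The thresholds in the comments above can be turned into a classifier so the P7 go/defer call is recorded consistently. This guide does not classify the 2-5 and 15-30 per 1,000 bands, so the sketch reports them as BORDERLINE; that label is an assumption. `gci0038_band` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# gci0038_band RATE_PER_1000 -> WEAK, BORDERLINE, EXPECTED, or NOISY
#   WEAK       (<2):   rarely fires; consider deferring P7
#   EXPECTED   (5-15): moderate frequency, as expected
#   NOISY      (>30):  needs pre-tuning before P7
#   BORDERLINE (2-5 or 15-30): not classified by this guide (assumption)
gci0038_band() {
  awk -v r="$1" 'BEGIN {
    if (r < 2) print "WEAK"
    else if (r > 30) print "NOISY"
    else if (r >= 5 && r <= 15) print "EXPECTED"
    else print "BORDERLINE"
  }'
}
```

Pass it the per-1,000 frequency computed from the GCI0038 detection count and the total findings.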

If the GCI0038 baseline is weak (<2 per 1,000), the recommendation is to:
- Defer P7 to Phase 25
- Proceed with P8 (Cache) only
- Adjust Phase 24.1 scope

---

## Success Metrics Dashboard (Example)

Create dashboard or spreadsheet with:

```
┌─ Phase 24.0 Metrics Dashboard ─────────────────────┐
│                                                    │
│ FP Reduction: ████████░ 45.8%   (Target: 39-60%)   │
│ P4 Activity:  ███░░░░░░ 22/1000 (Target: 15-30)    │
│ P5 Activity:  ████░░░░░ 31/1000 (Target: 20-40)    │
│ P6 Activity:  █████░░░░ 38/1000 (Target: 25-50)    │
│ Stability:    █████████ 99.8% uptime               │
│ FN Rate:      ░░░░░░░░░ +0.3%   (Target: <0.5%)    │
│                                                    │
│ Gate 1 Status: ✅ GO TO PHASE 24.1                 │
│                                                    │
└────────────────────────────────────────────────────┘
```

---

## Rollback Decision (Emergency)

If critical issues are detected:

**Criteria for immediate rollback:**
- Production outage caused by coordination
- False positive rate > 70% (severe quality issue)
- Service latency increase > 20%
- Cascade failure detected
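
The two numeric rollback triggers can be checked from before/after measurements. This sketch assumes the FP rate is already expressed as a percentage and that latency is compared as a relative increase over the pre-deployment value. Outage and cascade-failure detection remain operator judgments, passed in as yes/no flags. `should_rollback` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# should_rollback FP_RATE_PCT LATENCY_BEFORE LATENCY_AFTER OUTAGE CASCADE
#   OUTAGE, CASCADE: yes or no (operator judgment)
# Prints ROLLBACK if any immediate-rollback criterion is met, else CONTINUE.
should_rollback() {
  awk -v fp="$1" -v before="$2" -v after="$3" -v outage="$4" -v cascade="$5" 'BEGIN {
    latency_increase = (before > 0) ? 100 * (after - before) / before : 0
    if (outage == "yes" || cascade == "yes" || fp > 70 || latency_increase > 20)
      print "ROLLBACK"
    else
      print "CONTINUE"
  }'
}
```

For example, a latency change from 200 to 250 (a 25% increase) prints `ROLLBACK` even when the FP rate is normal.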

**Rollback procedure:**
```bash
# 1. Revert to v2.6.0 (Phase 21 last stable) and build a Release configuration
git fetch --tags
git checkout v2.6.0
dotnet build -c Release
# 2. Deploy to production
# 3. Verify services recover
# 4. Document incident
# 5. Post-mortem analysis
```

**Post-rollback:** Analyze root cause and decide on Phase 23 re-tuning vs re-architecture.

---

## References

- **Deployment Checklist:** `DEPLOYMENT_CHECKLIST_v2.7.0.md`
- **Release Notes:** `RELEASE_NOTES_v2.7.0.md`
- **Runbook:** `docs/operations/coordination-runbook.md`
- **ADR-0005:** `docs/architecture/adr-0005-phase-23-heuristics-and-coordinations.md`

---

**Phase 24.0 Owner:** [Your Team]
**Decision Gate 1 Date:** [7 days post-deployment]
**Go-Live Target (if GO):** [Date + 2-3 weeks for Phase 24.1-24.2]
