Purpose: record every fallback to non-Roslyn .cs reading/editing so fallback patterns become Roslyn tool improvements.
Please forward this file to govert@dnakode.com as product feedback to improve RoslynSkills.
This is a temporary working log. It is safe to delete after feedback is forwarded and key learnings are captured in permanent artifacts.
- Date (`YYYY-MM-DD`):
- RoslynSkills version (`roscli --version`):
- Task/Context:
- Fallback action: `read|edit|both`
- Why Roslyn path was not used:
- Roslyn command attempted (if any):
- Missing command/option hypothesis:
- Proposed improvement:
- Expected impact:
  - correctness:
  - latency:
  - token_count:
- Follow-up issue/test link:

- 2026-02-09: Bootstrap policy entry -> Added mandatory fallback reflection rule to `AGENTS.md` and skill workflow -> Use this log as source for exploratory command backlog.
- 2026-02-09: Added new transport server source (`src/RoslynAgent.TransportServer/Program.cs`) via text patch before Roslyn session edits
  - Task/Context: implement persistent stdio transport server to benchmark MCP-style warm transport vs process-per-call CLI.
  - Fallback action: `edit`
  - Why Roslyn path was not used:
    - Creating a brand-new `.cs` file with large initial content is still materially easier through text patch than current Roslyn command surface.
  - Roslyn command attempted (if any):
    - None for initial create; subsequent corrections used `session.open` + `session.apply_text_edits` + `session.commit`.
  - Missing command/option hypothesis:
    - Missing `edit.create_file`/`session.create` primitive that can atomically create a new C# file with diagnostics in one call.
  - Proposed improvement:
    - Add `edit.create_file` with `file_path`, `content`, optional `apply`, and immediate `diag.get_file_diagnostics` result.
  - Expected impact:
    - correctness: higher (single validated create path, fewer shell quoting issues).
    - latency: lower (remove patch+re-validate loop).
    - token_count: lower (avoid repeated full-file retries and transcript churn).
  - Follow-up issue/test link:
    - TODO: add command contract + integration tests for Roslyn-native file creation workflow.
- 2026-02-09: Context-compaction recovery read of `src/RoslynAgent.TransportServer/Program.cs` used plain file read before resuming Roslyn-first loop
  - Task/Context: continue MCP-style benchmark implementation after a compacted handover, quickly confirming pending transport server code state.
  - Fallback action: `read`
  - Why Roslyn path was not used:
    - Fast state rehydration step was done with `Get-Content` before re-entering Roslyn CLI command loop.
  - Roslyn command attempted (if any):
    - None before fallback; Roslyn path resumed immediately after with `diag.get_file_diagnostics` validation.
  - Missing command/option hypothesis:
    - Need a lower-friction Roslyn shorthand for "show full file source" that is as fast to invoke as shell file reads.
  - Proposed improvement:
    - Add `ctx.file_source <file_path> [--max-chars ...]` with optional `--region`/`--around-line` filters and compact preview metadata.
  - Expected impact:
    - correctness: higher (keeps reads in semantic-aware path and consistent envelopes).
    - latency: lower (reduces command-selection hesitation between shell and Roslyn).
    - token_count: lower (supports bounded source retrieval and avoids accidental full-file dumps).
  - Follow-up issue/test link:
    - TODO: define command contract and tests for a Roslyn-native full-file/region source retrieval command.
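
The bounded retrieval proposed for `ctx.file_source` could behave roughly as sketched below. This is a Python illustration only, not RoslynSkills code: `bounded_source`, its `context` parameter, and the metadata keys in the returned dictionary are hypothetical names chosen to mirror the proposed `--max-chars` and `--around-line` options.

```python
def bounded_source(text, max_chars=2000, around_line=None, context=2):
    """Return a bounded slice of source text plus compact preview metadata.

    Hypothetical stand-in for the proposed ctx.file_source options:
    max_chars ~ --max-chars, around_line ~ --around-line (1-based).
    """
    lines = text.splitlines()
    total = len(lines)
    if around_line is None:
        # No anchor line given: consider the whole file.
        start, end = 1, total
    else:
        # Clamp the window to the file boundaries.
        start = max(1, around_line - context)
        end = min(total, around_line + context)
    snippet = "\n".join(lines[start - 1:end])
    # Enforce the character budget; the flag tells the caller text was cut.
    truncated = len(snippet) > max_chars
    if truncated:
        snippet = snippet[:max_chars]
    return {
        "start_line": start,      # first line of the requested window
        "end_line": end,          # last line of the requested window
        "total_lines": total,
        "truncated": truncated,
        "source": snippet,
    }


if __name__ == "__main__":
    sample = "\n".join(f"line {i}" for i in range(1, 11))
    print(bounded_source(sample, max_chars=40, around_line=5, context=1))
```

Returning the window bounds and a `truncated` flag alongside the text is one way to let a caller decide whether a follow-up read is needed without dumping the full file.
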
- 2026-02-09: Added `src/RoslynAgent.McpServer/Program.cs` via text patch for full MCP protocol bootstrap
  - Task/Context: implement real MCP stdio server (framed JSON-RPC with `initialize`, `tools/list`, `tools/call`) and wire harness MCP treatment lane.
  - Fallback action: `edit`
  - Why Roslyn path was not used:
    - Large greenfield file creation and multi-hundred-line protocol scaffold was faster via deterministic patch than incremental session edits.
  - Roslyn command attempted (if any):
    - Roslyn was used for contracts/context lookup and post-edit diagnostics/build validation; creation itself did not use Roslyn edit primitives.
  - Missing command/option hypothesis:
    - Missing high-throughput Roslyn file-bootstrap flow for creating a new C# file from a full payload with immediate diagnostics and auto-usings assistance.
  - Proposed improvement:
    - Add `edit.create_file` + optional `edit.seed_from_template`/`session.seed_content` pathway with one-shot diagnostics and import suggestions.
  - Expected impact:
    - correctness: higher (creation path remains inside compiler-backed loop).
    - latency: lower (single-step create+validate for large scaffolds).
    - token_count: lower (fewer iterative shell/patch reconciliation steps).
  - Follow-up issue/test link:
    - TODO: add Roslyn-native create/seed command contract and integration tests for large-file bootstrap workflows.
- 2026-02-09: MCP protocol compatibility hardening in `src/RoslynAgent.McpServer/Program.cs` used text patch for multi-method transport updates
  - Task/Context: adapt MCP server transport and resource handling for real Codex compatibility (newline-delimited responses, dual-format read path, URI normalization).
  - Fallback action: `both`
  - Why Roslyn path was not used:
    - The change spanned several non-adjacent methods/constants with protocol-level edits; applying this efficiently required a coordinated text patch across the full file.
  - Roslyn command attempted (if any):
    - `ctx.file_outline`, `ctx.member_source`, and `nav.find_symbol` were used for semantic navigation before fallback edits.
  - Missing command/option hypothesis:
    - Missing Roslyn-native multi-region "edit transaction with semantic anchors across arbitrary methods/constants in one file" optimized for protocol refactors.
  - Proposed improvement:
    - Extend `edit.transaction` with symbol-anchor operations (for example `replace_member_by_symbol_id`) and constant/block patch ops to reduce full-file text patch dependence.
  - Expected impact:
    - correctness: higher (member-targeted edits reduce accidental protocol regressions).
    - latency: lower (fewer manual context/patch reconciliation steps).
    - token_count: lower (less repeated source extraction for scattered edits).
  - Follow-up issue/test link:
    - TODO: design symbol-anchored single-file multi-region edit transaction contract + regression tests.
- 2026-02-10: CLI/session usability hardening required direct `.cs` edits in RoslynSkills command host
  - Task/Context: add one-shot `edit.create_file`, tighten `session.open` file-type guardrails, and improve command argument discoverability for Claude/Codex flows.
  - Fallback action: `edit`
  - Why Roslyn path was not used:
    - Updating the RoslynSkills tool implementation itself still requires editing command and CLI source files directly before the updated command surface exists.
  - Roslyn command attempted (if any):
    - None for edits; validation used full `dotnet test` gate after changes.
  - Missing command/option hypothesis:
    - Missing Roslyn-native self-hosted edit mode for tool-source changes with multi-file semantic anchors and command-surface regeneration support.
  - Proposed improvement:
    - Add a repository-scoped `edit.transaction` symbol-anchor mode plus command-descriptor extraction that can emit input-shape hints into CLI/MCP surfaces automatically.
  - Expected impact:
    - correctness: higher (fewer hand-maintained usage/schema drifts).
    - latency: lower (faster tool-surface iteration for command additions).
    - token_count: lower (less back-and-forth argument guessing by agents).
  - Follow-up issue/test link:
    - TODO: prototype descriptor-driven input schema generation for MCP `tools/list` and CLI `describe-command`.
- 2026-02-11: Workspace-binding reliability pass required direct `.cs` reads/edits in command host and loader internals
  - Task/Context: make `nav.find_symbol` and `diag.get_file_diagnostics` workspace-aware by default, expose `workspace_context` metadata, and align CLI/MCP guidance surfaces.
  - Fallback action: `both`
  - Why Roslyn path was not used:
    - Implementing RoslynSkills internals still requires non-Roslyn file reads/edits while introducing new command-surface behavior and runtime dependencies.
  - Roslyn command attempted (if any):
    - Validation and empirical checks were run through `roscli` (`list-commands`, `nav.find_symbol`, `diag.get_file_diagnostics`) after implementation.
  - Missing command/option hypothesis:
    - Missing self-hosted "edit RoslynSkills source semantically" operation for multi-file command-surface/tooling refactors.
  - Proposed improvement:
    - Add a dedicated repository-maintainer mode over `edit.transaction` with symbol-id targeting across files plus auto-regenerated command usage/schema hint updates.
  - Expected impact:
    - correctness: higher (fewer hand-wired contract/help drift regressions).
    - latency: lower (faster command-surface evolution loops).
    - token_count: lower (less manual source inspection and patch iteration).
  - Follow-up issue/test link:
    - Added regression coverage in `tests/RoslynSkills.Core.Tests/CommandTests.cs` and `tests/RoslynSkills.Cli.Tests/CliApplicationTests.cs`.
- 2026-02-13: Bootstrap context used plain-text search on `WorkspaceSemanticLoader.cs` to confirm the MSBuildLocator fix site
  - Task/Context: read `HANDOVER.md` and rehydrate context; verify the "prefer .NET SDK MSBuild (DiscoveryType.DotNetSdk)" registration logic and its rationale (`CS0518` false-positive) quickly.
  - Fallback action: `read`
  - Why Roslyn path was not used:
    - During bootstrap, a quick `Select-String` was used to jump directly to the `DiscoveryType.DotNetSdk` logic without first doing Roslyn navigation to the containing method/member.
  - Roslyn command attempted (if any):
    - None before fallback.
  - Missing command/option hypothesis:
    - Missing a low-friction Roslyn-native text search/snippet command for `.cs` files to locate non-symbol tokens (e.g., enum values, comments) without dropping to shell tools.
  - Proposed improvement:
    - Add `ctx.search_text` (single file, optional workspace-scoped variant) returning bounded matches with line/column spans suitable for follow-on `ctx.member_source`/`session.apply_text_edits`.
  - Expected impact:
    - correctness: higher (keeps reads inside a consistent, workspace-aware envelope and reduces accidental drift between ad-hoc text inspection and Roslyn snapshots).
    - latency: lower (reduces hesitation and command-churn between `rg`/`Select-String` and Roslyn commands).
    - token_count: lower (bounded snippets + spans avoid full-file dumps).
  - Follow-up issue/test link:
    - TODO: add command contract + tests + include in `list-commands` pit-of-success `first_steps`.
- 2026-02-16: Cross-project replication/config tracing used shell `rg` for multi-pattern hunting after targeted Roslyn reads
  - Task/Context: trace configuration save/update and replication pathways across multiple projects (`AimsWebNancy`, `AimsDataConnection`, `AimsViewModel`, `AimsConsole`) while confirming specific member bodies via `ctx.member_source`.
  - Fallback action: `read`
  - Why Roslyn path was not used:
    - Needed broad workspace text hunting for endpoint/action names and method call patterns; current Roslyn surface is strong for member-local extraction (`ctx.member_source`) but weak for "find these N textual patterns across many projects" workflows.
    - Shell regex quoting/escaping introduced avoidable churn (`rg` unclosed-group parse error), then required retries.
  - Roslyn command attempted (if any):
    - `ctx.member_source` on `ExcelDataManager.cs` and `AimsApplicationViewModel.cs`.
  - Missing command/option hypothesis:
    - Missing Roslyn-native workspace text search with pattern list input (literal + regex modes), safe escaping, and bounded result envelopes.
    - Missing Roslyn-native "find invocations by symbol/member name across workspace" command for call-path discovery without raw grep.
  - Proposed improvement:
    - Add `ctx.search_text` with payload `{ patterns: [...], mode: literal|regex, roots/include_globs, max_results }`, returning file/line/preview in Roslyn JSON envelope.
    - Add `nav.find_invocations` (or `nav.find_calls`) that accepts symbol id or member signature and returns call sites across projects.
    - Add `query.batch` to run several search intents in one Roslyn round-trip for investigative tasks.
  - Expected impact:
    - correctness: higher (fewer missed/overmatched hits from ad-hoc regex and shell escaping mistakes).
    - latency: lower (fewer retry loops and less context switching between Roslyn + shell tools).
    - token_count: lower (structured bounded responses instead of repeated wide grep output + manual triage).
  - Follow-up issue/test link:
    - TODO: add command proposals to command-surface backlog and include a benchmark fixture for "cross-project investigative tracing".
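
The payload proposed for `ctx.search_text` (pattern list, literal or regex mode, bounded results) can be illustrated with a small Python sketch. It is an assumption-laden stand-in rather than the RoslynSkills implementation: the `search_text` function, the in-memory `files` dictionary, and the result keys are hypothetical, and the sample file contents are made up.

```python
import re


def search_text(files, patterns, mode="literal", max_results=50):
    """Search in-memory files for several patterns in one pass.

    files: dict of path -> source text (hypothetical input shape).
    mode: "literal" escapes each pattern; "regex" compiles it as written.
    Returns {"matches": [...], "truncated": bool} with at most max_results hits.
    """
    if mode not in ("literal", "regex"):
        raise ValueError(f"unsupported mode: {mode}")
    compiled = []
    for pattern in patterns:
        # Literal mode escapes metacharacters, so "Save(" cannot fail to parse.
        source = re.escape(pattern) if mode == "literal" else pattern
        try:
            compiled.append((pattern, re.compile(source)))
        except re.error as exc:
            raise ValueError(f"invalid regex {pattern!r}: {exc}") from exc
    matches = []
    for path in sorted(files):
        for line_no, line in enumerate(files[path].splitlines(), start=1):
            for pattern, regex in compiled:
                if regex.search(line):
                    if len(matches) >= max_results:
                        # Stop early and signal that more hits exist.
                        return {"matches": matches, "truncated": True}
                    matches.append({
                        "file": path,
                        "line": line_no,
                        "pattern": pattern,
                        "preview": line.strip(),
                    })
    return {"matches": matches, "truncated": False}


if __name__ == "__main__":
    demo = {
        "a.cs": "void Save(Config c) { }\nvoid Load() { }",
        "b.cs": "x.Save(cfg);",
    }
    print(search_text(demo, ["Save(", "Load("], mode="literal", max_results=10))
```

Escaping in literal mode addresses the unclosed-group class of error noted in this entry, and the `truncated` flag keeps the response bounded while still telling the caller that more matches exist.
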
- 2026-02-16: Implemented new investigative commands in RoslynSkills via direct source edits (`ctx.search_text`, `nav.find_invocations`, `query.batch`)
  - Task/Context: ship missing command-surface capabilities identified from Codex/Claude fallback churn, including CLI/MCP guidance and coverage tests.
  - Fallback action: `both`
  - Why Roslyn path was not used:
    - RoslynSkills currently lacks a self-hosted maintainer workflow for editing its own command implementations and registry wiring semantically across multiple files.
  - Roslyn command attempted (if any):
    - `scripts/roscli.cmd list-commands --compact` and post-change `list-commands --ids-only` validation to verify command registration and envelope shape.
  - Missing command/option hypothesis:
    - Missing repository-maintainer semantic edit workflow for RoslynSkills self-evolution (multi-file symbol-anchored edits + command-surface regeneration).
  - Proposed improvement:
    - Add a maintainer-oriented transaction mode that can update command classes/registry/CLI usage hints atomically and validate schema/help drift in one pass.
  - Expected impact:
    - correctness: higher (less manual drift between command code and guidance surfaces).
    - latency: lower (faster command-surface iteration loops).
    - token_count: lower (fewer manual grep/read/patch cycles when evolving RoslynSkills itself).
  - Follow-up issue/test link:
    - Added regression coverage in `tests/RoslynSkills.Core.Tests/BreadthCommandTests.cs` and `tests/RoslynSkills.Cli.Tests/CliApplicationTests.cs`.
- 2026-02-16: Call-hierarchy naming/discoverability verification used direct `.cs` read on CLI usage-hint logic
  - Task/Context: confirm `nav.call_hierarchy` vs `nav.call_chain` naming behavior and patch `describe-command` usage examples/optional-property hints for the alias.
  - Fallback action: `read`
  - Why Roslyn path was not used:
    - Needed a quick inspection of the CLI host formatting logic (`GenerateUsageHints`) where behavior depends on string templates rather than symbol navigation alone.
  - Roslyn command attempted (if any):
    - `scripts/roscli.cmd list-commands --compact`
    - `scripts/roscli.cmd describe-command nav.call_chain`
    - `scripts/roscli.cmd nav.call_chain ...` smoke run
  - Missing command/option hypothesis:
    - Missing Roslyn-native "show command host member source by symbol id/name" flow optimized for self-hosted CLI UX tuning.
  - Proposed improvement:
    - Add a maintainer helper command that resolves and returns bounded source for command-host methods (for example `ctx.command_host_member_source --type CliApplication --member GenerateUsageHints`).
  - Expected impact:
    - correctness: higher (less drift between alias behavior and help text).
    - latency: lower (faster diagnosis of command-surface UX regressions).
    - token_count: lower (fewer ad-hoc full-file reads while tuning command guidance).
  - Follow-up issue/test link:
    - Updated usage-hint behavior in `src/RoslynSkills.Cli/CliApplication.cs`; validation via CLI/core test slices.
- 2026-02-16: Removed `nav.call_chain` alias to keep canonical Roslyn-aligned naming
  - Task/Context: user requested a single canonical name and no alias; removed `nav.call_chain` from command surface and docs/tests.
  - Fallback action: `edit`
  - Why Roslyn path was not used:
    - This change modifies RoslynSkills implementation internals (`Core` command registry, CLI host, MCP schema hints, tests/docs), which currently requires direct source edits.
  - Roslyn command attempted (if any):
    - `scripts/roscli.cmd list-commands --ids-only`
    - `scripts/roscli.cmd describe-command nav.call_hierarchy`
    - `scripts/roscli.cmd describe-command nav.call_chain` (expected fail/`command_not_found`)
  - Missing command/option hypothesis:
    - Missing self-hosted maintainer workflow for multi-file command-surface deprecations/removals with descriptor-aware propagation.
  - Proposed improvement:
    - Add maintainer-focused command-surface transaction support to update registry + CLI usage hints + MCP input hints atomically from command descriptor diffs.
  - Expected impact:
    - correctness: higher (prevents stale alias/docs/schema drift).
    - latency: lower (fewer manual multi-file touchpoints during command-surface cleanup).
    - token_count: lower (less repetitive inspection/verification loops for rename/deprecation passes).
  - Follow-up issue/test link:
    - Validation: `dotnet build RoslynSkills.slnx`; CLI/Core targeted test slices passed.
- 2026-02-16: Added command maturity model (`stable|advanced|experimental`) and surfaced metadata in CLI/MCP + skills/docs
  - Task/Context: introduce explicit expectations for heuristic/slower commands and establish an extensible pattern for advanced/experimental analysis tools.
  - Fallback action: `edit`
  - Why Roslyn path was not used:
    - This is RoslynSkills self-evolution across contracts, command descriptors, CLI host, MCP metadata, tests, and docs; no self-hosted semantic maintainer flow exists yet.
  - Roslyn command attempted (if any):
    - Post-change validation with `roscli list-commands`, `roscli describe-command`, plus build/test gates.
  - Missing command/option hypothesis:
    - Missing descriptor-driven maintainer transactions that can propagate command metadata changes atomically across CLI/MCP/doc surfaces.
  - Proposed improvement:
    - Add a metadata propagation tool that syncs command descriptors to CLI list/describe output shapes, MCP catalog annotations, and doc stubs.
  - Expected impact:
    - correctness: higher (less drift between metadata contract and tool surfaces).
    - latency: lower (faster rollout of non-stable command caveats).
    - token_count: lower (fewer repeated manual consistency checks).
  - Follow-up issue/test link:
    - Validation: `dotnet build RoslynSkills.slnx`; targeted core/CLI test slices and roscli command checks.
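
The idea of deriving every surface from one command descriptor, so that maturity metadata cannot drift, might look like the Python sketch below. Only the three maturity values come from this log; the `CommandDescriptor` type, the `cli_entry` and `mcp_annotation` output shapes, and the `example.*` command ids are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Maturity(Enum):
    STABLE = "stable"
    ADVANCED = "advanced"
    EXPERIMENTAL = "experimental"


@dataclass(frozen=True)
class CommandDescriptor:
    """Single source of truth for a command's metadata (hypothetical shape)."""
    command_id: str
    maturity: Maturity


def cli_entry(descriptor):
    # Hypothetical CLI list line: id followed by the maturity tag.
    return f"{descriptor.command_id} [{descriptor.maturity.value}]"


def mcp_annotation(descriptor):
    # Hypothetical MCP catalog annotation derived from the same descriptor.
    return {
        "name": descriptor.command_id,
        "maturity": descriptor.maturity.value,
        "needs_caveat": descriptor.maturity is not Maturity.STABLE,
    }


if __name__ == "__main__":
    for d in (
        CommandDescriptor("example.stable_command", Maturity.STABLE),
        CommandDescriptor("example.heuristic_scan", Maturity.EXPERIMENTAL),
    ):
        print(cli_entry(d), mcp_annotation(d))
```

Because both renderers read the same descriptor, a maturity change is made in one place and each surface picks it up on the next render.
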
- 2026-02-16: Added static-analysis command lane (`analyze.*`) and dual-lane roscli wrappers via direct source reads/edits
  - Task/Context: implement `analyze.unused_private_symbols`, `analyze.dependency_violations`, `analyze.impact_slice`, `analyze.override_coverage`, and `analyze.async_risk_scan`; add docs/tests; add pinned stable/dev roscli plumbing.
  - Fallback action: `both`
  - Why Roslyn path was not used:
    - RoslynSkills still lacks a self-hosted maintainer workflow for multi-file command implementation/wiring across Core, CLI, MCP, tests, and docs.
  - Roslyn command attempted (if any):
    - `scripts/roscli-stable.cmd list-commands --compact`
    - `scripts/roscli-dev.cmd list-commands --ids-only`
  - Missing command/option hypothesis:
    - Missing maintainer-focused semantic transaction flow that can update command registrations/help hints/tests atomically and validate command-surface drift.
  - Proposed improvement:
    - Extend maintainer mode over `edit.transaction` with descriptor-aware propagation checks (registry + CLI usage + MCP input hints + docs/test checklist) in one guided operation.
  - Expected impact:
    - correctness: higher (lower risk of command-surface/documentation/test drift during feature additions).
    - latency: lower (fewer manual cross-file patch/validation loops).
    - token_count: lower (less repeated source inspection to keep multiple surfaces in sync).
  - Follow-up issue/test link:
    - Added coverage in `tests/RoslynSkills.Core.Tests/BreadthCommandTests.cs` and `tests/RoslynSkills.Cli.Tests/CliApplicationTests.cs`.
- 2026-02-16: Codex run-fragment retrospective identified two additional pit-of-success gaps now partially addressed
  - Task/Context: review a real Codex investigative sequence over a large C# workspace (symbol/member tracing, call-path discovery, repeated `ctx.member_source`, and shell regex retries) to answer “could RoslynSkills have been more helpful?”
  - Fallback action: `read`
  - Why Roslyn path was not used:
    - Cross-file tracing still drifted into raw regex/shell loops when command affordances were not discoverable enough at point-of-use.
    - VB parity holes in high-traffic context commands (`ctx.file_outline`, `ctx.member_source`) reduced confidence in mixed-language workflows.
  - Roslyn command attempted (if any):
    - `ctx.member_source` on multiple members; `nav.find_invocations`; `ctx.search_text`; `query.batch`.
  - Missing command/option hypothesis:
    - MCP tool schemas lacked explicit input hints/examples for `ctx.file_outline` and `ctx.member_source`, increasing argument uncertainty.
    - `ctx.member_source` body extraction semantics in VB need stronger deterministic anchoring tests for declaration-line anchors.
  - Proposed improvement:
    - Add explicit MCP input-schema hints and URI examples for `ctx.file_outline`/`ctx.member_source` (implemented in this pass).
    - Treat VB parity on context commands as first-class and keep dedicated VB regression tests for outline/member-source behavior (implemented in this pass).
    - Follow with a targeted benchmark slice measuring reduced shell fallback on cross-project investigative tasks after schema/guidance upgrades.
  - Expected impact:
    - correctness: higher (stronger mixed-language reliability and fewer argument-shape errors).
    - latency: lower (less retry churn from unclear command inputs).
    - token_count: lower (fewer exploratory retries and shell transcript noise).
  - Follow-up issue/test link:
    - Added tests in `tests/RoslynSkills.Core.Tests/VbCommandTests.cs` for VB `ctx.file_outline` and `ctx.member_source` flows.
- 2026-02-17: Wider benchmark sweep uncovered operation-specific guidance mismatch and overly strict constraint shape checks
  - Task/Context: broaden benchmark scope beyond rename tasks (change-signature, replace-member-body, create-file, add-member) while validating Claude skill-guidance and paired harness behavior.
  - Fallback action: `both`
  - Why Roslyn path was not used:
    - Existing paired guidance profiles were rename-centric, causing treatment runs on non-rename tasks to spend calls on `nav.find_symbol Process`/`edit.rename_symbol` even when task intent was `edit.add_member`/`edit.change_signature`/etc.
    - Constraint checks encoded one syntactic form for `add-member-threshold-v1` (block-bodied method), generating false negatives for semantically valid expression-bodied output.
  - Roslyn command attempted (if any):
    - `scripts/roscli.cmd list-commands --compact`
    - `scripts/roscli.cmd edit.add_member ...`
    - `scripts/roscli.cmd diag.get_file_diagnostics ...`
  - Missing command/option hypothesis:
    - Missing operation-neutral guidance profile that steers command choice by task family rather than rename default.
    - Missing “semantic-equivalence tolerant” constraint checks for style variants that preserve behavior.
  - Proposed improvement:
    - Add `operation-neutral-v1` guidance profile in paired harness for multi-operation tasks (implemented).
    - Expand task catalog + constraints for non-rename families and accept equivalent member-body styles where appropriate (implemented).
    - Add one-click multi-task paired sweep helper to avoid PowerShell array binding friction and reduce operator error.
  - Expected impact:
    - correctness: higher (fewer false negatives from syntactic-only checks; better task-command alignment).
    - latency: lower (reduced wasted Roslyn calls on irrelevant rename flows).
    - token_count: lower (fewer exploratory retries and less command-contract churn).
  - Follow-up issue/test link:
    - Updated `benchmarks/scripts/Run-PairedAgentRuns.ps1` with expanded task IDs + `operation-neutral-v1`.
    - Added regression coverage in `tests/RoslynSkills.Benchmark.Tests/PairedRunHarnessScriptTests.cs`.
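
A constraint check that tolerates both member-body styles can be approximated as in the Python sketch below. It is a textual stand-in, not the harness's actual check: `declares_method` and the method name `IsOverThreshold` are hypothetical, and a syntax-tree comparison would presumably be more robust than a regex.

```python
import re


def declares_method(source, name):
    """Return True if `source` declares method `name` in either body style.

    Accepts a block body (`Name(...) {`) or an expression body (`Name(...) =>`).
    A plain call such as `Name(5);` is not counted as a declaration.
    """
    pattern = re.compile(
        r"\b" + re.escape(name)      # method name at a word boundary
        + r"\s*\([^)]*\)"            # parameter list without nested parentheses
        + r"\s*(?:\{|=>)"            # block body or expression body
    )
    return pattern.search(source) is not None


if __name__ == "__main__":
    block_style = "public bool IsOverThreshold(int value) { return value > 10; }"
    expr_style = "public bool IsOverThreshold(int value) => value > 10;"
    call_only = "var flagged = IsOverThreshold(5);"
    for text in (block_style, expr_style, call_only):
        print(declares_method(text, "IsOverThreshold"), text)
```

Treating the two declaration forms as equivalent is what removes the false negative described in this entry, while the call-site case shows the check still rejects output that never adds the member.
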
- 2026-02-17: Renamed CFG command id and extended benchmark preflight to Gemini using direct source edits
  - Task/Context: apply approved command rename (`analyze.cfg` -> `analyze.control_flow_graph`, no alias), add Gemini preflight detection/tests, and add transcript-split tooling scripts for overhead analysis.
  - Fallback action: `edit`
  - Why Roslyn path was not used:
    - RoslynSkills currently has no self-hosted maintainer transaction command that can propagate command-id changes through Core/CLI/MCP/tests/docs and benchmark infrastructure atomically.
  - Roslyn command attempted (if any):
    - `scripts/roscli.cmd describe-command analyze.control_flow_graph`
    - `scripts/roscli.cmd describe-command analyze.cfg` (expected `command_not_found`)
  - Missing command/option hypothesis:
    - Missing maintainer-grade command-surface rename primitive with contract checks (registry routing, direct CLI shorthand, MCP schema hints, query.batch support, docs/examples).
  - Proposed improvement:
    - Add `maint.rename_command_id` (or equivalent) that computes and validates cross-surface rename impact before apply.
  - Expected impact:
    - correctness: higher (prevents stale ids on one surface).
    - latency: lower (fewer manual search/patch cycles).
    - token_count: lower (less repeated inspection during command-surface refactors).
  - Follow-up issue/test link:
    - Added regression checks in `tests/RoslynSkills.Cli.Tests/CliApplicationTests.cs` for new id acceptance + old id rejection.
    - Added Gemini probe coverage in `tests/RoslynSkills.Benchmark.Tests/AgentEvalPreflightCheckerTests.cs`.
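
The cross-surface impact check proposed for `maint.rename_command_id` could start from something like the Python sketch below, which counts leftover occurrences of the old id per surface. The `rename_impact` function, the surface names, and the sample strings are hypothetical; only the two command ids come from this log.

```python
import re


def rename_impact(surfaces, old_id, new_id):
    """Count stale and updated command ids per surface.

    surfaces: dict of surface name -> text content (hypothetical input shape).
    An id only matches as a whole token, so `analyze.cfg_legacy` is not
    reported as a stale `analyze.cfg`.
    """
    def whole_id(command_id):
        # Not preceded by a word char or dot; not followed by a word char
        # or by a dot that continues into another identifier segment.
        return re.compile(
            r"(?<![\w.])" + re.escape(command_id) + r"(?!\w|\.\w)"
        )

    old_re, new_re = whole_id(old_id), whole_id(new_id)
    counts = {
        name: {
            "stale": len(old_re.findall(text)),
            "updated": len(new_re.findall(text)),
        }
        for name, text in surfaces.items()
    }
    stale = sorted(name for name, c in counts.items() if c["stale"] > 0)
    return {"counts": counts, "stale_surfaces": stale, "ready": not stale}


if __name__ == "__main__":
    demo = {
        "registry": 'Register("analyze.control_flow_graph")',
        "cli_help": "example: analyze.cfg on a file",
        "docs": "Use analyze.control_flow_graph. analyze.cfg_legacy is unrelated.",
    }
    print(rename_impact(demo, "analyze.cfg", "analyze.control_flow_graph"))
```

Reporting stale surfaces before any edit is applied is the "validate impact before apply" step; the rename would proceed only once `ready` is true.
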
- 2026-02-17: External split-lane run on MediatR initially produced zero treatment Roslyn usage due to missing launcher discoverability
  - Task/Context: tool-thinking split experiment on external repo (`MediatR`) where treatment prompt asked for RoslynSkills usage.
  - Fallback action: `both`
  - Why Roslyn path was not used:
    - External repo did not include local `scripts/roscli*`; treatment guidance lacked an explicit executable path to host RoslynSkills launcher, so agent stayed text-only.
  - Roslyn command attempted (if any):
    - None in first run (`treatment.roslyn_command_count=0`).
  - Missing command/option hypothesis:
    - Split harness lacked a first-class mechanism to provide executable Roslyn launcher coordinates to treatment lanes on non-RoslynSkills repositories.
  - Proposed improvement:
    - Inject resolved host launcher path into treatment prompt and explicit prohibition into control prompt; stamp launcher path in run summary (implemented in `Run-ToolThinkingSplitExperiment.ps1`).
  - Expected impact:
    - correctness: higher (valid treatment condition with actual Roslyn usage).
    - latency: lower (fewer failed/irrelevant attempts to discover Roslyn command entrypoints).
    - token_count: lower (reduced exploration churn from missing tool entrypoint).
  - Follow-up issue/test link:
    - Verified by MediatR reruns:
      - pre-injection `treatment_roslyn=0`: `artifacts/tool-thinking-split-runs/20260217-084934-codex-mediatr-invalid-notification-codex-v1/`
      - post-injection Codex `treatment_roslyn=3`: `artifacts/tool-thinking-split-runs/20260217-085723-codex-mediatr-invalid-notification-codex-v2/`
      - post-injection Claude `treatment_roslyn=1`: `artifacts/tool-thinking-split-runs/20260217-090210-claude-mediatr-invalid-notification-claude-v1/`
- 2026-02-17: Added “used well” trajectory metrics via script/test updates using direct source edits
  - Task/Context: user requested validation that reveals whether RoslynSkills is used well, not just used.
  - Fallback action: `edit`
  - Why Roslyn path was not used:
    - This work modifies RoslynSkills analyzer/test internals (PowerShell + C# test files) and currently lacks a self-hosted maintainer command flow for cross-file metric-contract changes.
  - Roslyn command attempted (if any):
    - Not applicable for implementation edits; validation performed via benchmark script runs and `dotnet test` suite.
  - Missing command/option hypothesis:
    - Missing maintainer-grade command to evolve benchmark metric schemas and propagate contract updates across scripts/tests/docs atomically.
  - Proposed improvement:
    - Add maintainer benchmark-contract tooling (for example `maint.update_benchmark_metric_contract`) with impact checks across script output fields and tests.
  - Expected impact:
    - correctness: higher (lower risk of metric drift between script and test expectations).
    - latency: lower (faster metric evolution cycles).
    - token_count: lower (fewer manual reconcile loops across scripts/tests/docs).
  - Follow-up issue/test link:
    - `tests/RoslynSkills.Benchmark.Tests/ToolThinkingSplitScriptTests.cs`
    - `benchmarks/scripts/Analyze-ToolThinkingSplit.ps1`