- Source: Anthropic Engineering Blog (11 articles)
- Date Range: September 2024 - October 2025
- Confidence: HIGH (all official Anthropic sources)
- Purpose: Enhance claude-user-memory as system-wide Claude Code CLI enhancement
Source: Building Effective Agents (Anthropic)
- Workflows: Systems where LLMs and tools are orchestrated through predefined code paths
- Agents: Systems where LLMs dynamically direct their own processes and tool usage, maintaining control over how they accomplish tasks
- Decision criteria: Use agents for open-ended problems where you can't predict required steps or hardcode a fixed path
Source: SWE-bench Engineering (Anthropic)
- Minimal scaffolding: Give as much control as possible to the language model itself
- Simple tools: Bash Tool for executing commands, Edit Tool for viewing/editing files
- Agent autonomy: Let the model decide the approach, don't force specific workflows
Source: Building agents with the Claude Agent SDK (Sep 29, 2025)
- Renamed from Claude Code SDK to reflect broader capabilities beyond coding
- Core infrastructure solved:
- Memory management across long-running tasks
- Permission systems balancing autonomy with user control
- Subagent coordination toward shared goals
- Context as structure: Folder and file structure becomes a form of context engineering
- Tool access: bash, file editing, file creation, file search → general-purpose agents with computer access
Source: Multiple articles
- Truth over speed - but achieve both through a systematic approach
- Low-level and unopinionated - Close to raw model access (Claude Code)
- Flexible, customizable, scriptable, safe - Power tool approach
- Agent-centric perspective: Rethink every detail from non-deterministic viewpoint
Source: How we built our multi-agent research system (Jun 13, 2025)
Architecture:
- Lead agent: Analyzes query, develops strategy, spawns subagents
- Subagents: Explore different aspects simultaneously (3-5 parallel)
- Intelligent filtering: Iteratively search, gather, return results to lead agent
- Parallel execution: Two levels - agent spawning AND tool calling within agents
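The orchestrator-worker shape above can be sketched in a few lines. This is a hedged illustration, not the article's implementation: the subagent body is a placeholder for a real LLM + tool-use loop, and all names are invented.

```python
import asyncio

# Illustrative stub; a real subagent would run an LLM + tool-use loop.
async def run_subagent(subtask: str) -> str:
    """Explore one aspect of the query and return filtered findings."""
    await asyncio.sleep(0)  # stands in for iterative search + gathering
    return f"findings for: {subtask}"

async def lead_agent(query: str) -> list[str]:
    # 1. Lead agent analyzes the query and develops a strategy (3-5 subtasks).
    subtasks = [f"{query} (aspect {i})" for i in range(3)]
    # 2. Subagents explore aspects simultaneously; each could also issue
    #    parallel tool calls internally (the second level of parallelism).
    results = await asyncio.gather(*(run_subagent(t) for t in subtasks))
    # 3. Lead agent receives the filtered results and synthesizes an answer.
    return list(results)

findings = asyncio.run(lead_agent("compare vector databases"))
```

The two parallelism levels map directly to `asyncio.gather` here (agent spawning) and would repeat inside each subagent (tool calling).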
Performance:
- Multi-agent (Opus 4 lead + Sonnet 4 subagents) outperformed single Opus 4 by 90.2%
- Research time reduced by up to 90% for complex queries
- Cost consideration: Agents use 4x more tokens than chat; multi-agent systems use 15x more tokens
Early failures & solutions:
- Problem: Spawning 50 subagents for simple queries → Solution: Better prompt engineering for agent spawning logic
- Problem: Scouring web endlessly for nonexistent sources → Solution: Termination conditions in prompts
- Problem: Agents distracting each other → Solution: Controlled communication patterns
Economic viability: "Multi-agent systems require tasks where the value is high enough to pay for increased performance"
Source: Claude Agent SDK
- Subagent coordination: Multiple agents working toward shared goal
- Memory sharing: Long-running task context management
- Permission delegation: Balance between autonomy and user control
Source: Writing effective tools for agents — with agents (Sep 11, 2025)
Core principle: Shift from "engineers writing APIs for other engineers" to agent-centric design (non-deterministic)
Best practices:
- Prompt-engineer tool descriptions: Most effective improvement method
  - Tool descriptions are loaded into agent context
  - Collectively steer agents toward effective behaviors
  - Example: Claude appending "2025" to searches → fixed by improving the tool description
- Use agents to build tools:
  - Let agents analyze evaluation transcripts
  - Paste transcripts into Claude Code
  - Claude refactors many tools at once
- Rethink from agent perspective: Every detail needs reconsidering for non-deterministic use
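As a concrete illustration of prompt-engineered descriptions, here is a hypothetical before/after for a search tool. The tool name, schema, and wording are invented for this sketch, not taken from the article.

```python
# Before: terse description leaves agents guessing at behavior and defaults.
vague_tool = {
    "name": "web_search",
    "description": "Searches the web.",
}

# After: written as if onboarding a new hire, making implicit behavior explicit.
improved_tool = {
    "name": "web_search",
    "description": (
        "Search the web and return the top results as text snippets. "
        "Results are already ranked by recency, so do NOT append the current "
        "year to queries. Prefer one precise query over several broad ones; "
        "refine only if the first results are off-topic."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "A single, specific search query."}
        },
        "required": ["query"],
    },
}
```

The "do NOT append the current year" sentence is how a description-level fix for the "2025" failure mode above might read.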
Source: Desktop Extensions - One-click MCP server installation (Jun 26, 2025)
Problem solved: Complex MCP server installation
Solution: Desktop Extensions (.mcpb files)
- Format: Zip archives with manifest.json (similar to Chrome .crx, VS Code .vsix)
- Contents: Entire MCP server + all dependencies
- Installation: Single click in Claude Desktop (Settings > Extensions)
- Distribution: Extension directory with Anthropic-reviewed tools + custom extensions
- Open source: Specification, toolchain, schemas at github.com/anthropics/mcpb
Installation methods:
- Browse extension directory → Click Install
- Install custom .mcpb file directly
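A minimal sketch of what packing a .mcpb bundle could look like. The manifest fields shown are assumptions for illustration; the authoritative specification and schemas live at github.com/anthropics/mcpb.

```python
import json
import zipfile

# Manifest fields are illustrative assumptions; see the mcpb repo for the real schema.
manifest = {
    "name": "example-server",
    "version": "1.0.0",
    "description": "A demo MCP server packaged as a desktop extension.",
    "server": {"type": "node", "entry_point": "server/index.js"},
}

# A .mcpb file is a zip archive containing manifest.json plus the entire
# server and its dependencies.
with zipfile.ZipFile("example-server.mcpb", "w") as bundle:
    bundle.writestr("manifest.json", json.dumps(manifest, indent=2))
    bundle.writestr("server/index.js", "// server entry point placeholder\n")
```

Because the format is a plain zip (like Chrome .crx or VS Code .vsix), standard archive tooling is enough to build and inspect bundles.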
Source: Multiple articles
- Describe for new hires: Write tool docs as if onboarding a team member
- Make implicit explicit: Don't assume context agents might bring
- Agent analysis: Tools should be analyzable and improvable by agents themselves
Source: Effective context engineering for AI agents (Sep 29, 2025)
Definition: The art and science of curating what goes into the limited context window from the constantly evolving universe of possible information
Evolution: Natural progression of prompt engineering
- Old: Finding the right words for prompts
- New: "What configuration of context is most likely to generate desired behavior?"
Scope includes:
- System instructions
- Tools
- Model Context Protocol (MCP)
- External data
- Message history
Source: Effective context engineering
- Problem: Degradation of model outputs as information overloads limited attention windows
- Solution: Active context curation and editing
Source: Effective context engineering
- Memory Tool + Context Editing: 39% improvement in agent-based search performance
- Token consumption: 84% reduction in 100-round web search
- LLM constraint: Finite attention budget requires smallest possible set of high-signal tokens
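One hedged sketch of what context editing can mean in practice: replace stale tool results in the message history with short stubs, so only recent, high-signal output stays in the window. The role/content dict format here is generic, not a specific API.

```python
# Replace stale tool results with short stubs instead of deleting turns,
# so the conversation shape is preserved but bulky payloads are dropped.
def edit_context(messages: list[dict], keep_recent: int = 2) -> list[dict]:
    tool_turns = [i for i, m in enumerate(messages) if m.get("role") == "tool"]
    stale = set(tool_turns[:-keep_recent]) if keep_recent > 0 else set(tool_turns)
    edited = []
    for i, m in enumerate(messages):
        if i in stale:
            edited.append({"role": "tool", "content": "[older tool result elided]"})
        else:
            edited.append(m)
    return edited

history = [
    {"role": "user", "content": "find the bug"},
    {"role": "tool", "content": "...3,000 lines of logs..."},
    {"role": "tool", "content": "...stack trace..."},
    {"role": "tool", "content": "...failing test output..."},
]
trimmed = edit_context(history, keep_recent=2)
```

Stubbing rather than deleting keeps tool-call/result pairing intact while reclaiming most of the attention budget.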
Source: Building Effective Agents
- Few-shot prompting: Strongly recommended, curate diverse canonical examples
- Show expected behavior: Examples portray the agent's desired behavior patterns
- Minimize tokens: Find smallest high-signal token set for desired outcome
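A minimal sketch of the few-shot guidance above, assuming a simple string-based prompt format. The examples are invented and deliberately diverse: happy path, policy edge case, and unintelligible input.

```python
# Invented examples: each is short and demonstrates a distinct behavior pattern.
FEW_SHOT_EXAMPLES = [
    {"input": "Refund for order #123?",
     "output": "Checked policy: eligible. Issuing refund."},
    {"input": "Refund for an order from two years ago?",
     "output": "Checked policy: outside the window. Declining; offering store credit."},
    {"input": "asdfgh",
     "output": "Input unclear. Asking the user to restate the request."},
]

def build_system_prompt(task: str) -> str:
    shots = "\n\n".join(
        f"User: {ex['input']}\nAgent: {ex['output']}" for ex in FEW_SHOT_EXAMPLES
    )
    return f"{task}\n\nExamples of expected behavior:\n\n{shots}"

prompt = build_system_prompt("You are a customer-support agent.")
```

Three diverse canonical examples like these typically carry more signal per token than a dozen near-duplicates.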
Source: Claude Code best practices (Apr 18, 2025)
- Purpose: Special file automatically pulled into context at conversation start
- Use cases: Repository etiquette, developer environment setup, project-specific guidelines
- Benefit: Persistent context across all Claude Code sessions in that repository
Source: Building Effective Agents
- Extensive testing in sandboxed environments
- Appropriate guardrails for autonomous agents
- Trust level: Must have some level of trust in agent decision-making
Source: A postmortem of three recent issues (Sep 17, 2025)
Transparency principle: "We never reduce model quality due to demand, time of day, or server load"
Three infrastructure bugs (Aug-Sep 2025):
- Context Window Routing Error (Aug 31)
  - Impact: 16% of Sonnet 4 requests at the worst-affected hour
  - Cause: Short/long-context requests routed to wrong server pools
  - "Sticky" routing: Once a wrong server was selected, subsequent requests followed it
  - Fix: Deployed Sep 4
- TPU Server Output Corruption (Aug 25 - Sep 2)
  - Impact: Opus 4.1, Opus 4, Sonnet 4
  - Symptom: Thai/Chinese characters appearing in responses to English prompts
  - Cause: Misconfiguration in token generation
  - Fix: Rollback Sep 2
- XLA Compiler Bug (Sep 4-12)
  - Impact: Haiku 3.5, parts of Sonnet 4 and Opus 3
  - Symptom: Most probable token excluded during generation
  - Cause: Latent ML compiler bug unintentionally triggered
  - Fix: Rollback Sep 4 (Haiku 3.5), Sep 12 (Opus 3)
Lessons: Infrastructure quality directly impacts model outputs; transparent postmortems build trust
Source: Raising the bar on SWE-bench Verified (Oct 2024)
- Achievement: Claude 3.5 Sonnet scored 49% (previous SOTA: 45%)
- Improvement: 33.4% → 49.0% on SWE-bench Verified
- Verification: 500-problem subset reviewed by humans for solvability
- Real-world tasks: Resolve actual GitHub issues from open-source Python repos
Source: The "think" tool (Mar 20, 2025)
Purpose: Create dedicated space for structured thinking during complex tasks
Difference from Extended Thinking:
- Extended thinking: More detailed reasoning before generating response
- Think tool: Stop and think during response generation
Best use cases:
- Calling complex tools
- Analyzing tool outputs in long chains
- Navigating policy-heavy environments
- Sequential decisions where mistakes are costly
Performance:
- Airline domain: 54% relative improvement over baseline
- SWE-bench contribution: 1.6% improvement on average just from think tool
- TAU-bench retail: 62.6% → 69.2%
- TAU-bench airline: 36.0% → 46.0%
Implementation: Minimal overhead; integrated into Claude 3.7 Sonnet (state-of-the-art 62.3% on SWE-bench Verified)
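The tool itself is nearly trivial, which is the point: it is a no-op tool whose only effect is giving the model a logged, dedicated place to reason mid-trajectory. The schema below is adapted from the article's published definition, with wording approximate rather than verbatim.

```python
# The "think" tool obtains no new information and changes no state; it just
# gives the model a structured pause between tool calls.
think_tool = {
    "name": "think",
    "description": (
        "Use the tool to think about something. It will not obtain new "
        "information or change anything; it only appends the thought to the "
        "log. Use it when complex reasoning or a policy check is needed."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "thought": {"type": "string", "description": "A thought to think about."}
        },
        "required": ["thought"],
    },
}

def handle_think(thought: str) -> str:
    # Server-side handler: record the thought; the returned content is irrelevant.
    return "OK"
```

This is what distinguishes it from extended thinking: the pause happens during response generation, inside the tool-use loop, rather than before it.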
Source: Multi-agent research system
- Agent excess: Spawning too many agents for simple tasks
- Endless search: Not recognizing when information doesn't exist
- Agent interference: Subagents distracting each other
- Solution approach: Prompt engineering is primary lever for fixing behaviors
Source: Claude Code best practices (Apr 18, 2025)
1. Research and Plan First
- Problem: Jumping straight to coding
- Solution: Ask Claude to research and plan before coding
- Result: Significant performance improvement for problems requiring deeper thinking
2. Test-Driven Development (TDD)
- Anthropic favorite: For changes verifiable with unit/integration/e2e tests
- Power: TDD becomes even more powerful with agentic coding
3. Extended Thinking Mode
- Trigger word: "think"
- Levels: "think" < "think hard" < "think harder" < "ultrathink"
- Mechanism: Additional computation time for complex problems
4. CLAUDE.md Configuration
- Purpose: Repository etiquette and environment setup
- Behavior: Automatically pulled into every conversation
- Best for: Project-specific guidelines, conventions, setup instructions
5. Codebase Exploration
- Anthropic usage: Core onboarding workflow
- Benefits: Faster ramp-up, reduced load on other engineers
- Application: Learning and exploration at scale
6. Git Operations
- Anthropic engineers: Use Claude for 90%+ of git interactions
- Implication: Agent-assisted version control is production-ready
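Practice #2 above can be sketched as a loop that runs the test suite and feeds failures back to the agent until the suite is green. `ask_agent_to_fix` is a hypothetical stand-in for a model call, and the pytest wiring is just one possible runner, assuming pytest is installed.

```python
# run_tests() -> (passed, output); ask_agent_to_fix(output) would prompt the
# model with the failing output and apply its proposed edits.
def tdd_loop(run_tests, ask_agent_to_fix, max_iterations: int = 5) -> bool:
    for _ in range(max_iterations):
        passed, output = run_tests()
        if passed:
            return True  # suite is green
        ask_agent_to_fix(output)
    return False  # budget exhausted; escalate to a human

# One possible runner (assumes pytest is installed in the project):
def pytest_runner():
    import subprocess
    r = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
    return r.returncode == 0, r.stdout + r.stderr
```

Writing the tests first gives the loop an objective stopping condition, which is why TDD composes so well with agentic coding.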
Source: Introducing Contextual Retrieval (Sep 2024)
Problem: Lost context in chunked documents for retrieval
Solution: Two sub-techniques
- Contextual Embeddings: Prepend chunk-specific explanatory context before embedding
- Contextual BM25: Add context before creating BM25 index
Performance:
- Standalone: 49% reduction in failed retrievals
- With reranking: 67% reduction in failed retrievals
Example transformation:
Original: "The company's revenue grew by 3% over the previous quarter."
Contextualized: "This chunk is from an SEC filing on ACME corp's performance in Q2 2023; the previous quarter's revenue was $314 million. The company's revenue grew by 3% over the previous quarter."
Resources: Implementation details in Anthropic's cookbook
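A hedged sketch of the two-step pipeline: generate a situating context per chunk, then prepend it before embedding and indexing. `summarize_chunk_in_context` stands in for a Claude call, and the prompt wording is approximate rather than copied from the cookbook.

```python
# Prompt wording approximate; the real prompt is in Anthropic's cookbook.
CONTEXT_PROMPT = (
    "<document>{doc}</document>\n"
    "Here is the chunk we want to situate within the whole document:\n"
    "<chunk>{chunk}</chunk>\n"
    "Give a short, succinct context to situate this chunk within the overall "
    "document for the purposes of improving search retrieval of the chunk."
)

def summarize_chunk_in_context(doc: str, chunk: str) -> str:
    # Stand-in for a model call with CONTEXT_PROMPT.format(doc=doc, chunk=chunk).
    return "This chunk is from ACME corp's Q2 2023 SEC filing."

def contextualize(doc: str, chunks: list[str]) -> list[str]:
    # The contextualized text feeds BOTH the embedding model (Contextual
    # Embeddings) and the BM25 index (Contextual BM25).
    return [f"{summarize_chunk_in_context(doc, c)} {c}" for c in chunks]

out = contextualize("…full filing…", ["The company's revenue grew by 3%."])
```

Running the same contextualized text through both retrieval paths is what yields the combined 49-67% reduction in failed retrievals.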
- Give language model as much decision-making power as possible
- Keep infrastructure minimal and unopinionated
- Let agents direct their own processes
- Context engineering > prompt engineering for agent systems
- CLAUDE.md pattern for persistent project context
- Folder/file structure as context engineering mechanism
- Active context curation to fight "context rot"
- Lead agent + subagents architecture (90.2% performance gain)
- Parallel execution at multiple levels (agent spawning + tool calling)
- Cost awareness: 15x token usage requires high-value tasks
- Prompt engineering as primary control lever
- Tools designed from agent perspective (non-deterministic use)
- Agents building/improving their own tools
- Prompt-engineered tool descriptions as steering mechanism
- MCP packaging (.mcpb) for one-click distribution
- Research → Plan → Implement workflow
- "Think" tool for complex decision points
- Extended thinking modes (think, think hard, ultrathink)
- TDD as forcing function for quality
- Public postmortems for infrastructure issues
- Never sacrifice quality for performance/demand
- SWE-bench as quality bar (49% = state-of-the-art)
- Real-world task validation (not synthetic benchmarks)
- Agent Skills: Specialized capabilities (Claude Code, Research, etc.)
- Subagent coordination for complex goals
- Memory management for long-running tasks
- Permission systems balancing autonomy & control
- Agent autonomy is the goal - Minimal human intervention for complex tasks
- Context is everything - What goes into context matters more than prompt wording
- Transparency builds trust - Public postmortems, honest about failures
- Economics drive adoption - 15x cost needs 15x+ value
- Real-world validation - SWE-bench, GitHub issues, not toy problems
- Agents building agents - Meta-capability for continuous improvement
- Low-level primitives over high-level frameworks
- Flexibility over convenience
- Agent-centric over engineer-centric
- Non-deterministic design patterns
- Minimal scaffolding, maximal model control
- Infrastructure bugs are transparent (postmortem culture)
- Never reduce quality for demand/load
- State-of-the-art benchmarks as bar (SWE-bench 49%)
- Human verification (SWE-bench Verified: 500 reviewed problems)
Based on Anthropic's philosophy, what should we call our system-wide Claude Code CLI enhancement?
- Pros: Captures execution environment, aligns with "runtime" terminology
- Cons: Generic, doesn't capture orchestration aspect
- Pros: Suggests interconnected intelligence, multi-agent coordination
- Cons: May sound too abstract
- Pros: Honors Anthropic's context engineering emphasis, "fabric" suggests foundational layer
- Cons: Might emphasize context over agents
- Pros: Technical, captures foundational layer + coordination
- Cons: Less approachable for users
- Pros: "Nexus" = connection point, suggests hub for agent coordination
- Cons: Might sound too futuristic
- Pros: Honors existing VAMFI naming, "engine" suggests power
- Cons: Less universal appeal (Sanskrit-specific)
- Pros: "Substrate" = foundational layer agents build upon
- Anthropic alignment: Matches their low-level primitives philosophy
- Cons: Technical term, may need explanation
- Pros: Direct alignment with Anthropic's context engineering focus
- Cons: Doesn't capture multi-agent aspect
Reasoning:
- Captures foundational layer concept (substrate)
- Emphasizes agent-centric design (agentic)
- Aligns with Anthropic's minimal scaffolding philosophy
- Suggests something agents build upon, not something that constrains them
- Technical but accessible
Tagline: "The foundational layer for Claude Code superintelligence"
- ✅ research-methodology: Add contextual retrieval patterns
- ✅ planning-methodology: Add "think before act" protocol, multi-agent planning
- ✅ quality-validation: Add SWE-bench-inspired quality gates
- ✅ pattern-recognition: Add agent skill recognition patterns
- 🆕 context-engineering: NEW SKILL - Active context curation, CLAUDE.md patterns
- ✅ /workflow: Add multi-agent orchestration option (lead + subagents)
- ✅ /research: Integrate contextual retrieval
- ✅ /plan: Add think tool protocol
- ✅ /implement: Add TDD enforcement, self-correction loops
- 🆕 /context: NEW COMMAND - Analyze and optimize context configuration
- 🆕 pre-tool-use: Think tool protocol for complex decisions
- 🆕 post-tool-use: Context editing to prevent context rot
- 🆕 pre-agent-spawn: Economic viability check (is multi-agent worth 15x cost?)
- 🆕 quality-gate: SWE-bench-inspired validation
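The proposed pre-agent-spawn hook could start as a simple gate. Every number and threshold below is an assumption to be tuned per deployment; only the 15x multiplier comes from the multi-agent research article.

```python
# All thresholds here are assumptions for illustration.
CHAT_COST_TOKENS = 10_000        # assumed token cost of a single-agent run
MULTI_AGENT_MULTIPLIER = 15      # multi-agent systems use ~15x chat tokens

def should_spawn_subagents(task_value: float, token_budget: int) -> bool:
    """task_value in [0, 1]: estimated importance/complexity of the task."""
    projected_cost = CHAT_COST_TOKENS * MULTI_AGENT_MULTIPLIER
    # Spawn only when the task is high-value AND the budget covers ~15x cost.
    return task_value >= 0.8 and token_budget >= projected_cost
```

This encodes the economic-viability principle directly: low-value or budget-constrained tasks fall back to a single agent.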
- 🆕 .mcpb packaging: Create installer for our agentic substrate
- 🆕 Desktop extension: One-click install for users
- 🆕 Agent Skills registry: Catalog of available specialized agents
- ✅ chief-architect: Add lead agent + subagent coordination pattern (90% improvement potential)
- ✅ docs-researcher: Add contextual retrieval (49-67% better retrieval)
- ✅ implementation-planner: Add think tool for complex planning
- ✅ code-implementer: Add TDD enforcement, git operations, extended thinking modes
- ✅ All 18 agents: Add think tool capability
- ✅ brahma-navigator: Lead agent pattern with subagent spawning
- ✅ brahma-clarifier: Context engineering for requirements
- ✅ brahma-analyzer: Quality gates inspired by SWE-bench
- 🆕 install.sh: Copy to ~/.claude/ with .mcpb packaging option
- 🆕 CLAUDE.md templates: Auto-setup for new projects
- 🆕 Constitution templates: Project governance from day one
- 🆕 Extension manifest: Desktop extension for one-click install
- 🆕 README: "Agentic Substrate" positioning
- 🆕 Quick Start: One-click installation guide
- 🆕 Philosophy: Explain Anthropic alignment
- 🆕 Architecture: Lead agent + subagent patterns
- 🆕 Best Practices: Research→Plan→Implement, TDD, think tool
- Building agents with the Claude Agent SDK (Sep 29, 2025): https://www.anthropic.com/engineering/building-agents-with-the-claude-agent-sdk
- Building effective agents (2025): https://www.anthropic.com/engineering/building-effective-agents
- Effective context engineering for AI agents (Sep 29, 2025): https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents
- A postmortem of three recent issues (Sep 17, 2025): https://www.anthropic.com/engineering/a-postmortem-of-three-recent-issues
- Writing effective tools for agents — with agents (Sep 11, 2025): https://www.anthropic.com/engineering/writing-tools-for-agents
- Desktop Extensions: One-click MCP server installation (Jun 26, 2025): https://www.anthropic.com/engineering/desktop-extensions
- How we built our multi-agent research system (Jun 13, 2025): https://www.anthropic.com/engineering/multi-agent-research-system
- Claude Code: Best practices for agentic coding (Apr 18, 2025): https://www.anthropic.com/engineering/claude-code-best-practices
- The "think" tool: Enabling Claude to stop and think (Mar 20, 2025): https://www.anthropic.com/engineering/claude-think-tool
- Raising the bar on SWE-bench Verified with Claude 3.5 Sonnet (Oct 2024): https://www.anthropic.com/engineering/swe-bench-sonnet
- Introducing Contextual Retrieval (Sep 2024): https://www.anthropic.com/news/contextual-retrieval (also https://www.anthropic.com/engineering/contextual-retrieval)
- Naming: Approve "Agentic Substrate" or propose an alternative?
- Multi-agent economics: When should chief-architect spawn subagents? (15x token-cost threshold)
- Think tool integration: Which agents need think tool capability - all 18 or a subset?
- MCP packaging: Should we create a .mcpb installer for one-click distribution?
- Context engineering skill: Create as a new skill or integrate into existing skills?
- Quality bar: Use SWE-bench patterns for validation, or create our own benchmark?
- CLAUDE.md defaults: What should the default project CLAUDE.md contain?
- Backward compatibility: How do we migrate existing users to the enhanced system?
ResearchPack Complete - Ready for Planning Phase
Confidence: HIGH - All information from official Anthropic engineering sources
Coverage: COMPREHENSIVE - All 11 articles analyzed and synthesized
Thematic Organization: COMPLETE - 7 major themes + patterns + implementation checklist