Multi-Agent Reasoning System (MAS)

A hierarchical multi-agent reasoning system that tackles complex QA, math, and logic tasks using Templated Graph Reasoning (TGR), Retrieval-Augmented Generation (RAG), swarm consensus, code-backed research, and web evidence. The system supports both template-guided DAG execution (TGR fast-path) and a standard Supervisor–Worker–Verifier pipeline.

High-Level Overview

MAS combines:

| Layer | Role |
| --- | --- |
| Supervisor | Decomposes problems into subtasks, critiques worker outputs, synthesizes final answers, enforces question-type-aware output policies. |
| Swarm Workers | Parallel multi-model ensemble (math, logic, QA) with cooperative reconciliation and early return on quorum. |
| Research Worker | Code-first “Ouroboros” loop: generate Python → execute in sandbox → observe → refine (timeout-aware). |
| Verifier | Independent numeric recomputation; returns a bare number when used. |
| Templated Graph Reasoning (TGR) | Buffer-of-Thought templates + Graph-of-Thought controller: structured DAG execution with definition / enumeration / calculation / aggregation / verification (and optional retrieval) nodes. |
| RAG | Hybrid fusion (semantic + BM25, RRF) over a LanceDB vector store; optional seed augmentation and mid-reasoning retrieval nodes. |
| Web Evidence | Optional real-time search (DuckDuckGo) and page fetching; evidence injected into workers and synthesis, with grounding checks. |

Entry point: solve_with_budget(problem, config_path, timeout_s, ...) in apps/mas/graph/plan_graph.py. It either runs the TGR fast-path (when a template matches with score ≥ 5) or the standard path (decompose → dispatch → critique → synthesize → verify).
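
The routing decision can be sketched as a small function. This is only an illustration under assumptions: `route`, `TGR_SCORE_THRESHOLD`, and the stub runners are hypothetical names, not the contents of plan_graph.py. Only the threshold of 5 and the "non-empty TGR answer skips the standard path" rule come from this README; falling back to the standard path on an empty TGR answer is assumed.

```python
from typing import Callable, Optional

# Threshold taken from the README text; the constant name is hypothetical.
TGR_SCORE_THRESHOLD = 5


def route(
    problem: str,
    template_score: Optional[float],
    run_tgr: Callable[[str], str],
    run_standard: Callable[[str], str],
    tgr_enabled: bool = True,
) -> str:
    """Try the TGR fast-path first, otherwise use the standard path."""
    matched = template_score is not None and template_score >= TGR_SCORE_THRESHOLD
    if tgr_enabled and matched:
        answer = run_tgr(problem)
        if answer.strip():  # a non-empty TGR answer skips the standard path
            return answer
    # Assumption: an empty TGR answer falls through to the standard path.
    return run_standard(problem)


# Stub runners stand in for the real pipelines.
print(route("2 + 2?", 7, lambda p: "4", lambda p: "standard answer"))
print(route("Who wrote Hamlet?", 2, lambda p: "4", lambda p: "standard answer"))
```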

Goals & Methodology

Solve complex, multi-step reasoning with verifiable outputs (numeric, boolean, multi-value, explanatory).
Reduce hallucinations via template-guided graphs (TGR), consensus, and verification.
Combine modalities: structured decomposition, multi-model consensus, code-backed experiments, RAG, and optional web evidence.
Question-type-aware synthesis: numeric → bare number; boolean → yes/no; multi-value → compact text or JSON when requested; explanatory → prose; factual → concise answer with optional citations.
Cold-start mitigation: TGR templates (and optional dynamic generation) provide domain-specific blueprints so the system does not start from scratch on math-heavy or procedural tasks.
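
As a rough illustration of question-type-aware synthesis, the sketch below applies a simple output policy per question type. The function name `format_answer`, the `want_json` flag, the type labels, and the parsing rules are assumptions made for this example; the supervisor's actual policies may differ.

```python
import json
import re


def format_answer(question_type: str, raw: str, want_json: bool = False) -> str:
    """Apply a simple, illustrative output policy to a raw model answer."""
    text = raw.strip()
    if question_type == "numeric":
        # Keep only the last number mentioned, dropping thousands separators.
        numbers = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
        return numbers[-1] if numbers else text
    if question_type == "boolean":
        first_word = re.sub(r"[^a-z]", "", text.lower().split()[0]) if text else ""
        if first_word in {"yes", "true"}:
            return "yes"
        if first_word in {"no", "false"}:
            return "no"
        return text
    if question_type == "multi_value" and want_json:
        # Turn "a=1, b=2" style text into a JSON object.
        pairs = dict(part.split("=", 1) for part in text.split(",") if "=" in part)
        return json.dumps({k.strip(): v.strip() for k, v in pairs.items()})
    return text  # explanatory and factual answers pass through unchanged


print(format_answer("numeric", "The total is 1,234 apples."))
print(format_answer("boolean", "Yes, it is."))
```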

System Architecture

Component hierarchy

apps/mas/
├── agents/                    # Worker agents and supervisor
│   ├── supervisor.py          # Decomposition, critique, synthesis, output policies
│   ├── websearch.py           # Web evidence (DuckDuckGo, extraction, grounding)
│   ├── swarm_worker.py        # Multi-model parallel consensus
│   ├── worker_math.py         # Math prompts
│   ├── worker_logic.py        # Logic prompts
│   ├── worker_qa.py           # QA prompts
│   ├── worker_researcher.py   # Code-first Ouroboros loop
│   ├── verifier.py            # Numeric verification
│   └── latent/                # Optional inter-agent hidden state (embedding, attention)
│
├── graph/                     # Orchestration and TGR
│   ├── plan_graph.py          # solve_with_budget(), standard path, parallel dispatch
│   ├── template_distiller.py  # Template selection (keyword + RAG + dynamic gen)
│   ├── template_generator.py  # LLM-based dynamic template creation
│   ├── got_controller.py      # Graph-of-Thought execution (TGR DAG)
│   ├── node_verifier.py       # Type-specific node output verification
│   ├── backtrack_manager.py   # Retry and state management for TGR
│   └── archetype_verifier.py  # Domain-specific answer clamping
│
├── rag/                       # Retrieval-Augmented Generation
│   ├── embeddings.py          # Codestral embedder (1536-dim)
│   ├── indexer.py             # Wikipedia → LanceDB ingestion
│   ├── retriever.py           # Hybrid fusion search (RRF)
│   ├── chunker.py             # Document chunking
│   └── evidence.py            # RAGEvidencePack, quality detection, query expansion
│
├── learning/                  # Optional distillation loop
│   ├── trace_recorder.py      # Execution trace capture
│   ├── trace_store.py         # Trace persistence
│   ├── pattern_analyzer.py    # Pattern extraction from traces
│   ├── prompt_enhancer.py     # Prompt augmentation with patterns
│   └── distillation_manager.py # Coordination
│
├── infra/                     # LLM and env
│   └── openrouter/client.py   # LLM API client (retries, optional caching)
│
├── tools/                     # Execution and web
│   ├── executor.py            # Sandboxed Python execution
│   ├── search.py              # DuckDuckGo web search
│   ├── fetch.py               # URL fetch, concurrent fetch, relevance extraction
│   └── timeline.py            # Timeline extraction and constraint solving
│
├── configs/                   # YAML config and TGR templates
│   ├── openrouter.yaml        # Models, swarm, TGR, RAG, parallel, caching
│   ├── learning.yaml          # Distillation, backtracking, latent
│   └── templates/*.json       # TGR template blueprints
│
├── benchmarks/                # Evaluation
│   ├── gsm8k.py, hotpotqa.py, drop.py, gpqa.py, bbh.py
│   └── ...
└── web/                       # UI
    └── chat_ui.py             # Gradio chat (with web toggle)

High-level pipeline diagram

flowchart TB
  subgraph input [Input]
    Q[User Query]
  end

  subgraph rag_layer [RAG & Web Layer]
    RAG[RAG Template Distiller / Seed Retrieval]
    WEB[Web Evidence Optional]
  end

  subgraph routing [Routing]
    TGR_CHECK{TGR enabled & template score ≥ 5?}
  end

  subgraph tgr_path [TGR Fast-Path]
    GOT[GoTController]
    NODES[Definition / Enum / Calc / Agg / Verify / Retrieval Nodes]
    GOT --> NODES
  end

  subgraph std_path [Standard Path]
    DEC[Supervisor.decompose]
    DISP[Dispatch: Swarm + ResearchWorker]
    CRIT[Supervisor.critique]
    SYN[Supervisor.synthesize]
    DEC --> DISP --> CRIT --> SYN
  end

  subgraph post [Post-Processing]
    VERIFY[Verifier for numeric]
    FMT[Question-type-aware formatting]
    VERIFY --> FMT
  end

  Q --> RAG
  Q --> WEB
  RAG --> TGR_CHECK
  TGR_CHECK -->|yes| tgr_path
  TGR_CHECK -->|no| std_path
  tgr_path --> FMT
  std_path --> post
  FMT --> OUT[Final Answer]

Standard path sequence (simplified)

sequenceDiagram
  participant User
  participant PG as plan_graph.solve_with_budget
  participant WS as WebSearchAgent
  participant Sup as SupervisorAgent
  participant Swarm as SwarmWorkerManager
  participant Res as ResearchWorker
  participant Ver as VerifierAgent

  User->>PG: problem
  PG->>WS: build_evidence(problem) [if web_enabled]
  WS-->>PG: WebEvidencePack

  PG->>Sup: decompose(problem)
  Sup-->>PG: Plan(SubTasks)

  loop For each (ready) SubTask
    alt role == research
      PG->>Res: run(instruction, context)
      Res-->>PG: result
    else role in qa/logic/math
      PG->>Swarm: run(instruction, role, context + web_evidence)
      Swarm-->>PG: responses[]
    end
  end

  PG->>Sup: critique(problem, results, web_evidence)
  Sup-->>PG: critique_text

  PG->>Sup: synthesize(problem, results, web_evidence)
  Sup-->>PG: final_answer

  opt Critique indicates issues
    PG->>Sup: resynthesize_with_critique(...)
  end

  opt Numeric question
    PG->>Ver: verify_numeric(problem, candidate, context)
    Ver-->>PG: verified_candidate?
  end

  PG-->>User: final answer

TGR (Graph-of-Thought) node flow

flowchart LR
  subgraph TGR [TGR DAG]
    N1[definition] --> N2[enumeration]
    N1 --> N3[calculation]
    N2 --> N4[aggregation]
    N3 --> N4
    N4 --> N5[verification]
  end

  N1 & N2 & N3 -->|by role| Swarm[SwarmWorker]
  N1 & N2 & N3 -->|by role| Res[ResearchWorker]
  N4 --> Swarm
  N5 --> Verifier[VerifierAgent]

Templates (e.g. hotel_toggle.json, spectral_cayley.json) define nodes (id, type, role, instruction) and edges. The GoTController topologically sorts nodes, runs same-level nodes in parallel where possible, and uses Swarm/ResearchWorker/Verifier by node type and role.
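
The "topologically sort, then run same-level nodes in parallel" idea can be sketched with Kahn's algorithm grouped by level. This is a minimal sketch, not the GoTController code: `topological_levels` is a hypothetical helper, and edges are assumed to be `(source, target)` pairs.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def topological_levels(nodes: List[str], edges: List[Tuple[str, str]]) -> List[List[str]]:
    """Group DAG nodes into levels; nodes within a level do not depend on each other."""
    indegree: Dict[str, int] = {n: 0 for n in nodes}
    children: Dict[str, List[str]] = defaultdict(list)
    for src, dst in edges:
        children[src].append(dst)
        indegree[dst] += 1

    levels: List[List[str]] = []
    ready = [n for n in nodes if indegree[n] == 0]
    visited = 0
    while ready:
        levels.append(ready)
        visited += len(ready)
        next_ready: List[str] = []
        for node in ready:
            for child in children[node]:
                indegree[child] -= 1
                if indegree[child] == 0:
                    next_ready.append(child)
        ready = next_ready
    if visited != len(nodes):
        raise ValueError("graph contains a cycle")
    return levels


# The five-node DAG from the diagram above.
nodes = ["definition", "enumeration", "calculation", "aggregation", "verification"]
edges = [
    ("definition", "enumeration"),
    ("definition", "calculation"),
    ("enumeration", "aggregation"),
    ("calculation", "aggregation"),
    ("aggregation", "verification"),
]
print(topological_levels(nodes, edges))
```

Each inner list is one level; a controller could submit the nodes of a level to a thread pool and wait for all of them before moving on to the next level.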

RA-TGR integration (RAG + TGR)

flowchart LR
  subgraph RA_TGR [RA-TGR]
    Q[Problem] --> TS[Template Selection + RAG]
    TS --> SEEDS[Augment knowledge_seeds with RAG]
    SEEDS --> GOT[GoTController]
    GOT --> RN[Retrieval nodes optional]
    RN --> GOT
  end

Template selection: RAG can boost template scores using retrieved context.
Seed augmentation: Knowledge seeds in the template can be augmented with RAG retrieval before DAG execution.
Retrieval nodes: Nodes of type retrieval (or with role rag) run HybridRetriever during graph execution and inject the results into the context.
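
The hybrid retriever fuses semantic and BM25 rankings with Reciprocal Rank Fusion (RRF). The sketch below shows plain, unweighted RRF over two ranked lists of document ids. The function name `rrf_fuse` and the document ids are hypothetical, `k=60` is a commonly used constant rather than a value taken from this repository, and the configurable RRF weights mentioned in the config are not modeled.

```python
from typing import Dict, List


def rrf_fuse(rankings: List[List[str]], k: int = 60, top_k: int = 5) -> List[str]:
    """Fuse ranked lists: each list contributes 1 / (k + rank) to a document's score."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score; break ties by id so the result is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))[:top_k]


semantic = ["doc_a", "doc_b", "doc_c"]  # hypothetical dense-search ranking
bm25 = ["doc_b", "doc_d", "doc_a"]      # hypothetical lexical ranking
print(rrf_fuse([semantic, bm25]))
```

Documents that appear near the top of both lists (here `doc_b` and `doc_a`) outrank documents that appear in only one.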

ASCII overview (terminal-friendly)

User Query
    |
    v
+------------------------------------------+
|  RAG seed retrieval (optional)           |
|  Web evidence (optional)                 |
+------------------------------------------+
    |
    v
+------------------------------------------+
|  TGR? (template score >= 5)              |
+------------------------------------------+
    | yes                  | no
    v                      v
+---------------+    +-----------------------------------+
| GoTController |    | Supervisor.decompose -> Plan      |
| (DAG nodes)   |    | Dispatch (Swarm + Research)       |
| -> final      |    | Supervisor.critique               |
+---------------+    | Supervisor.synthesize             |
    |                | [grounding check if web evidence] |
    |                | Verifier (if numeric)             |
    |                +-----------------------------------+
    |                      |
    +----------------------+
                |
                v
          Final Answer

Execution Flow

1. TGR fast-path (when enabled and template matches)

Template selection: RAGTemplateDistiller (or TemplateDistiller) selects a template from configs/templates/ using keyword scoring plus an optional RAG boost; if nothing matches, a template can optionally be generated dynamically.
Score threshold: Template is used only if score ≥ 5 (avoids misrouting factual QA to math templates).
GoTController.run(): Template is hydrated into a DAG; knowledge seeds can be augmented with RAG; each node runs via Swarm or ResearchWorker; verification nodes call VerifierAgent.
Early exit: If TGR returns a non-empty final_answer, it is returned and the standard path is skipped.
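
Template selection with a score threshold might look roughly like the following. The scoring rule (two points per matched domain tag), the `rag_boost` mapping, the helper names, and the example tags are all assumptions made for illustration; only the `template_id` and `domain_tags` fields, the two template names, and the threshold of 5 appear in this README.

```python
import re
from typing import Dict, List, Optional

SCORE_THRESHOLD = 5  # from the README; the constant name is hypothetical


def keyword_score(problem: str, domain_tags: List[str], points_per_tag: int = 2) -> int:
    """Score a template by how many of its tags occur as words in the problem."""
    words = set(re.findall(r"[a-z0-9]+", problem.lower()))
    return points_per_tag * sum(1 for tag in domain_tags if tag.lower() in words)


def select_template(
    problem: str,
    templates: List[Dict],
    rag_boost: Optional[Dict[str, int]] = None,
) -> Optional[str]:
    """Return the best template id, or None when no score reaches the threshold."""
    rag_boost = rag_boost or {}
    best_id, best_score = None, 0
    for template in templates:
        score = keyword_score(problem, template["domain_tags"])
        score += rag_boost.get(template["template_id"], 0)
        if score > best_score:
            best_id, best_score = template["template_id"], score
    return best_id if best_score >= SCORE_THRESHOLD else None


# Hypothetical tags; the real template files may use different ones.
templates = [
    {"template_id": "hotel_toggle", "domain_tags": ["hotel", "toggle", "rooms"]},
    {"template_id": "spectral_cayley", "domain_tags": ["cayley", "eigenvalue", "graph"]},
]
print(select_template("A hotel has 100 rooms; each guest will toggle some lights.", templates))
print(select_template("Who wrote Hamlet?", templates))
```

Returning `None` below the threshold is what sends a factual question to the standard path instead of a math template.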

2. Standard path (decomposition → dispatch → critique → synthesis → verify)

Decomposition: Supervisor builds a Plan of subtasks (roles: math, logic, qa, research) with optional dependencies. When the problem matches numeric or simulation patterns, math or research subtasks can be injected automatically.
Web evidence (optional): If web_enabled, WebSearchAgent builds a WebEvidencePack (intent, extracted answer, sources); used as context for workers and synthesis, and for a later grounding check.
Dispatch: Independent subtasks can run in parallel. Research subtasks → ResearchWorker (code execution). Others → SwarmWorkerManager (multi-model, cooperative rounds, early termination on quorum).
Context: Workers receive dependency context, optional RAG evidence (fusion search), optional web evidence, and (when web-enabled) fetched Wikipedia pages from RAG URLs.
Critique: Supervisor critiques worker outputs for consistency.
Synthesis: Supervisor synthesizes the final answer; the question type (numeric, boolean, multi_quantity, explanatory, factual) drives the output policy. If the critique indicates issues, resynthesize_with_critique is called; a JSON repair can be rejected when JSON output is not allowed.
Grounding check: If the web evidence contains an extracted answer, the final answer must contain it; otherwise a strict repair or a deterministic fallback is applied.
Verification: For single-number questions, VerifierAgent independently recomputes the result; the candidate can be replaced if the verifier disagrees.
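
Early termination on quorum, as used in the dispatch step, can be illustrated with a simple vote counter that stops consuming responses once enough of them agree. This is a sketch under assumptions: `consensus_with_quorum` is a hypothetical helper, answers are compared after trimming and lowercasing, and the cooperative reconciliation rounds are not modeled.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def consensus_with_quorum(responses: Iterable[str], quorum: int) -> Tuple[Optional[str], int]:
    """Consume responses in arrival order; stop as soon as `quorum` of them agree.

    Returns the chosen answer and how many responses were consumed.
    """
    counts: Counter = Counter()
    seen = 0
    for response in responses:
        seen += 1
        key = response.strip().lower()
        counts[key] += 1
        if counts[key] >= quorum:
            return key, seen  # early return: remaining responses are not needed
    if not counts:
        return None, 0
    # No quorum reached: fall back to the most common answer.
    return counts.most_common(1)[0][0], seen


# The third response completes a quorum of 2, so the fourth is never read.
print(consensus_with_quorum(iter(["42", "41", "42 ", "40"]), quorum=2))
```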

3. Time and resource budget

Overall: configurable, with a default of e.g. 300s.
TGR: Per-node timeout (e.g. 90s), overall TGR timeout (e.g. 240s).
Standard path: Decomposition, per-subtask, synthesis, and verification each get a fraction of the remaining budget (see plan_graph.py).
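
Handing each stage a fraction of the remaining budget could be sketched as below. The `Budget` class and the fractions are hypothetical and chosen only for illustration; the actual split lives in plan_graph.py and is not reproduced here.

```python
import time
from typing import Callable, Optional


class Budget:
    """Tracks a wall-clock deadline and hands out slices of the remaining time."""

    def __init__(self, total_s: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._deadline = clock() + total_s

    def remaining(self) -> float:
        """Seconds left before the deadline, never negative."""
        return max(0.0, self._deadline - self._clock())

    def slice(self, fraction: float, cap_s: Optional[float] = None) -> float:
        """Return `fraction` of the remaining time, optionally capped at `cap_s`."""
        share = self.remaining() * fraction
        return share if cap_s is None else min(share, cap_s)


budget = Budget(total_s=300.0)
decompose_timeout = budget.slice(0.25)        # hypothetical fraction
node_timeout = budget.slice(0.5, cap_s=90.0)  # capped, like a per-node timeout
print(node_timeout)
```

Because each slice is computed from what is left rather than from the original total, a slow early stage automatically shrinks the time offered to later stages.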

Core Components (Low-Level)

| Component | File | Purpose |
| --- | --- | --- |
| SupervisorAgent | `agents/supervisor.py` | `decompose()`, `critique()`, `synthesize()`, `resynthesize_with_critique()`; question-type detection; output policies. |
| WebSearchAgent | `agents/websearch.py` | `build_evidence()`: intent detection, multi-hop queries, extraction, confidence; WebEvidencePack for workers and grounding. |
| SwarmWorkerManager | `agents/swarm_worker.py` | Parallel LLM calls, consensus, cooperative reconciliation, optional early termination when quorum agrees. |
| ResearchWorker | `agents/worker_researcher.py` | Ouroboros loop: generate code → execute (executor) → observe → refine; timeout-aware. |
| VerifierAgent | `agents/verifier.py` | `verify_numeric()`: independent low-temperature recomputation; returns bare number. |
| TemplateDistiller / RAGTemplateDistiller | `graph/template_distiller.py` | Keyword scoring + optional RAG boost + optional dynamic template generation. |
| GoTController | `graph/got_controller.py` | Load template → augment seeds (optional RAG) → topological execution of nodes (parallel by level) → Swarm/Research/Verifier per node. |
| NodeVerifier | `graph/node_verifier.py` | Type-specific checks on node outputs (definition, enumeration, calculation, aggregation, verification). |
| BacktrackManager | `graph/backtrack_manager.py` | Retry strategies and state management when node verification fails. |
| HybridRetriever | `rag/retriever.py` | Semantic + lexical search, RRF fusion. |
| CodestralEmbedder | `rag/embeddings.py` | Dense embeddings (e.g. 1536-dim) for RAG. |
| OpenRouterClient | `infra/openrouter/client.py` | LLM API with retries; optional response caching. |

Data & Configuration

Config: apps/mas/configs/openrouter.yaml — model family, models, swarm (models, min responses, cooperative rounds), TGR (enabled, templates path, node/overall timeouts), RAG (enabled, db path, top_k, RRF weights, augment_seeds), parallel (concurrent subtasks/TGR nodes/fetches, early termination, speculative prefetch), caching.
Templates: apps/mas/configs/templates/*.json — template_id, domain_tags, description, knowledge_seeds, graph_blueprint (entrypoint, nodes, edges).
Learning: apps/mas/configs/learning.yaml — distillation, backtracking, latent communication (optional).
RAG store: LanceDB at rag_db_path (e.g. apps/mas/data/wiki_lance); ingestion via scripts (e.g. scripts/index_wikipedia.py).
Sandbox: Python executor with configurable timeout for ResearchWorker code.
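
To make the template layout concrete, here is a hedged sketch of a template expressed as a Python dict, plus a small structural check. The field names come from the list above and from the node description (id, type, role, instruction); the example values, the `(source, target)` edge format, and the `validate_template` helper are assumptions, so the real JSON files may be shaped differently.

```python
from typing import Any, Dict, List

REQUIRED_TOP_LEVEL = ("template_id", "domain_tags", "description", "knowledge_seeds", "graph_blueprint")
REQUIRED_NODE_FIELDS = ("id", "type", "role", "instruction")


def validate_template(template: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the structure looks consistent."""
    problems = [f"missing field: {name}" for name in REQUIRED_TOP_LEVEL if name not in template]
    blueprint = template.get("graph_blueprint", {})
    nodes = blueprint.get("nodes", [])
    node_ids = {node.get("id") for node in nodes}
    for node in nodes:
        for name in REQUIRED_NODE_FIELDS:
            if name not in node:
                problems.append(f"node {node.get('id')!r} missing field: {name}")
    if blueprint.get("entrypoint") not in node_ids:
        problems.append("entrypoint is not a known node id")
    for src, dst in blueprint.get("edges", []):  # assumed (source, target) pairs
        if src not in node_ids or dst not in node_ids:
            problems.append(f"edge {src}->{dst} references an unknown node")
    return problems


# Hypothetical example content, for illustration only.
example = {
    "template_id": "example_sum",
    "domain_tags": ["sum", "arithmetic"],
    "description": "Add up a list of quantities.",
    "knowledge_seeds": ["Addition is commutative."],
    "graph_blueprint": {
        "entrypoint": "n1",
        "nodes": [
            {"id": "n1", "type": "definition", "role": "math", "instruction": "List the quantities."},
            {"id": "n2", "type": "calculation", "role": "math", "instruction": "Add them."},
        ],
        "edges": [["n1", "n2"]],
    },
}
print(validate_template(example))
```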

Running the System

Environment: Set OPENROUTER_API_KEY (or compatible) and ensure Python deps from requirements.txt are installed.
Chat UI:
python -m apps.mas.web.chat_ui --config apps/mas/configs/openrouter.yaml --server-name 127.0.0.1 --server-port 7860
Use the web toggle to enable/disable web search and page fetching.
Benchmarks:
- Humanity’s Last Exam: python scripts/test_humanity_exam.py (respects config timeouts and TGR).
- HotpotQA/GSM8K/etc.: run the corresponding module under apps/mas/benchmarks/ or scripts in apps/mas/scripts/.
RAG indexing:
python scripts/index_wikipedia.py --arrow-path <path> --max-docs 500 (see script and docs for options).

Concepts & Extensibility

Swarm consensus: Parallel LLM calls with reconciliation and optional early return on quorum.
Code-backed reasoning: Prefer executable simulation/enumeration (ResearchWorker) for brittle domains.
Verification: Independent numeric recomputation to catch drift.
Template-guided graphs: Buffer-of-Thought templates drive Graph-of-Thought execution to avoid cold starts and enforce domain structure.
RA-TGR: RAG feeds template selection, knowledge-seed augmentation, and optional retrieval nodes in the TGR DAG.
Timeout & budgeting: Per-node and overall budgets keep the system responsive.
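
The generate → execute → observe → refine loop behind code-backed reasoning can be sketched with injected callables. Everything here is illustrative: `ouroboros_loop` and `run_snippet` are hypothetical names, the generator is a stub in place of an LLM, and `run_snippet` uses plain `exec`, which is NOT a sandbox and only stands in for the real executor.

```python
import time
from typing import Callable, Optional, Tuple


def ouroboros_loop(
    generate: Callable[[str], str],
    execute: Callable[[str], Tuple[bool, str]],
    timeout_s: float = 30.0,
    max_rounds: int = 5,
) -> Optional[str]:
    """Generate code, run it, and feed the observation back until a run succeeds."""
    deadline = time.monotonic() + timeout_s
    feedback = ""
    for _ in range(max_rounds):
        if time.monotonic() >= deadline:
            break  # timeout-aware: stop refining when the budget is spent
        code = generate(feedback)
        ok, output = execute(code)
        if ok:
            return output
        feedback = output  # observe the error and refine on the next round
    return None


def run_snippet(code: str) -> Tuple[bool, str]:
    """Toy executor for the demo only; exec() provides no isolation."""
    namespace: dict = {}
    try:
        exec(code, namespace)
    except Exception as exc:
        return False, f"{type(exc).__name__}: {exc}"
    return True, str(namespace.get("result"))


# Stub generator: the first attempt fails, the refined attempt succeeds.
attempts = iter(["result = 1 / 0", "result = sum(range(1, 11))"])
print(ouroboros_loop(lambda feedback: next(attempts), run_snippet))
```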

Extensibility:

Add new templates under configs/templates/ (nodes, edges, seeds).
Improve the distiller (e.g. semantic or embedding-based retrieval for template selection).
Swap or add models in openrouter.yaml without changing core code.

For deeper technical detail, see docs/ARCHITECTURE.md and docs/SYSTEM_DOCUMENTATION.md.
