Khawaja-Murad/mas-tgr

Multi-Agent Reasoning System (MAS)

A hierarchical multi-agent reasoning system that tackles complex QA, math, and logic tasks using Templated Graph Reasoning (TGR), Retrieval-Augmented Generation (RAG), swarm consensus, code-backed research, and web evidence. The system supports both template-guided DAG execution (TGR fast-path) and a standard Supervisor–Worker–Verifier pipeline.



High-Level Overview

MAS combines:

| Layer | Role |
| --- | --- |
| Supervisor | Decomposes problems into subtasks, critiques worker outputs, synthesizes final answers, enforces question-type-aware output policies. |
| Swarm Workers | Parallel multi-model ensemble (math, logic, QA) with cooperative reconciliation and early return on quorum. |
| Research Worker | Code-first “Ouroboros” loop: generate Python → execute in sandbox → observe → refine (timeout-aware). |
| Verifier | Independent numeric recomputation; returns a bare number when used. |
| Templated Graph Reasoning (TGR) | Buffer-of-Thought templates + Graph-of-Thought controller: structured DAG execution with definition / enumeration / calculation / aggregation / verification (and optional retrieval) nodes. |
| RAG | Hybrid fusion (semantic + BM25, RRF) over a LanceDB vector store; optional seed augmentation and mid-reasoning retrieval nodes. |
| Web Evidence | Optional real-time search (DuckDuckGo) and page fetching; evidence injected into workers and synthesis, with grounding checks. |

Entry point: solve_with_budget(problem, config_path, timeout_s, ...) in apps/mas/graph/plan_graph.py. It either runs the TGR fast-path (when a template matches with score ≥ 5) or the standard path (decompose → dispatch → critique → synthesize → verify).


Goals & Methodology

  • Solve complex, multi-step reasoning with verifiable outputs (numeric, boolean, multi-value, explanatory).
  • Reduce hallucinations via template-guided graphs (TGR), consensus, and verification.
  • Combine modalities: structured decomposition, multi-model consensus, code-backed experiments, RAG, and optional web evidence.
  • Question-type-aware synthesis: numeric → bare number; boolean → yes/no; multi-value → compact text or JSON when requested; explanatory → prose; factual → concise answer with optional citations.
  • Cold-start mitigation: TGR templates (and optional dynamic generation) provide domain-specific blueprints so the system does not start from scratch on math-heavy or procedural tasks.
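
The question-type-aware synthesis policy listed above can be illustrated with a small sketch. This is not the code in supervisor.py: the function name `format_answer`, the `want_json` flag, and the regex heuristics are assumptions chosen only to show how one raw answer might be shaped per question type.

```python
import json
import re


def format_answer(question_type: str, raw: str, want_json: bool = False) -> str:
    """Shape a raw answer by question type (hypothetical helper, naive heuristics)."""
    text = raw.strip()
    if question_type == "numeric":
        # Bare number: keep the last number mentioned, ignoring thousands separators.
        numbers = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
        return numbers[-1] if numbers else text
    if question_type == "boolean":
        # yes/no: map the first yes/no/true/false token; negations are not handled.
        match = re.search(r"\b(yes|no|true|false)\b", text.lower())
        if match:
            return "yes" if match.group(1) in ("yes", "true") else "no"
        return text
    if question_type == "multi_quantity" and want_json:
        # JSON only when requested: collect "name: value" or "name = value" pairs.
        pairs = re.findall(r"(\w+)\s*[:=]\s*(-?\d+(?:\.\d+)?)", text)
        return json.dumps({name: float(value) for name, value in pairs}, sort_keys=True)
    # Explanatory and factual answers pass through as prose.
    return text


print(format_answer("numeric", "The total is 1,234 apples."))
print(format_answer("multi_quantity", "x: 3, y = 4.5", want_json=True))
```

A real policy would also need to handle units, negated booleans, and citations, which this sketch deliberately leaves out.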

System Architecture

Component hierarchy

apps/mas/
├── agents/                    # Worker agents and supervisor
│   ├── supervisor.py          # Decomposition, critique, synthesis, output policies
│   ├── websearch.py           # Web evidence (DuckDuckGo, extraction, grounding)
│   ├── swarm_worker.py        # Multi-model parallel consensus
│   ├── worker_math.py         # Math prompts
│   ├── worker_logic.py        # Logic prompts
│   ├── worker_qa.py           # QA prompts
│   ├── worker_researcher.py   # Code-first Ouroboros loop
│   ├── verifier.py            # Numeric verification
│   └── latent/                # Optional inter-agent hidden state (embedding, attention)
│
├── graph/                     # Orchestration and TGR
│   ├── plan_graph.py          # solve_with_budget(), standard path, parallel dispatch
│   ├── template_distiller.py  # Template selection (keyword + RAG + dynamic gen)
│   ├── template_generator.py  # LLM-based dynamic template creation
│   ├── got_controller.py      # Graph-of-Thought execution (TGR DAG)
│   ├── node_verifier.py       # Type-specific node output verification
│   ├── backtrack_manager.py   # Retry and state management for TGR
│   └── archetype_verifier.py  # Domain-specific answer clamping
│
├── rag/                       # Retrieval-Augmented Generation
│   ├── embeddings.py          # Codestral embedder (1536-dim)
│   ├── indexer.py             # Wikipedia → LanceDB ingestion
│   ├── retriever.py           # Hybrid fusion search (RRF)
│   ├── chunker.py             # Document chunking
│   └── evidence.py            # RAGEvidencePack, quality detection, query expansion
│
├── learning/                  # Optional distillation loop
│   ├── trace_recorder.py      # Execution trace capture
│   ├── trace_store.py         # Trace persistence
│   ├── pattern_analyzer.py    # Pattern extraction from traces
│   ├── prompt_enhancer.py     # Prompt augmentation with patterns
│   └── distillation_manager.py # Coordination
│
├── infra/                     # LLM and env
│   └── openrouter/client.py   # LLM API client (retries, optional caching)
│
├── tools/                     # Execution and web
│   ├── executor.py            # Sandboxed Python execution
│   ├── search.py              # DuckDuckGo web search
│   ├── fetch.py               # URL fetch, concurrent fetch, relevance extraction
│   └── timeline.py            # Timeline extraction and constraint solving
│
├── configs/                   # YAML config and TGR templates
│   ├── openrouter.yaml        # Models, swarm, TGR, RAG, parallel, caching
│   ├── learning.yaml          # Distillation, backtracking, latent
│   └── templates/*.json       # TGR template blueprints
│
├── benchmarks/                # Evaluation
│   ├── gsm8k.py, hotpotqa.py, drop.py, gpqa.py, bbh.py
│   └── ...
└── web/                       # UI
    └── chat_ui.py             # Gradio chat (with web toggle)

High-level pipeline diagram

flowchart TB
  subgraph input [Input]
    Q[User Query]
  end

  subgraph rag_layer [RAG & Web Layer]
    RAG[RAG Template Distiller / Seed Retrieval]
    WEB[Web Evidence Optional]
  end

  subgraph routing [Routing]
    TGR_CHECK{TGR enabled & template score ≥ 5?}
  end

  subgraph tgr_path [TGR Fast-Path]
    GOT[GoTController]
    NODES[Definition / Enum / Calc / Agg / Verify / Retrieval Nodes]
    GOT --> NODES
  end

  subgraph std_path [Standard Path]
    DEC[Supervisor.decompose]
    DISP[Dispatch: Swarm + ResearchWorker]
    CRIT[Supervisor.critique]
    SYN[Supervisor.synthesize]
    DEC --> DISP --> CRIT --> SYN
  end

  subgraph post [Post-Processing]
    VERIFY[Verifier for numeric]
    FMT[Question-type-aware formatting]
    VERIFY --> FMT
  end

  Q --> RAG
  Q --> WEB
  RAG --> TGR_CHECK
  TGR_CHECK -->|yes| tgr_path
  TGR_CHECK -->|no| std_path
  tgr_path --> FMT
  std_path --> post
  FMT --> OUT[Final Answer]

Standard path sequence (simplified)

sequenceDiagram
  participant User
  participant PG as plan_graph.solve_with_budget
  participant WS as WebSearchAgent
  participant Sup as SupervisorAgent
  participant Swarm as SwarmWorkerManager
  participant Res as ResearchWorker
  participant Ver as VerifierAgent

  User->>PG: problem
  PG->>WS: build_evidence(problem) [if web_enabled]
  WS-->>PG: WebEvidencePack

  PG->>Sup: decompose(problem)
  Sup-->>PG: Plan(SubTasks)

  loop For each (ready) SubTask
    alt role == research
      PG->>Res: run(instruction, context)
      Res-->>PG: result
    else role in qa/logic/math
      PG->>Swarm: run(instruction, role, context + web_evidence)
      Swarm-->>PG: responses[]
    end
  end

  PG->>Sup: critique(problem, results, web_evidence)
  Sup-->>PG: critique_text

  PG->>Sup: synthesize(problem, results, web_evidence)
  Sup-->>PG: final_answer

  opt Critique indicates issues
    PG->>Sup: resynthesize_with_critique(...)
  end

  opt Numeric question
    PG->>Ver: verify_numeric(problem, candidate, context)
    Ver-->>PG: verified_candidate?
  end

  PG-->>User: final answer

TGR (Graph-of-Thought) node flow

flowchart LR
  subgraph TGR [TGR DAG]
    N1[definition] --> N2[enumeration]
    N1 --> N3[calculation]
    N2 --> N4[aggregation]
    N3 --> N4
    N4 --> N5[verification]
  end

  N1 & N2 & N3 --> Swarm[SwarmWorker] & Res[ResearchWorker]
  N4 --> Swarm
  N5 --> Verifier[VerifierAgent]

Templates (e.g. hotel_toggle.json, spectral_cayley.json) define nodes (id, type, role, instruction) and edges. The GoTController topologically sorts nodes, runs same-level nodes in parallel where possible, and uses Swarm/ResearchWorker/Verifier by node type and role.
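
As a rough illustration of the level-by-level ordering described above, the sketch below groups a blueprint's nodes into levels that could run in parallel. The template content, the `[source, target]` edge format, and the helper name `execution_levels` are assumptions; the real templates and GoTController may represent these differently.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical blueprint mirroring the node flow in the diagram above.
blueprint = {
    "entrypoint": "n1",
    "nodes": [
        {"id": "n1", "type": "definition", "role": "logic", "instruction": "State the definitions."},
        {"id": "n2", "type": "enumeration", "role": "research", "instruction": "Enumerate the cases."},
        {"id": "n3", "type": "calculation", "role": "math", "instruction": "Compute per-case values."},
        {"id": "n4", "type": "aggregation", "role": "math", "instruction": "Combine the results."},
        {"id": "n5", "type": "verification", "role": "math", "instruction": "Recheck the total."},
    ],
    # Assumed edge format: [source, target].
    "edges": [["n1", "n2"], ["n1", "n3"], ["n2", "n4"], ["n3", "n4"], ["n4", "n5"]],
}


def execution_levels(blueprint: dict) -> list:
    """Group node ids into levels; nodes within one level have no mutual dependencies."""
    predecessors = {node["id"]: set() for node in blueprint["nodes"]}
    for source, target in blueprint["edges"]:
        predecessors[target].add(source)
    sorter = TopologicalSorter(predecessors)
    sorter.prepare()  # raises graphlib.CycleError if the blueprint is not a DAG
    levels = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every node whose dependencies are done
        levels.append(ready)
        sorter.done(*ready)
    return levels


print(execution_levels(blueprint))
```

Each inner list is a candidate batch for concurrent execution; a controller would then pick Swarm, ResearchWorker, or Verifier per node from its type and role.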

RA-TGR integration (RAG + TGR)

flowchart LR
  subgraph RA_TGR [RA-TGR]
    Q[Problem] --> TS[Template Selection + RAG]
    TS --> SEEDS[Augment knowledge_seeds with RAG]
    SEEDS --> GOT[GoTController]
    GOT --> RN[Retrieval nodes optional]
    RN --> GOT
  end

  • Template selection: RAG can boost template scores using retrieved context.
  • Seed augmentation: Knowledge seeds in the template can be augmented with RAG retrieval before DAG execution.
  • Retrieval nodes: Nodes of type retrieval (or with role rag) run HybridRetriever during graph execution and inject the results into context.

ASCII overview (terminal-friendly)

User Query
    |
    v
+------------------------------------------+
|  RAG seed retrieval (optional)           |
|  Web evidence (optional)                 |
+------------------------------------------+
    |
    v
+------------------------------------------+
|  TGR? (template score >= 5)              |
+------------------------------------------+
    | yes                 | no
    v                     v
+---------------+  +------------------------------------+
| GoTController |  | Supervisor.decompose -> Plan       |
| (DAG nodes)   |  | Dispatch (Swarm + Research)        |
| -> final      |  | Supervisor.critique                |
+---------------+  | Supervisor.synthesize              |
    |              | [grounding check if web evidence]  |
    |              | Verifier (if numeric)              |
    |              +------------------------------------+
    |                     |
    +---------------------+
               |
               v
         Final Answer

Execution Flow

1. TGR fast-path (when enabled and template matches)

  1. Template selection: RAGTemplateDistiller (or TemplateDistiller) selects a template from configs/templates/ using keyword scoring plus an optional RAG boost; if no template matches, one can optionally be generated dynamically.
  2. Score threshold: Template is used only if score ≥ 5 (avoids misrouting factual QA to math templates).
  3. GoTController.run(): Template is hydrated into a DAG; knowledge seeds can be augmented with RAG; each node runs via Swarm or ResearchWorker; verification nodes call VerifierAgent.
  4. Early exit: If TGR returns a non-empty final_answer, it is returned and the standard path is skipped.
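
The routing in the steps above can be sketched as follows. The threshold of 5 and the early exit come from this README; the `TemplateMatch` type, the function names, and the fallback to the standard path on an empty TGR answer are assumptions about how plan_graph.py might be organized.

```python
from dataclasses import dataclass
from typing import Callable, Optional

TGR_SCORE_THRESHOLD = 5  # documented cut-off: a template is used only if score >= 5


@dataclass
class TemplateMatch:
    """Hypothetical result of template selection."""
    template_id: str
    score: float


def choose_path(tgr_enabled: bool, match: Optional[TemplateMatch]) -> str:
    """Return "tgr" when a sufficiently strong template match exists, else "standard"."""
    if tgr_enabled and match is not None and match.score >= TGR_SCORE_THRESHOLD:
        return "tgr"
    return "standard"


def solve(
    tgr_enabled: bool,
    match: Optional[TemplateMatch],
    run_tgr: Callable[[TemplateMatch], str],
    run_standard: Callable[[], str],
) -> str:
    """Try the TGR fast-path first; use the standard path otherwise."""
    if choose_path(tgr_enabled, match) == "tgr":
        answer = run_tgr(match)
        if answer:  # non-empty final_answer: early exit, standard path skipped
            return answer
    return run_standard()


strong = TemplateMatch("hotel_toggle", 7)
print(solve(True, strong, lambda m: "42", lambda: "standard answer"))
```

Passing the two paths in as callables keeps the routing decision testable without any LLM calls.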

2. Standard path (decomposition → dispatch → critique → synthesis → verify)

  1. Decomposition: Supervisor builds a Plan of subtasks (roles: math, logic, qa, research) with optional dependencies. Numeric/simulation patterns can auto-inject math/research tasks.
  2. Web evidence (optional): If web_enabled, WebSearchAgent builds a WebEvidencePack (intent, extracted answer, sources); used as context for workers and synthesis, and for a later grounding check.
  3. Dispatch: Independent subtasks can run in parallel. Research subtasks → ResearchWorker (code execution). Others → SwarmWorkerManager (multi-model, cooperative rounds, early termination on quorum).
  4. Context: Workers receive dependency context, optional RAG evidence (fusion search), optional web evidence, and (when web-enabled) fetched Wikipedia pages from RAG URLs.
  5. Critique: Supervisor critiques worker outputs for consistency.
  6. Synthesis: Supervisor synthesizes the final answer; the question type (numeric, boolean, multi_quantity, explanatory, factual) drives the output policy. If the critique indicates issues, resynthesize_with_critique is called; a JSON repair can be rejected when JSON output is not allowed.
  7. Grounding check: If web evidence contains an extracted answer, the final answer must contain it; otherwise a strict repair or a deterministic fallback is applied.
  8. Verification: For single-number questions, VerifierAgent independently recomputes; candidate can be replaced if the verifier disagrees.
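
The last two steps, the grounding check and numeric verification, might look like the sketch below. Both helper names are hypothetical, the case-insensitive substring test is only one possible grounding rule, and the strict repair step is omitted, so only the deterministic fallback is shown.

```python
import math
from typing import Optional


def apply_grounding(final_answer: str, extracted_answer: Optional[str]) -> str:
    """Keep the final answer if it contains the web-extracted answer, else fall back to it."""
    if not extracted_answer:
        return final_answer  # nothing to ground against
    if extracted_answer.lower() in final_answer.lower():
        return final_answer
    return extracted_answer  # deterministic fallback (assumed behavior)


def reconcile_numeric(candidate: str, verified: Optional[str], rel_tol: float = 1e-9) -> str:
    """Replace the candidate with the verifier's number when the two disagree."""
    if verified is None:
        return candidate  # verifier produced nothing usable
    try:
        candidate_value, verified_value = float(candidate), float(verified)
    except ValueError:
        return candidate  # non-numeric text: leave the candidate untouched
    if math.isclose(candidate_value, verified_value, rel_tol=rel_tol):
        return candidate
    return verified


print(apply_grounding("It is Lyon.", "Paris"))
print(reconcile_numeric("41", "42"))
```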

3. Time and resource budget

  • Overall: e.g. 300s default; configurable.
  • TGR: Per-node timeout (e.g. 90s), overall TGR timeout (e.g. 240s).
  • Standard path: Decomposition, per-subtask, synthesis, and verification each get a fraction of the remaining budget (see plan_graph.py).
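
One way to hand each stage a fraction of the remaining budget is sketched below. The `Budget` class, its method names, and the fractions in the example are illustrative assumptions; the actual fractions live in plan_graph.py and are not reproduced here.

```python
import time


class Budget:
    """Tracks a wall-clock deadline and hands out slices of the remaining time."""

    def __init__(self, total_s: float, clock=time.monotonic):
        self._clock = clock  # injectable clock keeps the class testable
        self._deadline = clock() + total_s

    def remaining(self) -> float:
        """Seconds left before the deadline, never negative."""
        return max(0.0, self._deadline - self._clock())

    def slice(self, fraction: float, cap_s: float = None) -> float:
        """Return `fraction` of the remaining time, optionally capped (e.g. per-node)."""
        share = self.remaining() * fraction
        return share if cap_s is None else min(share, cap_s)


budget = Budget(300.0)
decompose_timeout = budget.slice(0.25)            # hypothetical fraction
node_timeout = budget.slice(0.5, cap_s=90.0)      # capped like a per-node timeout
print(round(decompose_timeout), round(node_timeout))
```

Because each slice is computed from what is left rather than from the original total, a slow early stage automatically shrinks the time granted to later ones.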

Core Components (Low-Level)

| Component | File | Purpose |
| --- | --- | --- |
| SupervisorAgent | agents/supervisor.py | decompose(), critique(), synthesize(), resynthesize_with_critique(); question-type detection; output policies. |
| WebSearchAgent | agents/websearch.py | build_evidence(): intent detection, multi-hop queries, extraction, confidence; WebEvidencePack for workers and grounding. |
| SwarmWorkerManager | agents/swarm_worker.py | Parallel LLM calls, consensus, cooperative reconciliation, optional early termination when quorum agrees. |
| ResearchWorker | agents/worker_researcher.py | Ouroboros loop: generate code → execute (executor) → observe → refine; timeout-aware. |
| VerifierAgent | agents/verifier.py | verify_numeric(): independent low-temperature recomputation; returns bare number. |
| TemplateDistiller / RAGTemplateDistiller | graph/template_distiller.py | Keyword scoring + optional RAG boost + optional dynamic template generation. |
| GoTController | graph/got_controller.py | Load template → augment seeds (optional RAG) → topological execution of nodes (parallel by level) → Swarm/Research/Verifier per node. |
| NodeVerifier | graph/node_verifier.py | Type-specific checks on node outputs (definition, enumeration, calculation, aggregation, verification). |
| BacktrackManager | graph/backtrack_manager.py | Retry strategies and state management when node verification fails. |
| HybridRetriever | rag/retriever.py | Semantic + lexical search, RRF fusion. |
| CodestralEmbedder | rag/embeddings.py | Dense embeddings (e.g. 1536-dim) for RAG. |
| OpenRouterClient | infra/openrouter/client.py | LLM API with retries; optional response caching. |
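
The RRF fusion listed for HybridRetriever can be illustrated with a generic Reciprocal Rank Fusion sketch. This is the textbook formula, not the code in rag/retriever.py; the function name `rrf_fuse`, the constant k=60, and the tie-break by document id are assumptions.

```python
from typing import Dict, List, Optional, Sequence


def rrf_fuse(
    rankings: Sequence[Sequence[str]],
    k: int = 60,
    weights: Optional[Sequence[float]] = None,
) -> List[str]:
    """Fuse ranked lists: each document scores sum(weight / (k + rank)) over the lists."""
    if weights is None:
        weights = [1.0] * len(rankings)
    scores: Dict[str, float] = {}
    for weight, ranking in zip(weights, rankings):
        for rank, doc_id in enumerate(ranking, start=1):  # ranks are 1-based
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


semantic_hits = ["a", "b", "c"]  # e.g. from the vector search
lexical_hits = ["b", "d", "a"]   # e.g. from BM25
print(rrf_fuse([semantic_hits, lexical_hits]))
```

Documents that appear in both lists accumulate two reciprocal-rank terms, so they tend to rise above documents found by only one retriever; the per-list weights correspond loosely to the RRF weights mentioned in the configuration.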

Data & Configuration

  • Config: apps/mas/configs/openrouter.yaml, covering:
    • model family and models
    • swarm: models, min responses, cooperative rounds
    • TGR: enabled, templates path, node/overall timeouts
    • RAG: enabled, db path, top_k, RRF weights, augment_seeds
    • parallel: concurrent subtasks/TGR nodes/fetches, early termination, speculative prefetch
    • caching
  • Templates: apps/mas/configs/templates/*.json — template_id, domain_tags, description, knowledge_seeds, graph_blueprint (entrypoint, nodes, edges).
  • Learning: apps/mas/configs/learning.yaml — distillation, backtracking, latent communication (optional).
  • RAG store: LanceDB at rag_db_path (e.g. apps/mas/data/wiki_lance); ingestion via scripts (e.g. scripts/index_wikipedia.py).
  • Sandbox: Python executor with configurable timeout for ResearchWorker code.
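
To make the timeout-bounded executor and the generate → execute → observe → refine loop concrete, here is a minimal sketch. It is not the project's executor.py: `run_python` and `ouroboros` are hypothetical names, the `generate` callable stands in for an LLM call, and a plain subprocess is not a real security sandbox.

```python
import subprocess
import sys
from typing import Callable, Optional


def run_python(code: str, timeout_s: float = 10.0) -> dict:
    """Run code in a separate interpreter with a wall-clock timeout."""
    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", code],  # -I: isolated mode, ignores user site/env
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return {"ok": False, "stdout": "", "stderr": f"timed out after {timeout_s}s"}
    return {"ok": proc.returncode == 0, "stdout": proc.stdout, "stderr": proc.stderr}


def ouroboros(
    generate: Callable[[str], str],
    max_rounds: int = 3,
    timeout_s: float = 10.0,
) -> Optional[str]:
    """Generate code, execute it, and feed errors back until a run succeeds."""
    feedback = ""
    for _ in range(max_rounds):
        code = generate(feedback)           # stand-in for the LLM producing Python
        result = run_python(code, timeout_s)
        if result["ok"]:
            return result["stdout"].strip()
        feedback = result["stderr"]         # observation used to refine the next attempt
    return None


attempts = iter(["print(1/0)", "print(6*7)"])  # first attempt fails, second succeeds
print(ouroboros(lambda feedback: next(attempts)))
```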

Running the System

  1. Environment: Set OPENROUTER_API_KEY (or compatible) and ensure Python deps from requirements.txt are installed.
  2. Chat UI:
    python -m apps.mas.web.chat_ui --config apps/mas/configs/openrouter.yaml --server-name 127.0.0.1 --server-port 7860
    Use the web toggle to enable/disable web search and page fetching.
  3. Benchmarks:
    • Humanity’s Last Exam: python scripts/test_humanity_exam.py (respects config timeouts and TGR).
    • HotpotQA/GSM8K/etc.: run the corresponding module under apps/mas/benchmarks/ or scripts in apps/mas/scripts/.
  4. RAG indexing:
    python scripts/index_wikipedia.py --arrow-path <path> --max-docs 500 (see script and docs for options).

Concepts & Extensibility

  • Swarm consensus: Parallel LLM calls with reconciliation and optional early return on quorum.
  • Code-backed reasoning: Prefer executable simulation/enumeration (ResearchWorker) for brittle domains.
  • Verification: Independent numeric recomputation to catch drift.
  • Template-guided graphs: Buffer-of-Thought templates drive Graph-of-Thought execution to avoid cold starts and enforce domain structure.
  • RA-TGR: RAG augments template selection, seed augmentation, and optional retrieval nodes in the TGR DAG.
  • Timeout & budgeting: Per-node and overall budgets keep the system responsive.
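
The swarm consensus idea, parallel calls with an early return once a quorum agrees, can be sketched with the standard library. The worker callables stand in for model calls, and `swarm_consensus`, the lowercase normalization, and the plurality fallback are assumptions rather than the behavior of SwarmWorkerManager.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Optional, Sequence


def swarm_consensus(
    workers: Sequence[Callable[[str], str]],
    prompt: str,
    quorum: int,
) -> Optional[str]:
    """Query workers in parallel; return as soon as `quorum` of them agree."""
    counts: Counter = Counter()
    pool = ThreadPoolExecutor(max_workers=max(1, len(workers)))
    try:
        futures = [pool.submit(worker, prompt) for worker in workers]
        for future in as_completed(futures):
            try:
                answer = future.result().strip().lower()  # naive normalization
            except Exception:
                continue  # a failing worker simply does not vote
            counts[answer] += 1
            if counts[answer] >= quorum:
                return answer  # early return on quorum
    finally:
        # Do not wait for stragglers; cancel anything not yet started (Python 3.9+).
        pool.shutdown(wait=False, cancel_futures=True)
    # No quorum reached: fall back to the plurality answer, if any.
    return counts.most_common(1)[0][0] if counts else None


models = [lambda p: "4", lambda p: "4 ", lambda p: "5"]  # stand-ins for model calls
print(swarm_consensus(models, "What is 2 + 2?", quorum=2))
```

Workers already running when the quorum is reached are not interrupted, so a production version would also need per-call timeouts.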

Extensibility:

  • Add new templates under configs/templates/ (nodes, edges, seeds).
  • Improve the distiller (e.g. semantic or embedding-based retrieval for template selection).
  • Swap or add models in openrouter.yaml without changing core code.
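
Before dropping a new template into configs/templates/, a quick structural check can catch common mistakes. The sketch below uses the field names listed in this README (template_id, domain_tags, description, knowledge_seeds, graph_blueprint, and node id/type/role/instruction); the `[source, target]` edge format, the example values, and the helper `validate_template` are assumptions.

```python
TOP_LEVEL_KEYS = {"template_id", "domain_tags", "description", "knowledge_seeds", "graph_blueprint"}
NODE_KEYS = {"id", "type", "role", "instruction"}


def validate_template(template: dict) -> list:
    """Return a list of human-readable problems; an empty list means the shape looks fine."""
    problems = []
    missing = TOP_LEVEL_KEYS - set(template)
    if missing:
        problems.append(f"missing top-level keys: {sorted(missing)}")
    blueprint = template.get("graph_blueprint", {})
    node_ids = set()
    for node in blueprint.get("nodes", []):
        lacking = NODE_KEYS - set(node)
        if lacking:
            problems.append(f"node {node.get('id', '?')} lacks {sorted(lacking)}")
        if "id" in node:
            node_ids.add(node["id"])
    if blueprint.get("entrypoint") not in node_ids:
        problems.append("entrypoint is not a declared node")
    for source, target in blueprint.get("edges", []):  # assumed [source, target] pairs
        for endpoint in (source, target):
            if endpoint not in node_ids:
                problems.append(f"edge references unknown node {endpoint}")
    return problems


draft = {  # hypothetical template, for illustration only
    "template_id": "example_counting",
    "domain_tags": ["combinatorics"],
    "description": "Count configurations, then verify.",
    "knowledge_seeds": ["Enumerate small cases first."],
    "graph_blueprint": {
        "entrypoint": "n1",
        "nodes": [
            {"id": "n1", "type": "definition", "role": "logic", "instruction": "Define the objects."},
            {"id": "n2", "type": "verification", "role": "math", "instruction": "Recheck the count."},
        ],
        "edges": [["n1", "n2"]],
    },
}
print(validate_template(draft))
```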

For deeper technical detail, see docs/ARCHITECTURE.md and docs/SYSTEM_DOCUMENTATION.md.
