
AlexK020908/swarmQA


swarmQA

The load tester your AI assistant runs for you.

swarmQA is a scenario-driven load testing engine designed to be driven by AI coding assistants (Claude Code, Cursor, etc.) over MCP. You bring the product; your AI assistant authors the test scenario by reading your codebase; swarmQA spawns N concurrent virtual users that pound on it and produces a report your assistant can interpret.

Mental model:

Your codebase → AI assistant reads it → generates a Scenario
                                              ↓
                                     swarmQA runs N copies
                                              ↓
                                     Report → AI assistant explains it

The "intelligence" lives in your AI assistant (which already has full codebase context); swarmQA is the deterministic execution engine. No per-step LLM calls in the hot loop — zero API cost during the actual test run.
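Here is a minimal Python sketch of that hot loop, to make the "deterministic execution engine" idea concrete. N virtual users run as asyncio tasks, each walking the same step list, and nothing in the loop calls a model. This is an illustration, not swarmQA's code. `run_agent`, `run_swarm`, and `fake_execute` are hypothetical names, and a fixed iteration count stands in for the real duration-based loop so the result is deterministic.

```python
# Hypothetical sketch of N concurrent virtual users; not swarmQA's implementation.
import asyncio
from typing import Awaitable, Callable, List

# Hypothetical executor signature: (agent_id, step) -> latency in ms.
StepFn = Callable[[int, dict], Awaitable[float]]


async def run_agent(agent_id: int, steps: List[dict], execute: StepFn,
                    iterations: int) -> List[float]:
    """One virtual user: walk the scenario top-to-bottom, `iterations` times."""
    latencies: List[float] = []
    for _ in range(iterations):
        for step in steps:
            latencies.append(await execute(agent_id, step))
    return latencies


async def run_swarm(n_users: int, steps: List[dict], execute: StepFn,
                    iterations: int = 1) -> List[List[float]]:
    """Spawn N agents concurrently; nothing in this loop calls an LLM."""
    agents = [run_agent(i, steps, execute, iterations) for i in range(n_users)]
    return list(await asyncio.gather(*agents))


async def fake_execute(agent_id: int, step: dict) -> float:
    """Stand-in for a real HTTP call: yield to the event loop, report 1 ms."""
    await asyncio.sleep(0)
    return 1.0


results = asyncio.run(
    run_swarm(3, [{"method": "GET", "path": "/"}], fake_execute, iterations=2)
)
```

In a real engine the executor would perform the HTTP, SSE, or WebSocket call and return timings. Keeping the executor pluggable is what lets the loop itself stay free of per-step API cost.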

Install

Recommended: zero-install via uvx. swarmQA ships an MCP server that Claude Code (or any MCP client) can launch on demand — no manual pip install, no PATH setup. You just need uv once:

curl -LsSf https://astral.sh/uv/install.sh | sh   # macOS / Linux
# or: powershell -c "irm https://astral.sh/uv/install.ps1 | iex"   (Windows)

Then drop this into your project's .mcp.json:

{
  "mcpServers": {
    "swarmqa": {
      "command": "uvx",
      "args": ["--from", "git+https://github.com/YOUR_USER/swarmqa", "swarmqa-mcp"]
    }
  }
}

That's it. On first use, uvx clones the repo, installs deps into a cached venv, and runs swarmqa-mcp. Pin to a tagged release with git+https://github.com/YOUR_USER/swarmqa@v0.1.0.

Other install paths

pip install from git — installs swarmqa + swarmqa-mcp globally:

pip install git+https://github.com/YOUR_USER/swarmqa

Then .mcp.json is just { "mcpServers": { "swarmqa": { "command": "swarmqa-mcp" } } }.

Editable clone — for hacking on swarmQA itself:

git clone https://github.com/YOUR_USER/swarmqa
cd swarmqa
pip install -e ".[dev]"

Requires Python ≥ 3.11.

What's a scenario?

An ordered list of HTTP steps with templated values that resolve from per-agent state. State starts with {agent_id}; each step can extract response fields back into state for later steps to reference.

{
  "name": "todo_journey",
  "steps": [
    {
      "method": "POST",
      "path": "/signup",
      "body": { "email": "user{agent_id}@test.com", "password": "..." },
      "extract": { "token": "access_token" }
    },
    {
      "method": "POST",
      "path": "/todos",
      "headers": { "Authorization": "Bearer {token}" },
      "body": { "title": "todo for {agent_id}" },
      "extract": { "todo_id": "id" }
    },
    {
      "method": "GET",
      "path": "/todos/{todo_id}",
      "headers": { "Authorization": "Bearer {token}" }
    }
  ]
}

Each agent walks the scenario top-to-bottom in a loop. Auth, ordering, and stateful flows are all just steps in the scenario — no separate config layer.
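The templating and extraction rules above fit in a few lines of Python. This sketch assumes placeholders are plain {name} tokens and that extract maps a state key to a top-level field of the JSON response. The README's examples only show flat fields such as access_token and id, so nested paths are not covered. `render` and `extract` are hypothetical helpers, not swarmQA's API.

```python
# Hypothetical sketch of per-agent templating; not swarmQA's implementation.
import re
from typing import Any, Dict

# Matches {name} placeholders such as "user{agent_id}@test.com".
PLACEHOLDER = re.compile(r"\{(\w+)\}")


def render(value: Any, state: Dict[str, Any]) -> Any:
    """Recursively substitute {name} placeholders from per-agent state.

    Raises KeyError if a placeholder names a key not yet in state.
    """
    if isinstance(value, str):
        return PLACEHOLDER.sub(lambda m: str(state[m.group(1)]), value)
    if isinstance(value, dict):
        return {k: render(v, state) for k, v in value.items()}
    if isinstance(value, list):
        return [render(v, state) for v in value]
    return value


def extract(response_json: Dict[str, Any], mapping: Dict[str, str],
            state: Dict[str, Any]) -> None:
    """Copy top-level response fields into state, e.g. {"token": "access_token"}."""
    for state_key, response_field in mapping.items():
        state[state_key] = response_json[response_field]


# Walk todo_journey for agent 7, feeding in canned responses.
state: Dict[str, Any] = {"agent_id": 7}
signup_body = render({"email": "user{agent_id}@test.com"}, state)
extract({"access_token": "abc123"}, {"token": "access_token"}, state)
auth_headers = render({"Authorization": "Bearer {token}"}, state)
extract({"id": 42}, {"todo_id": "id"}, state)
todo_path = render("/todos/{todo_id}", state)
```

Because each agent owns its own `state` dict, two agents never see each other's tokens or todo IDs.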

Streaming step types

Most user-facing flows are one of three transports: HTTP request/response, Server-Sent Events, or WebSocket. swarmQA models all three as Steps.

SSE step — open the connection, hold it, expect events:

{
  "method": "GET",
  "path": "/events/stream",
  "headers": { "Authorization": "Bearer {token}" },
  "stream": {
    "type": "sse",
    "hold_for_s": 60,
    "expect_at_least": 1,
    "extract_from_first": { "session_id": "session_id" }
  }
}

For SSE steps, latency_ms is time-to-first-event (not connection latency). If fewer than expect_at_least events arrive within hold_for_s, the step records an error so capacity tests notice slow streams.
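The following Python sketch scores an SSE step from already-received lines, each tagged with its arrival time. It follows the SSE wire format, where data: lines accumulate and a blank line dispatches the event. It also applies the rules stated above: latency is time-to-first-event, and fewer than expect_at_least events inside hold_for_s is an error. The names `evaluate_sse` and `SseResult` are illustrative, as is the assumption that the first event's data is JSON. This is not swarmQA's implementation.

```python
# Hypothetical sketch of SSE step scoring; not swarmQA's implementation.
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, List, Optional, Tuple


@dataclass
class SseResult:
    events: int
    latency_ms: Optional[float]   # time-to-first-event; None if nothing arrived
    error: Optional[str]
    extracted: Dict[str, Any] = field(default_factory=dict)


def evaluate_sse(lines: Iterable[Tuple[float, str]], hold_for_s: float,
                 expect_at_least: int,
                 extract_from_first: Dict[str, str]) -> SseResult:
    """Score one SSE step from (seconds since request start, raw line) pairs."""
    data_buf: List[str] = []
    events: List[Tuple[float, str]] = []   # (dispatch time, joined data)
    for t, line in lines:
        if t > hold_for_s:
            break                          # hold window closed: stop listening
        if line == "":                     # blank line dispatches the pending event
            if data_buf:
                events.append((t, "\n".join(data_buf)))
                data_buf = []
        elif line.startswith("data:"):
            value = line[len("data:"):]
            data_buf.append(value[1:] if value.startswith(" ") else value)
        # event:, id:, retry: fields and ":" comment lines are ignored here.

    latency_ms = events[0][0] * 1000 if events else None
    error = None
    if len(events) < expect_at_least:
        error = (f"expected >= {expect_at_least} events within "
                 f"{hold_for_s}s, got {len(events)}")
    extracted: Dict[str, Any] = {}
    if events and extract_from_first:
        first = json.loads(events[0][1])   # assumes the first event's data is JSON
        extracted = {key: first[src] for key, src in extract_from_first.items()}
    return SseResult(len(events), latency_ms, error, extracted)


stream = [(0.25, 'data: {"session_id": "s-1"}'), (0.25, ""),
          (1.0, "data: tick"), (1.0, "")]
ok = evaluate_sse(stream, hold_for_s=60, expect_at_least=1,
                  extract_from_first={"session_id": "session_id"})
slow = evaluate_sse(stream, hold_for_s=0.5, expect_at_least=2,
                    extract_from_first={})
```

In the `slow` case the second event arrives after the 0.5 s window closes. Only one event counts, so the step records an error.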

WebSocket step — open the socket, optionally push messages, expect server messages:

{
  "method": "GET",
  "path": "/ws",
  "headers": { "Authorization": "Bearer {token}" },
  "websocket": {
    "type": "websocket",
    "hold_for_s": 60,
    "expect_at_least": 1,
    "send": [
      { "json": { "type": "subscribe", "agent_id": "{agent_id}" } }
    ],
    "extract_from_first": { "session_id": "session_id" }
  }
}

The agent rewrites the base URL scheme automatically (http:// → ws://, https:// → wss://). Each entry in send may use either json or text, with an optional delay_s for think time. For WebSocket steps, latency_ms is time-to-first-server-message, and receiving fewer than expect_at_least server messages is recorded as an error.
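The standard library is enough to sketch the scheme rewrite and the send-entry encoding. `to_websocket_url` and `encode_send_entry` are hypothetical helpers. The sketch assumes placeholder substitution ({agent_id} and so on) has already happened before an entry reaches `encode_send_entry`. The "json wins if both keys are present" rule is a choice made for the sketch.

```python
# Hypothetical sketch of WebSocket URL/send handling; not swarmQA's implementation.
import json
from typing import Any, Dict, Tuple
from urllib.parse import urlsplit, urlunsplit

WS_SCHEMES = {"http": "ws", "https": "wss"}


def to_websocket_url(base_url: str, path: str) -> str:
    """Join the base URL and step path, rewriting http -> ws and https -> wss.

    Any query string or fragment on the base URL is dropped in this sketch.
    """
    parts = urlsplit(base_url)
    scheme = WS_SCHEMES.get(parts.scheme, parts.scheme)  # ws/wss pass through
    full_path = parts.path.rstrip("/") + "/" + path.lstrip("/")
    return urlunsplit((scheme, parts.netloc, full_path, "", ""))


def encode_send_entry(entry: Dict[str, Any]) -> Tuple[float, str]:
    """Turn one `send` entry into (delay_s, text frame); json wins over text."""
    delay = float(entry.get("delay_s", 0))
    if "json" in entry:
        return delay, json.dumps(entry["json"])
    return delay, entry["text"]
```

For example, `to_websocket_url("http://localhost:8000", "/ws")` gives `ws://localhost:8000/ws`.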

Using it from Claude Code

Once the MCP server is registered (see Install), three tools are available in any Claude Code session: stress, capacity, benchmark. You drive them in plain English:

"Boot the dev server, then run swarmqa stress against it with 100 users for 60s using scenarios/todo_journey.json. Tell me what broke."

Claude starts your server (npm run dev, uvicorn ..., whatever your stack uses), calls the stress tool, reads the returned Report (per-endpoint p50/p95/p99, error taxonomy, knee/cliff, bottleneck hints), and explains it inline. No copy-paste from a CLI.

Tool arguments mirror the CLI flags below.

Three commands (CLI escape hatch)

The CLI is the same surface, available for CI pipelines and direct shell use.

stress — classic load test

swarmqa stress \
  --scenario ./scenarios/todo_journey.json \
  --url http://localhost:8000 \
  --users 100 --duration 60 --ramp 10
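The README doesn't spell out what --ramp does. A common meaning, assumed here, is the number of seconds over which users are started at an even rate. Under that reading, the stress run above adds one user every 0.1 s. The capacity run (--ramp 600 --duration 660) would reach 20000 users at t = 600 s and hold there for the last 60 s. The helpers below are hypothetical.

```python
# Hypothetical linear ramp schedule; the real --ramp semantics may differ.
from typing import List


def start_offsets(users: int, ramp_s: float) -> List[float]:
    """Seconds after launch at which each virtual user starts (linear ramp)."""
    if ramp_s <= 0 or users == 0:
        return [0.0] * users                 # no ramp: everyone starts at once
    step = ramp_s / users
    return [i * step for i in range(users)]


def active_users(t: float, users: int, ramp_s: float) -> int:
    """How many users have started by time t (seconds)."""
    return sum(1 for start in start_offsets(users, ramp_s) if start <= t)
```

A gradual ramp is what lets a run show where latency starts to bend. Starting every user at once would only show the end state.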

capacity — "can this handle 20k users?"

swarmqa capacity \
  --scenario ./scenarios/todo_journey.json \
  --url http://localhost:8000 \
  --target-users 20000 --ramp 600 --duration 660

Prints PASS/FAIL with the knee + cliff points and a bottleneck rundown.
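The README doesn't define knee or cliff. One common reading, assumed here, is that the knee is where latency starts climbing sharply and the cliff is where errors take off. The sketch below uses made-up thresholds purely for illustration: p95 above 2x its lowest-load value for the knee, and an error rate above 5% for the cliff. swarmQA's actual criteria may differ, and `LoadSample`, `find_knee_and_cliff`, and `verdict` are hypothetical.

```python
# Hypothetical knee/cliff heuristic with illustrative thresholds.
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class LoadSample:
    users: int          # concurrent users during this measurement window
    p95_ms: float       # p95 latency observed in the window
    error_rate: float   # fraction of failed steps, 0.0 to 1.0


def find_knee_and_cliff(samples: List[LoadSample], knee_factor: float = 2.0,
                        cliff_error_rate: float = 0.05
                        ) -> Tuple[Optional[int], Optional[int]]:
    """Knee: first load where p95 exceeds knee_factor x the lowest-load p95.
    Cliff: first load where the error rate exceeds cliff_error_rate.
    Returns None for a point never reached. Samples must be sorted by users."""
    baseline = samples[0].p95_ms
    knee = next((s.users for s in samples if s.p95_ms > knee_factor * baseline), None)
    cliff = next((s.users for s in samples if s.error_rate > cliff_error_rate), None)
    return knee, cliff


def verdict(samples: List[LoadSample], target_users: int) -> str:
    """PASS if the ramp reached the target and no cliff occurred at or below it."""
    _, cliff = find_knee_and_cliff(samples)
    reached = max(s.users for s in samples) >= target_users
    return "PASS" if reached and (cliff is None or cliff > target_users) else "FAIL"


samples = [LoadSample(1000, 50.0, 0.0), LoadSample(5000, 80.0, 0.0),
           LoadSample(10000, 140.0, 0.01), LoadSample(15000, 400.0, 0.12)]
```

With these samples the knee is at 10000 users and the cliff at 15000. A 20000-user target fails, while a 12000-user target passes.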

benchmark — A vs B

swarmqa benchmark \
  --scenario ./scenarios/todo_journey.json \
  --a http://localhost:3000  --b http://localhost:3001 \
  --users 200 --duration 120

Diffs p50/p95/p99, error rate, throughput, and resource usage between two running deployments.
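To give a feel for the latency part of that diff, here is a small Python sketch that compares two runs' p50/p95/p99. It uses the nearest-rank percentile definition, which is an assumption because swarmQA may interpolate instead. It covers latency only and leaves out error-rate, throughput, and resource deltas. `percentile` and `diff_latencies` are hypothetical names.

```python
# Hypothetical A-vs-B latency diff using nearest-rank percentiles.
import math
from typing import Dict, List


def percentile(samples: List[float], p: int) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples <= it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def diff_latencies(a: List[float], b: List[float]) -> Dict[str, Dict[str, float]]:
    """Compare p50/p95/p99 of deployment A vs B; positive delta_pct means B is slower."""
    report = {}
    for p in (50, 95, 99):
        pa, pb = percentile(a, p), percentile(b, p)
        report[f"p{p}"] = {"a": pa, "b": pb, "delta_pct": (pb - pa) / pa * 100}
    return report


run_a = [float(ms) for ms in range(1, 101)]   # 1..100 ms
run_b = [2 * ms for ms in run_a]              # every request twice as slow
comparison = diff_latencies(run_a, run_b)
```

Here every percentile doubles, so each delta_pct comes out at 100.0.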

Safety

By default, swarmQA refuses to target any host whose name contains tokens such as prod, production, api., or www. Override deliberately:

SWARMQA_ALLOW_PROD=1 swarmqa stress --url https://api.example.com ...
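A minimal Python sketch of that guard follows. It uses the token list from this section, and it assumes the override counts only when the variable is exactly "1". `is_prod_like`, `check_target`, and `PROD_TOKENS` are hypothetical names, and the real deny-list may be longer.

```python
# Hypothetical production-host guard; not swarmQA's implementation.
import os
from typing import Mapping, Optional
from urllib.parse import urlsplit

# Tokens listed in the README; swarmQA's real deny-list may contain more.
PROD_TOKENS = ("prod", "production", "api.", "www.")


def is_prod_like(url: str) -> bool:
    """True if the target hostname contains any production-looking token."""
    host = (urlsplit(url).hostname or "").lower()
    return any(token in host for token in PROD_TOKENS)


def check_target(url: str, env: Optional[Mapping[str, str]] = None) -> None:
    """Raise unless the host looks safe or SWARMQA_ALLOW_PROD=1 is set."""
    env = os.environ if env is None else env
    if is_prod_like(url) and env.get("SWARMQA_ALLOW_PROD") != "1":
        raise PermissionError(
            f"refusing to load-test {url!r}; set SWARMQA_ALLOW_PROD=1 to override")
```

Substring matching errs on the side of caution. A host like product.example.com is also refused unless the override is set.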

Limits (v0)

  • Transports covered: HTTP request/response, Server-Sent Events, WebSocket. Anything outside the request path users actually hit (background workers, indexers, queues) is out of scope — mock at the API/SSE/WS boundary.
  • A single-box swarm caps out at roughly 5–20k virtual users (a distributed runner is planned for v2).
  • External deps (Stripe, SendGrid) should be mocked to avoid rate-limit skew.
