The load tester your AI assistant runs for you.
swarmQA is a scenario-driven load-testing engine built to be operated by AI coding assistants (Claude Code, Cursor, etc.) over MCP. You bring the product; your AI assistant authors the test scenario by reading your codebase; swarmQA spawns N concurrent virtual users that pound on it and produces a report your assistant can interpret.
Mental model:
```
Your codebase → AI assistant reads it → generates a Scenario
                                                ↓
                                    swarmQA runs N copies
                                                ↓
                               Report → AI assistant explains it
```
The "intelligence" lives in your AI assistant (which already has full codebase context); swarmQA is the deterministic execution engine. No per-step LLM calls in the hot loop — zero API cost during the actual test run.
Recommended: zero-install via uvx. swarmQA ships an MCP server that Claude Code (or any MCP client) can launch on demand — no manual pip install, no PATH setup. You just need uv once:
```sh
curl -LsSf https://astral.sh/uv/install.sh | sh                   # macOS / Linux
# or: powershell -c "irm https://astral.sh/uv/install.ps1 | iex"  # Windows
```

Then drop this into your project's `.mcp.json`:
```json
{
  "mcpServers": {
    "swarmqa": {
      "command": "uvx",
      "args": ["--from", "git+https://github.com/YOUR_USER/swarmqa", "swarmqa-mcp"]
    }
  }
}
```

That's it. On first use, uvx clones the repo, installs deps into a cached venv, and runs `swarmqa-mcp`. Pin to a tagged release with `git+https://github.com/YOUR_USER/swarmqa@v0.1.0`.
Other install paths
pip install from git — installs `swarmqa` + `swarmqa-mcp` globally:

```sh
pip install git+https://github.com/YOUR_USER/swarmqa
```

Then `.mcp.json` is just `{ "mcpServers": { "swarmqa": { "command": "swarmqa-mcp" } } }`.
Editable clone — for hacking on swarmQA itself:

```sh
git clone https://github.com/YOUR_USER/swarmqa
cd swarmqa
pip install -e ".[dev]"
```

Requires Python ≥ 3.11.
A scenario is an ordered list of HTTP steps with templated values that resolve from per-agent state. State starts with `{agent_id}`; each step can extract response fields back into state for later steps to reference.
```json
{
  "name": "todo_journey",
  "steps": [
    {
      "method": "POST",
      "path": "/signup",
      "body": { "email": "user{agent_id}@test.com", "password": "..." },
      "extract": { "token": "access_token" }
    },
    {
      "method": "POST",
      "path": "/todos",
      "headers": { "Authorization": "Bearer {token}" },
      "body": { "title": "todo for {agent_id}" },
      "extract": { "todo_id": "id" }
    },
    {
      "method": "GET",
      "path": "/todos/{todo_id}",
      "headers": { "Authorization": "Bearer {token}" }
    }
  ]
}
```

Each agent walks the scenario top-to-bottom in a loop. Auth, ordering, and stateful flows are all just steps in the scenario — no separate config layer.
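The templating and extraction mechanics are simple enough to sketch. The helper names below (`resolve`, `apply_extract`) are illustrative, not swarmQA's API:

```python
def resolve(value, state: dict):
    """Fill {placeholders} in strings from per-agent state; recurse into dicts."""
    if isinstance(value, str):
        return value.format(**state)          # "Bearer {token}" -> "Bearer eyJ..."
    if isinstance(value, dict):
        return {k: resolve(v, state) for k, v in value.items()}
    return value

def apply_extract(extract: dict, response_json: dict, state: dict) -> None:
    """Copy response fields into state so later steps can reference them."""
    for state_key, response_field in extract.items():
        state[state_key] = response_json[response_field]

# The first step of todo_journey, walked by hand for agent 7:
state = {"agent_id": 7}
body = resolve({"email": "user{agent_id}@test.com", "password": "..."}, state)
apply_extract({"token": "access_token"}, {"access_token": "eyJ..."}, state)
# state["token"] is now set, so "Bearer {token}" resolves in the next step.
```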
Most user-facing flows ride on one of three transports: HTTP request/response, Server-Sent Events, or WebSocket. swarmQA models all three as Steps.
SSE step — open the connection, hold it, expect events:
```json
{
  "method": "GET",
  "path": "/events/stream",
  "headers": { "Authorization": "Bearer {token}" },
  "stream": {
    "type": "sse",
    "hold_for_s": 60,
    "expect_at_least": 1,
    "extract_from_first": { "session_id": "session_id" }
  }
}
```

For SSE steps, `latency_ms` is time-to-first-event (not connection latency). If fewer than `expect_at_least` events arrive within `hold_for_s`, the step records an error so capacity tests notice slow streams.
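Time-to-first-event is worth pinning down. A minimal sketch of the measurement, using `httpx` purely for illustration (swarmQA's actual client is not specified here):

```python
import time
import httpx  # assumption: any streaming-capable HTTP client would do

def time_to_first_event_ms(url: str, headers: dict, hold_for_s: float) -> float | None:
    """Open an SSE stream; return ms until the first `data:` line, or None on timeout."""
    t0 = time.monotonic()
    with httpx.stream("GET", url, headers=headers, timeout=hold_for_s) as resp:
        for line in resp.iter_lines():
            if line.startswith("data:"):              # first event, not the connect
                return (time.monotonic() - t0) * 1000
            if time.monotonic() - t0 > hold_for_s:
                break
    return None  # no event within hold_for_s: the step would record an error
```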
WebSocket step — open the socket, optionally push messages, expect server messages:
```json
{
  "method": "GET",
  "path": "/ws",
  "headers": { "Authorization": "Bearer {token}" },
  "websocket": {
    "type": "websocket",
    "hold_for_s": 60,
    "expect_at_least": 1,
    "send": [
      { "json": { "type": "subscribe", "agent_id": "{agent_id}" } }
    ],
    "extract_from_first": { "session_id": "session_id" }
  }
}
```

The agent rewrites the base URL scheme automatically (`http://` → `ws://`, `https://` → `wss://`). Each entry in `send` may use either `json` or `text`, with an optional `delay_s` for think time. `latency_ms` is time-to-first-server-message; falling short of `expect_at_least` is recorded as an error.
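The scheme rewrite and first-message timing are easy to picture in code. A sketch using the `websockets` library, which is an assumption about tooling rather than swarmQA's implementation:

```python
import asyncio
import json
import time
import websockets  # assumption: pip install websockets

async def time_to_first_message_ms(base_url: str, path: str, subscribe: dict,
                                   hold_for_s: float) -> float | None:
    """Connect (with the scheme rewritten), send a subscribe message, and return
    ms until the first server message, or None if none arrives in time."""
    ws_url = base_url.replace("https://", "wss://").replace("http://", "ws://") + path
    t0 = time.monotonic()
    async with websockets.connect(ws_url) as ws:
        await ws.send(json.dumps(subscribe))  # a "send" entry with a json payload
        try:
            await asyncio.wait_for(ws.recv(), timeout=hold_for_s)
        except asyncio.TimeoutError:
            return None  # expect_at_least not met: the step would record an error
        return (time.monotonic() - t0) * 1000
```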
Once the MCP server is registered (see Install), three tools are available in any Claude Code session: stress, capacity, benchmark. You drive them in plain English:
"Boot the dev server, then run swarmqa stress against it with 100 users for 60s using
scenarios/todo_journey.json. Tell me what broke."
Claude starts your server (npm run dev, uvicorn ..., whatever your stack uses), calls the stress tool, reads the returned Report (per-endpoint p50/p95/p99, error taxonomy, knee/cliff, bottleneck hints), and explains it inline. No copy-paste from a CLI.
Tool arguments mirror the CLI flags below.
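Concretely, a `stress` tool call from the MCP side might carry arguments like these. This is a hypothetical shape inferred from the flags in the next section, not a documented schema:

```json
{
  "scenario": "./scenarios/todo_journey.json",
  "url": "http://localhost:8000",
  "users": 100,
  "duration": 60,
  "ramp": 10
}
```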
The CLI is the same surface, available for CI pipelines and direct shell use.
```sh
swarmqa stress \
  --scenario ./scenarios/todo_journey.json \
  --url http://localhost:8000 \
  --users 100 --duration 60 --ramp 10
```

```sh
swarmqa capacity \
  --scenario ./scenarios/todo_journey.json \
  --url http://localhost:8000 \
  --target-users 20000 --ramp 600 --duration 660
```

Prints PASS/FAIL with the knee + cliff points and a bottleneck rundown.
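"Knee" and "cliff" are the usual capacity-curve landmarks: the knee is where latency starts bending away from flat as users ramp, the cliff is where errors spike or throughput collapses. A purely illustrative detector (thresholds and field names are assumptions, not swarmQA's actual heuristics):

```python
def find_knee_and_cliff(levels: list[dict]) -> tuple[int | None, int | None]:
    """levels: [{"users": ..., "p95_ms": ..., "error_rate": ...}] sorted by users."""
    knee = cliff = None
    baseline_p95 = levels[0]["p95_ms"]
    for lv in levels:
        if knee is None and lv["p95_ms"] > 2 * baseline_p95:
            knee = lv["users"]    # latency has doubled vs. the unloaded baseline
        if cliff is None and lv["error_rate"] > 0.05:
            cliff = lv["users"]   # >5% errors: the service is falling over
    return knee, cliff
```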
```sh
swarmqa benchmark \
  --scenario ./scenarios/todo_journey.json \
  --a http://localhost:3000 --b http://localhost:3001 \
  --users 200 --duration 120
```

Diffs p50/p95/p99, error rate, throughput, and resource usage between two running deployments.
By default, swarmQA refuses to target any host whose name contains tokens like `prod`, `production`, `api.`, or `www.` (a sketch of the check appears at the end of this section). Override deliberately:

```sh
SWARMQA_ALLOW_PROD=1 swarmqa stress --url https://api.example.com ...
```

Scope and limits:

- Transports covered: HTTP request/response, Server-Sent Events, WebSocket. Anything outside the request path users actually hit (background workers, indexers, queues) is out of scope — mock at the API/SSE/WS boundary.
- Single-box swarms cap out around 5–20k virtual users (a distributed runner is planned for v2).
- External deps (Stripe, SendGrid) should be mocked to avoid rate-limit skew.
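As promised above, a minimal sketch of the production guard, assuming the token list and the `SWARMQA_ALLOW_PROD` override described in this section (the function name is illustrative):

```python
import os
from urllib.parse import urlparse

BLOCKED_TOKENS = ("prod", "production", "api.", "www.")  # tokens named above

def check_target(url: str) -> None:
    """Refuse production-looking hosts unless the override is set deliberately."""
    if os.environ.get("SWARMQA_ALLOW_PROD") == "1":
        return  # explicit opt-in
    host = urlparse(url).hostname or ""
    if any(token in host for token in BLOCKED_TOKENS):
        raise SystemExit(
            f"refusing to load-test {host!r}; set SWARMQA_ALLOW_PROD=1 to override")
```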