The load tester your AI assistant runs for you.
swarmQA is a scenario-driven load-testing engine built to be operated by AI coding assistants (Claude Code, Cursor, etc.) over MCP. You bring the product; your AI assistant authors the test scenario by reading your codebase; swarmQA spawns N concurrent virtual users that pound on it and produces a report your assistant can interpret.
Mental model:
```
Your codebase → AI assistant reads it → generates a Scenario
                                                ↓
                                    swarmQA runs N copies
                                                ↓
                               Report → AI assistant explains it
```
The "intelligence" lives in your AI assistant (which already has full codebase context); swarmQA is the deterministic execution engine. No per-step LLM calls in the hot loop — zero API cost during the actual test run.
Recommended: zero-install via uvx. swarmQA ships an MCP server that Claude Code (or any MCP client) can launch on demand — no manual pip install, no PATH setup. You just need uv once:
```sh
curl -LsSf https://astral.sh/uv/install.sh | sh                   # macOS / Linux
# or: powershell -c "irm https://astral.sh/uv/install.ps1 | iex"  # Windows
```

Then drop this into your project's `.mcp.json`:
```json
{
  "mcpServers": {
    "swarmqa": {
      "command": "uvx",
      "args": ["--from", "git+https://github.com/YOUR_USER/swarmqa", "swarmqa-mcp"]
    }
  }
}
```

That's it. On first use, uvx clones the repo, installs deps into a cached venv, and runs `swarmqa-mcp`. Pin to a tagged release with `git+https://github.com/YOUR_USER/swarmqa@v0.1.0`.
Other install paths
pip install from git — installs `swarmqa` + `swarmqa-mcp` globally:

```sh
pip install git+https://github.com/YOUR_USER/swarmqa
```

Then `.mcp.json` is just `{ "mcpServers": { "swarmqa": { "command": "swarmqa-mcp" } } }`.
Editable clone — for hacking on swarmQA itself:

```sh
git clone https://github.com/YOUR_USER/swarmqa
cd swarmqa
pip install -e ".[dev]"
```

Requires Python ≥ 3.11.
A scenario is an ordered list of HTTP steps with templated values that resolve from per-agent state. State starts with `{agent_id}`; each step can extract response fields back into state for later steps to reference.
```json
{
  "name": "todo_journey",
  "steps": [
    {
      "method": "POST",
      "path": "/signup",
      "body": { "email": "user{agent_id}@test.com", "password": "..." },
      "extract": { "token": "access_token" }
    },
    {
      "method": "POST",
      "path": "/todos",
      "headers": { "Authorization": "Bearer {token}" },
      "body": { "title": "todo for {agent_id}" },
      "extract": { "todo_id": "id" }
    },
    {
      "method": "GET",
      "path": "/todos/{todo_id}",
      "headers": { "Authorization": "Bearer {token}" }
    }
  ]
}
```

Each agent walks the scenario top-to-bottom in a loop. Auth, ordering, and stateful flows are all just steps in the scenario — no separate config layer.
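The templating and extraction mechanics are simple enough to sketch. The helper names below (`resolve`, `apply_extract`) are illustrative, not swarmQA's API:

```python
def resolve(value, state: dict):
    """Fill {placeholders} in strings from per-agent state; recurse into dicts."""
    if isinstance(value, str):
        return value.format(**state)          # "Bearer {token}" -> "Bearer eyJ..."
    if isinstance(value, dict):
        return {k: resolve(v, state) for k, v in value.items()}
    return value

def apply_extract(extract: dict, response_json: dict, state: dict) -> None:
    """Copy response fields into state so later steps can reference them."""
    for state_key, response_field in extract.items():
        state[state_key] = response_json[response_field]

# The first step of todo_journey, walked by hand for agent 7:
state = {"agent_id": 7}
body = resolve({"email": "user{agent_id}@test.com", "password": "..."}, state)
apply_extract({"token": "access_token"}, {"access_token": "eyJ..."}, state)
# state["token"] is now set, so "Bearer {token}" resolves in the next step.
```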
Most user-facing flows ride on one of three transports: HTTP request/response, Server-Sent Events, or WebSocket. swarmQA models all three as Steps.
SSE step — open the connection, hold it, expect events:
```json
{
  "method": "GET",
  "path": "/events/stream",
  "headers": { "Authorization": "Bearer {token}" },
  "stream": {
    "type": "sse",
    "hold_for_s": 60,
    "expect_at_least": 1,
    "extract_from_first": { "session_id": "session_id" }
  }
}
```

For SSE steps, `latency_ms` is time-to-first-event (not connection latency). If fewer than `expect_at_least` events arrive within `hold_for_s`, the step records an error so capacity tests notice slow streams.
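Time-to-first-event is worth pinning down. A minimal sketch of the measurement, using `httpx` purely for illustration (swarmQA's actual client is not specified here):

```python
import time
import httpx  # assumption: any streaming-capable HTTP client would do

def time_to_first_event_ms(url: str, headers: dict, hold_for_s: float) -> float | None:
    """Open an SSE stream; return ms until the first `data:` line, or None on timeout."""
    t0 = time.monotonic()
    with httpx.stream("GET", url, headers=headers, timeout=hold_for_s) as resp:
        for line in resp.iter_lines():
            if line.startswith("data:"):              # first event, not the connect
                return (time.monotonic() - t0) * 1000
            if time.monotonic() - t0 > hold_for_s:
                break
    return None  # no event within hold_for_s: the step would record an error
```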
WebSocket step — open the socket, optionally push messages, expect server messages:
```json
{
  "method": "GET",
  "path": "/ws",
  "headers": { "Authorization": "Bearer {token}" },
  "websocket": {
    "type": "websocket",
    "hold_for_s": 60,
    "expect_at_least": 1,
    "send": [
      { "json": { "type": "subscribe", "agent_id": "{agent_id}" } }
    ],
    "extract_from_first": { "session_id": "session_id" }
  }
}
```

The agent rewrites the base URL scheme automatically (`http://` → `ws://`, `https://` → `wss://`). Each entry in `send` may use either `json` or `text`, with an optional `delay_s` for think time. `latency_ms` is time-to-first-server-message; falling short of `expect_at_least` is recorded as an error.
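The scheme rewrite and first-message timing are easy to picture in code. A sketch using the `websockets` library, which is an assumption about tooling rather than swarmQA's implementation:

```python
import asyncio
import json
import time
import websockets  # assumption: pip install websockets

async def time_to_first_message_ms(base_url: str, path: str, subscribe: dict,
                                   hold_for_s: float) -> float | None:
    """Connect (with the scheme rewritten), send a subscribe message, and return
    ms until the first server message, or None if none arrives in time."""
    ws_url = base_url.replace("https://", "wss://").replace("http://", "ws://") + path
    t0 = time.monotonic()
    async with websockets.connect(ws_url) as ws:
        await ws.send(json.dumps(subscribe))  # a "send" entry with a json payload
        try:
            await asyncio.wait_for(ws.recv(), timeout=hold_for_s)
        except asyncio.TimeoutError:
            return None  # expect_at_least not met: the step would record an error
        return (time.monotonic() - t0) * 1000
```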
Once the MCP server is registered (see Install), three tools are available in any Claude Code session: stress, capacity, benchmark. You drive them in plain English:
"Boot the dev server, then run swarmqa stress against it with 100 users for 60s using
scenarios/todo_journey.json. Tell me what broke."
Claude starts your server (npm run dev, uvicorn ..., whatever your stack uses), calls the stress tool, reads the returned Report (per-endpoint p50/p95/p99, error taxonomy, knee/cliff, bottleneck hints), and explains it inline. No copy-paste from a CLI.
Tool arguments mirror the CLI flags below.
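Concretely, a `stress` tool call from the MCP side might carry arguments like these. This is a hypothetical shape inferred from the flags in the next section, not a documented schema:

```json
{
  "scenario": "./scenarios/todo_journey.json",
  "url": "http://localhost:8000",
  "users": 100,
  "duration": 60,
  "ramp": 10
}
```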
The CLI is the same surface, available for CI pipelines and direct shell use.
```sh
swarmqa stress \
  --scenario ./scenarios/todo_journey.json \
  --url http://localhost:8000 \
  --users 100 --duration 60 --ramp 10
```

```sh
swarmqa capacity \
  --scenario ./scenarios/todo_journey.json \
  --url http://localhost:8000 \
  --target-users 20000 --ramp 600 --duration 660
```

Prints PASS/FAIL with the knee + cliff points and a bottleneck rundown.
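"Knee" and "cliff" are the usual capacity-curve landmarks: the knee is where latency starts bending away from flat as users ramp, the cliff is where errors spike or throughput collapses. A purely illustrative detector (thresholds and field names are assumptions, not swarmQA's actual heuristics):

```python
def find_knee_and_cliff(levels: list[dict]) -> tuple[int | None, int | None]:
    """levels: [{"users": ..., "p95_ms": ..., "error_rate": ...}] sorted by users."""
    knee = cliff = None
    baseline_p95 = levels[0]["p95_ms"]
    for lv in levels:
        if knee is None and lv["p95_ms"] > 2 * baseline_p95:
            knee = lv["users"]    # latency has doubled vs. the unloaded baseline
        if cliff is None and lv["error_rate"] > 0.05:
            cliff = lv["users"]   # >5% errors: the service is falling over
    return knee, cliff
```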
```sh
swarmqa benchmark \
  --scenario ./scenarios/todo_journey.json \
  --a http://localhost:3000 --b http://localhost:3001 \
  --users 200 --duration 120
```

Diffs p50/p95/p99, error rate, throughput, and resource usage between two running deployments.
By default, swarmQA refuses to target any host whose name contains tokens like `prod`, `production`, `api.`, or `www.` (a sketch of the check appears at the end of this section). Override deliberately:

```sh
SWARMQA_ALLOW_PROD=1 swarmqa stress --url https://api.example.com ...
```

Scope and limits:

- Transports covered: HTTP request/response, Server-Sent Events, WebSocket. Anything outside the request path users actually hit (background workers, indexers, queues) is out of scope — mock at the API/SSE/WS boundary.
- Single-box swarms cap out around 5–20k virtual users (a distributed runner is planned for v2).
- External deps (Stripe, SendGrid) should be mocked to avoid rate-limit skew.
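As promised above, a minimal sketch of the production guard, assuming the token list and the `SWARMQA_ALLOW_PROD` override described in this section (the function name is illustrative):

```python
import os
from urllib.parse import urlparse

BLOCKED_TOKENS = ("prod", "production", "api.", "www.")  # tokens named above

def check_target(url: str) -> None:
    """Refuse production-looking hosts unless the override is set deliberately."""
    if os.environ.get("SWARMQA_ALLOW_PROD") == "1":
        return  # explicit opt-in
    host = urlparse(url).hostname or ""
    if any(token in host for token in BLOCKED_TOKENS):
        raise SystemExit(
            f"refusing to load-test {host!r}; set SWARMQA_ALLOW_PROD=1 to override")
```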