Skip to content

Commit 621efe3

Browse files
ajcasagrandeclaude
andcommitted
docs(anthropic): document raw_system, ISL accounting, and DAG/FORK replay
Updates the Anthropic Messages endpoint tutorial to cover the dag5 BaseEndpoint hooks and Turn fields adopted by AnthropicMessagesEndpoint: - New "Advanced features" section documenting Turn.raw_system for list-form system blocks (per-block cache_control, latest-non-None resolution) and the audio/video NotImplementedError contract. - New "Internals" section with a table of the extract_payload_inputs walks (walk_system / walk_tool_schemas / walk_tool_blocks) and a mermaid diagram of build_assistant_turn streaming reassembly (thinking_delta + signature_delta per index, input_json_delta for tool_use, fallback to base when neither is present). - Corrected the existing audio/video tip - it claimed warn-and-drop; the actual behaviour is NotImplementedError at format time. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Anthony Casagrande <acasagrande@nvidia.com>
1 parent ef087e1 commit 621efe3

1 file changed

Lines changed: 91 additions & 2 deletions

File tree

docs/tutorials/anthropic-messages-endpoint.md

Lines changed: 91 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -117,6 +117,7 @@ Key formatting rules:
117117
- **Stream**: Set from `--streaming` flag (default: `false`)
118118
- **System**: Placed as a top-level string field when a system message is provided (never in the messages array)
119119
- **Tools**: Included at the top level when tool definitions are present in the conversation
120+
- **System (raw)**: When a turn carries `raw_system` (a list of content blocks), it is placed verbatim in the payload's top-level `system` field, overriding the conversation-level `system_message` string. See [Per-block system prompts (`raw_system`)](#per-block-system-prompts-raw_system)
120121
- **Extra inputs**: Merged into the payload via `--extra-inputs` (e.g., `--extra-inputs temperature:0.7`)
121122

122123
### Simple Text Content
@@ -339,10 +340,98 @@ The usage object may include `cache_creation_input_tokens` and `cache_read_input
339340

340341
---
341342

342-
## Tips
343+
## Advanced features
344+
345+
### Per-block system prompts (`raw_system`)
346+
347+
The conversation-level `system_message` string (set, for example, by `--shared-system-prompt-length`) is rendered as a plain top-level `system: "..."` string. That works for most cases, but cannot express Anthropic's per-block extensions — most importantly `cache_control: {"type": "ephemeral"}` for prompt caching.
348+
349+
To opt in, set `raw_system` on a `Turn` to a list of content blocks. The endpoint resolves it the same way as `raw_tools`: walking the turn list from the end and taking the first non-`None` value (`base_endpoint.py::_latest_turn_attr`). When set, it wins over `system_message`:
350+
351+
```json
352+
{
353+
"model": "claude-sonnet-4-20250514",
354+
"system": [
355+
{"type": "text", "text": "You are a careful code reviewer."},
356+
{
357+
"type": "text",
358+
"text": "<long shared rubric...>",
359+
"cache_control": {"type": "ephemeral"}
360+
}
361+
],
362+
"messages": [...],
363+
"max_tokens": 1024,
364+
"stream": false
365+
}
366+
```
367+
368+
Notes:
369+
370+
- Latest-non-`None` turn wins — a FORK-mode DAG child that does not re-declare `raw_system` still inherits the parent's value.
371+
- The field is Anthropic-specific. Other endpoints (`chat`, `responses`, ...) ignore it.
372+
- Block contents flow into ISL accounting via the `walk_system` override; see [Input token accounting (ISL)](#input-token-accounting-isl) below.
373+
- `raw_system` is currently populated by trace-replay loaders that ingest Anthropic-shaped traces; programmatic callers building `Turn` objects directly can set it as well.
374+
375+
### Audio and video are unsupported
376+
377+
The Anthropic Messages API does not accept audio or video content blocks. `_render_audio_part` and `_render_video_part` raise `NotImplementedError` immediately so a misuse fails at `format_payload` time with a clear error rather than producing an opaque server 4xx after dispatch:
378+
379+
```text
380+
NotImplementedError: Anthropic Messages API does not support audio input.
381+
Use a different endpoint, or remove audio content from the turn.
382+
```
383+
384+
This is intentional. If your dataset has audio or video, route it to an endpoint that supports it (for example, the OpenAI `chat` endpoint with a multimodal-capable model) — do not try to coerce it into `anthropic_messages`.
385+
386+
---
387+
388+
## Internals
389+
390+
### Input token accounting (ISL)
391+
392+
When you set `--synthetic-input-tokens-mean`, AIPerf needs an accurate count of the input tokens it just placed in the request payload to hit your target. The base endpoint walks `messages` content parts dispatched via `PART_TYPES`. Anthropic's `extract_payload_inputs` extends that walk in three places (see `_anthropic_internals.py`):
393+
394+
| Helper | What it harvests | Why it matters |
395+
|---|---|---|
396+
| `walk_system` | Top-level `system` (string OR list of `{"type":"text","text":...}` blocks) | Otherwise the system prompt — often the largest input — is invisible to ISL accounting |
397+
| `walk_tool_schemas` | `tools[*].input_schema` (serialised as JSON) | Anthropic's schema field is `input_schema`, not OpenAI's `parameters`; the base walk already harvests `name` and `description` |
398+
| `walk_tool_blocks` | `tool_use` `name` + `input`, and `tool_result` `content` (string or list of text blocks) | The server tokenises these on agentic-history replay; the base walk drops them because they are not in `PART_TYPES` |
399+
400+
The endpoint also overrides `PART_TYPES` itself: `IMAGE` is the bare string `image` (Anthropic's shape) rather than OpenAI's `image_url`, and `AUDIO` and `VIDEO` are explicitly empty sets so the walk does not miscount parts that happen to share OpenAI's type names.
401+
402+
If your synthetic ISL targeting was undercounting agentic / tool-heavy traces before, this is what changed.
403+
404+
### Assistant-turn replay (DAG / FORK mode)
405+
406+
For DAG datasets (`dag_jsonl`) where a child turn forks from a parent's history, AIPerf replays the parent's assistant reply as part of the child request's `messages`. The base implementation captures only the streamed text. Anthropic's `build_assistant_turn` reassembles the full content array — `thinking`, then `text`, then `tool_use` — so a FORK-mode child sees the complete reply the model originally produced.
407+
408+
```mermaid
409+
flowchart LR
410+
A[record.responses<br/>SSE events or<br/>non-streaming message] --> B[absorb_event<br/>per response]
411+
B --> C[text_parts]
412+
B --> D[thinking_blocks_by_index]
413+
B --> E[tool_uses_by_index]
414+
C --> F[finalise:<br/>thinking, text, tool_use]
415+
D --> F
416+
E --> F
417+
F --> G[Turn.raw_messages =<br/>assistant content array]
418+
```
419+
420+
Reassembly rules (`_anthropic_internals.py`):
421+
422+
- **Text** — accumulated from `content_block_delta` events with `delta.type == "text_delta"`.
423+
- **Thinking** — keyed by SSE `index`; `thinking_delta` fragments append to `thinking`, `signature_delta` fragments append to `signature`. Both round-trip — the server requires the matching signature to accept a thinking block on replay.
424+
- **Tool use** — keyed by SSE `index`; `input_json_delta.partial_json` fragments concatenate into a raw string, parsed once at finalise. Malformed JSON is preserved as a string under `input` rather than dropped, so the server rejects it loudly instead of AIPerf silently losing data.
425+
- **Non-streaming `type=message`** responses are absorbed by walking the full `content` array directly.
426+
427+
When neither thinking nor `tool_use` blocks are present, `build_assistant_turn` falls back to the base text-only behaviour — callers that don't use either feature see no change.
428+
429+
---
430+
431+
343432

344433
- **Tokenizer**: Anthropic models use a proprietary tokenizer. When using `--use-server-token-count`, you still need to specify a tokenizer for dataset generation (e.g., `--tokenizer gpt2`). For accurate client-side token counts, use a tokenizer that matches the model.
345434
- **Max tokens default**: The endpoint defaults to `max_tokens: 1024` when no value is specified in the turn data. Use `--osl` to control output sequence length.
346-
- **Audio and video**: The Anthropic Messages API does not support audio or video content blocks. If your dataset includes these, the endpoint logs a warning and drops them.
435+
- **Audio and video**: The Anthropic Messages API does not support audio or video content blocks. `_render_audio_part` / `_render_video_part` raise `NotImplementedError` so the failure surfaces at `format_payload` time. See [Audio and video are unsupported](#audio-and-video-are-unsupported).
347436
- **Custom endpoint path**: The default path is `/v1/messages`. Override with `--custom-endpoint /my/path` if your server uses a different path.
348437
- **Connection strategy**: For multi-turn benchmarks where server-side session affinity matters, use `--connection-reuse-strategy sticky-user-sessions`.

0 commit comments

Comments
 (0)