You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
docs(anthropic): document raw_system, ISL accounting, and DAG/FORK replay
Updates the Anthropic Messages endpoint tutorial to cover the dag5
BaseEndpoint hooks and Turn fields adopted by AnthropicMessagesEndpoint:
- New "Advanced features" section documenting Turn.raw_system for
list-form system blocks (per-block cache_control, latest-non-None
resolution) and the audio/video NotImplementedError contract.
- New "Internals" section with a table of the extract_payload_inputs
walks (walk_system / walk_tool_schemas / walk_tool_blocks) and a
mermaid diagram of build_assistant_turn streaming reassembly
(thinking_delta + signature_delta per index, input_json_delta for
tool_use, fallback to base when neither is present).
- Corrected the existing audio/video tip - it claimed warn-and-drop;
the actual behaviour is NotImplementedError at format time.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Anthony Casagrande <acasagrande@nvidia.com>
Copy file name to clipboardExpand all lines: docs/tutorials/anthropic-messages-endpoint.md
+91-2Lines changed: 91 additions & 2 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -117,6 +117,7 @@ Key formatting rules:
117
117
-**Stream**: Set from `--streaming` flag (default: `false`)
118
118
-**System**: Placed as a top-level string field when a system message is provided (never in the messages array)
119
119
-**Tools**: Included at the top level when tool definitions are present in the conversation
120
+
-**System (raw)**: When a turn carries `raw_system` (a list of content blocks), it is placed verbatim in the payload's top-level `system` field, overriding the conversation-level `system_message` string. See [Per-block system prompts (`raw_system`)](#per-block-system-prompts-raw_system)
120
121
-**Extra inputs**: Merged into the payload via `--extra-inputs` (e.g., `--extra-inputs temperature:0.7`)
121
122
122
123
### Simple Text Content
@@ -339,10 +340,98 @@ The usage object may include `cache_creation_input_tokens` and `cache_read_input
339
340
340
341
---
341
342
342
-
## Tips
343
+
## Advanced features
344
+
345
+
### Per-block system prompts (`raw_system`)
346
+
347
+
The conversation-level `system_message` string (set, for example, by `--shared-system-prompt-length`) is rendered as a plain top-level `system: "..."` string. That works for most cases, but cannot express Anthropic's per-block extensions — most importantly `cache_control: {"type": "ephemeral"}` for prompt caching.
348
+
349
+
To opt in, set `raw_system` on a `Turn` to a list of content blocks. The endpoint resolves it the same way as `raw_tools`: walking the turn list from the end and taking the first non-`None` value (`base_endpoint.py::_latest_turn_attr`). When set, it wins over `system_message`:
350
+
351
+
```json
352
+
{
353
+
"model": "claude-sonnet-4-20250514",
354
+
"system": [
355
+
{"type": "text", "text": "You are a careful code reviewer."},
356
+
{
357
+
"type": "text",
358
+
"text": "<long shared rubric...>",
359
+
"cache_control": {"type": "ephemeral"}
360
+
}
361
+
],
362
+
"messages": [...],
363
+
"max_tokens": 1024,
364
+
"stream": false
365
+
}
366
+
```
367
+
368
+
Notes:
369
+
370
+
- Latest-non-`None` turn wins — a FORK-mode DAG child that does not re-declare `raw_system` still inherits the parent's value.
371
+
- The field is Anthropic-specific. Other endpoints (`chat`, `responses`, ...) ignore it.
372
+
- Block contents flow into ISL accounting via the `walk_system` override; see [Input token accounting (ISL)](#input-token-accounting-isl) below.
373
+
-`raw_system` is currently populated by trace-replay loaders that ingest Anthropic-shaped traces; programmatic callers building `Turn` objects directly can set it as well.
374
+
375
+
### Audio and video are unsupported
376
+
377
+
The Anthropic Messages API does not accept audio or video content blocks. `_render_audio_part` and `_render_video_part` raise `NotImplementedError` immediately so a misuse fails at `format_payload` time with a clear error rather than producing an opaque server 4xx after dispatch:
378
+
379
+
```text
380
+
NotImplementedError: Anthropic Messages API does not support audio input.
381
+
Use a different endpoint, or remove audio content from the turn.
382
+
```
383
+
384
+
This is intentional. If your dataset has audio or video, route it to an endpoint that supports it (for example, the OpenAI `chat` endpoint with a multimodal-capable model) — do not try to coerce it into `anthropic_messages`.
385
+
386
+
---
387
+
388
+
## Internals
389
+
390
+
### Input token accounting (ISL)
391
+
392
+
When you set `--synthetic-input-tokens-mean`, AIPerf needs an accurate count of the input tokens it just placed in the request payload to hit your target. The base endpoint walks `messages` content parts dispatched via `PART_TYPES`. Anthropic's `extract_payload_inputs` extends that walk in three places (see `_anthropic_internals.py`):
393
+
394
+
| Helper | What it harvests | Why it matters |
395
+
|---|---|---|
396
+
|`walk_system`| Top-level `system` (string OR list of `{"type":"text","text":...}` blocks) | Otherwise the system prompt — often the largest input — is invisible to ISL accounting |
397
+
|`walk_tool_schemas`|`tools[*].input_schema` (serialised as JSON) | Anthropic's schema field is `input_schema`, not OpenAI's `parameters`; the base walk already harvests `name` and `description`|
398
+
|`walk_tool_blocks`|`tool_use``name` + `input`, and `tool_result``content` (string or list of text blocks) | The server tokenises these on agentic-history replay; the base walk drops them because they are not in `PART_TYPES`|
399
+
400
+
The endpoint also overrides `PART_TYPES` itself: `IMAGE` is the bare string `image` (Anthropic's shape) rather than OpenAI's `image_url`, and `AUDIO` and `VIDEO` are explicitly empty sets so the walk does not miscount parts that happen to share OpenAI's type names.
401
+
402
+
If your synthetic ISL targeting was undercounting agentic / tool-heavy traces before, this is what changed.
403
+
404
+
### Assistant-turn replay (DAG / FORK mode)
405
+
406
+
For DAG datasets (`dag_jsonl`) where a child turn forks from a parent's history, AIPerf replays the parent's assistant reply as part of the child request's `messages`. The base implementation captures only the streamed text. Anthropic's `build_assistant_turn` reassembles the full content array — `thinking`, then `text`, then `tool_use` — so a FORK-mode child sees the complete reply the model originally produced.
F --> G[Turn.raw_messages =<br/>assistant content array]
418
+
```
419
+
420
+
Reassembly rules (`_anthropic_internals.py`):
421
+
422
+
-**Text** — accumulated from `content_block_delta` events with `delta.type == "text_delta"`.
423
+
-**Thinking** — keyed by SSE `index`; `thinking_delta` fragments append to `thinking`, `signature_delta` fragments append to `signature`. Both round-trip — the server requires the matching signature to accept a thinking block on replay.
424
+
-**Tool use** — keyed by SSE `index`; `input_json_delta.partial_json` fragments concatenate into a raw string, parsed once at finalise. Malformed JSON is preserved as a string under `input` rather than dropped, so the server rejects it loudly instead of AIPerf silently losing data.
425
+
-**Non-streaming `type=message`** responses are absorbed by walking the full `content` array directly.
426
+
427
+
When neither thinking nor `tool_use` blocks are present, `build_assistant_turn` falls back to the base text-only behaviour — callers that don't use either feature see no change.
428
+
429
+
---
430
+
431
+
343
432
344
433
-**Tokenizer**: Anthropic models use a proprietary tokenizer. When using `--use-server-token-count`, you still need to specify a tokenizer for dataset generation (e.g., `--tokenizer gpt2`). For accurate client-side token counts, use a tokenizer that matches the model.
345
434
-**Max tokens default**: The endpoint defaults to `max_tokens: 1024` when no value is specified in the turn data. Use `--osl` to control output sequence length.
346
-
-**Audio and video**: The Anthropic Messages API does not support audio or video content blocks. If your dataset includes these, the endpoint logs a warning and drops them.
435
+
-**Audio and video**: The Anthropic Messages API does not support audio or video content blocks. `_render_audio_part` / `_render_video_part` raise `NotImplementedError` so the failure surfaces at `format_payload` time. See [Audio and video are unsupported](#audio-and-video-are-unsupported).
347
436
-**Custom endpoint path**: The default path is `/v1/messages`. Override with `--custom-endpoint /my/path` if your server uses a different path.
348
437
-**Connection strategy**: For multi-turn benchmarks where server-side session affinity matters, use `--connection-reuse-strategy sticky-user-sessions`.
0 commit comments