proof of concept: agentic workflow builder #971

Draft
hthillman wants to merge 11 commits into main from claude/laughing-beaver-78a34d

Conversation

@hthillman
Collaborator

Summary

Adds an in-app agent that translates natural-language intent into working Scope graphs and live parameter tweaks. Accessible from a new Agent button next to Graph in the toolbar; a right-side resizable drawer hosts the conversation, streaming replies, tool-call blocks, and workflow-proposal cards.

  • Autonomy model — structural changes (new graph, pipeline load) require explicit Approve; runtime params (prompts, noise scale, LoRA weights) auto-apply with an inline undo.
  • Multi-provider BYOK — Anthropic (default), any OpenAI-compatible endpoint (OpenAI, OpenRouter, Groq, together.ai, Fireworks), and self-hosted (Ollama, vLLM, LM Studio). Configured under Settings → Agent; API keys live in the existing API Keys tab (extended for anthropic, openai, llm_custom).
  • Extensibility — everything the agent knows about Scope is discovered at runtime via introspection tools (pipeline registry, Pydantic schema metadata, blueprints, LoRAs, assets, node-type manifest). New pipelines and nodes work on day 0 without touching agent code.
  • MCP parity — tool bodies are pure-Python in agent_tool_impls.py and shared between the MCP server and the in-app agent, so behavior stays identical.
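
The shared-surface idea can be sketched as a plain registry of pure functions that both callers dispatch through — a minimal illustration only, with hypothetical names (the real bodies live in agent_tool_impls.py and are much richer):

```python
from typing import Any, Callable

# Hypothetical registry of pure-Python tool bodies. Because each body is
# a plain function, the MCP server and the in-app agent loop can both
# register and invoke the identical implementation.
TOOL_REGISTRY: dict[str, Callable[..., Any]] = {}

def tool(name: str) -> Callable:
    """Register a tool body under a stable name."""
    def wrap(fn: Callable) -> Callable:
        TOOL_REGISTRY[name] = fn
        return fn
    return wrap

@tool("list_pipelines")
def list_pipelines() -> list[dict]:
    # The real implementation introspects the pipeline registry at runtime;
    # this static return value is illustrative.
    return [{"id": "longlive", "name": "LongLive"}]

# Either caller dispatches by name through the same registry:
result = TOOL_REGISTRY["list_pipelines"]()
```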

What's inside

Backend (src/scope/server/)

  • agent_tool_impls.py — shared tool surface (list/load pipelines, get schemas, capture frame, propose/apply workflow, update params, recording, logs, hardware, etc.)
  • agent_state.py — session store, provider config, proposal handshake with graph hash
  • agent_providers.py — Anthropic + OpenAI-compatible + self-hosted providers behind a single internal event protocol
  • agent_loop.py — SSE turn runner; streams text deltas, tool calls, tool results, proposals
  • app.py — /api/v1/agent/chat (SSE), /agent/decision, /agent/config, /agent/sessions, /agent/node-catalog; keys endpoint extended
  • anthropic>=0.40 added to pyproject.toml
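
For illustration, the frames such an SSE turn runner emits might be serialized like this — the event names here are assumptions, not the actual wire protocol:

```python
import json

def sse_event(event: str, data: dict) -> str:
    """Serialize one Server-Sent Events frame: an event-name line,
    a JSON data line, and the blank-line terminator."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"

# A turn would interleave frames like these on the /agent/chat stream
# (event names hypothetical):
frames = [
    sse_event("text_delta", {"text": "Picking a pipeline..."}),
    sse_event("tool_call", {"name": "list_pipelines", "args": {}}),
]
```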

Frontend (frontend/src/)

  • components/agent/AgentDrawer, ChatTranscript, MessageBubble, ToolCallBlock, Composer, WorkflowProposalCard
  • contexts/AgentContext.tsx + lib/agentClient.ts — React state + SSE-over-POST reader (fetch + ReadableStream, since EventSource can't POST)
  • components/settings/AgentProviderTab.tsx — new settings tab, wired into SettingsDialog + Header
  • components/graph/GraphToolbar.tsx — Agent button next to Graph
  • data/nodes/manifest.json — UI node-type catalog so the agent can compose arbitrary graphs without hardcoding node types

Target flows this unlocks

  1. "Hyperrealistic scene with 3–5 switchable prompts" → agent picks a pipeline (e.g. longlive), grafts in the manual-prompt-switcher blueprint, preloads prompts, proposes the graph. User approves → applied.
  2. "It's not recognizing depth" → agent calls capture_frame, reasons over the JPEG, calls get_pipeline_schema for the current pipeline, and tunes the relevant knob (e.g. vace_context_scale). Auto-applied.
  3. "Help me record it" → agent adds a record node to the current graph (proposed) or starts headless recording, then returns the download URL.

Test plan

  • uv run ruff check src/ — clean
  • uv run ruff format --check src/ — clean
  • npm run build — builds
  • npm run format:check — clean
  • npm run lint — 0 errors (49 pre-existing warnings, no new ones)
  • uv run daydream-scope starts clean; log shows Agent session store initialized
  • Agent endpoints respond: GET /api/v1/agent/config, GET /api/v1/agent/node-catalog, GET /api/v1/agent/sessions, GET /api/v1/keys (includes anthropic/openai/llm_custom)
  • Drawer opens; provider badge shows CLAUDE • CLAUDE-SONNET-4-6; missing-key banner renders
  • Settings → Agent tab renders all three providers with correct defaults
  • Exercise end-to-end with an Anthropic key: run one of the three example flows, verify proposal card → approve → graph applied; verify capture_frame + update_parameters auto-apply
  • Exercise with an OpenAI-compatible endpoint
  • Exercise with a local self-hosted endpoint (Ollama / LM Studio)

Out of MVP

  • Chat history does not persist across server restart (in-memory only)
  • No multi-user session scoping
  • No streaming tool-call argument rendering (tool call fires on complete)
  • System prompt is not user-editable

🤖 Generated with Claude Code

@coderabbitai

coderabbitai Bot commented Apr 20, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.


@hthillman hthillman changed the title feat: agentic workflow builder proof of concept: agentic workflow builder Apr 20, 2026
@hthillman hthillman marked this pull request as draft April 20, 2026 22:40
@github-actions
Contributor

github-actions Bot commented Apr 20, 2026

🚀 fal.ai Preview Deployment

App ID daydream/scope-pr-971--preview
WebSocket wss://fal.run/daydream/scope-pr-971--preview/ws
Commit e101514

Livepeer Runner

App ID daydream/scope-livepeer-pr-971--preview
WebSocket wss://fal.run/daydream/scope-livepeer-pr-971--preview/ws
Auth private

Testing Livepeer Mode

SCOPE_CLOUD_MODE=livepeer SCOPE_CLOUD_APP_ID="daydream/scope-livepeer-pr-971--preview/ws" uv run daydream-scope

hthillman and others added 9 commits April 21, 2026 10:32
Add an in-app agent that translates natural-language intent into
working Scope graphs and runtime parameter tweaks. Accessible from
a new Agent button next to Graph in the toolbar; a right-side
resizable drawer hosts the conversation, streaming responses, and
workflow-proposal cards. Structural changes (new graph / pipeline
load) require an explicit approve step; runtime params
(prompts, noise, LoRA weights) auto-apply.

Multi-provider BYOK: Anthropic (default), any OpenAI-compatible
endpoint (OpenAI, OpenRouter, Groq, together.ai, Fireworks), and
self-hosted (Ollama, vLLM, LM Studio). Provider + model are
configured under a new Settings → Agent tab; API keys continue to
live in the API Keys tab (extended to cover anthropic, openai,
llm_custom).

Agent tools are pure-Python in `agent_tool_impls.py` so both the
MCP server and the in-app agent share the same surface. Everything
the agent knows about Scope is discovered at runtime through
introspection tools (pipeline registry, schema metadata,
blueprints, LoRAs, assets, node-type manifest), so new pipelines
and nodes work on day 0 without touching agent code.

Backend
- `src/scope/server/agent_tool_impls.py`: shared tool surface
- `src/scope/server/agent_state.py`: session store + provider config
- `src/scope/server/agent_providers.py`: Anthropic + OpenAI-compat
  + self-hosted providers behind a single event protocol
- `src/scope/server/agent_loop.py`: SSE turn runner with
  propose→approve handshake and vision feedback
- `src/scope/server/app.py`: `/api/v1/agent/chat` (SSE),
  `/agent/decision`, `/agent/config`, `/agent/sessions`,
  `/agent/node-catalog`; `/api/v1/keys/*` extended
- `anthropic>=0.40` added to pyproject

Frontend
- `frontend/src/components/agent/`: AgentDrawer, ChatTranscript,
  MessageBubble, ToolCallBlock, Composer, WorkflowProposalCard
- `frontend/src/contexts/AgentContext.tsx` + `lib/agentClient.ts`:
  state + SSE-over-POST reader
- `frontend/src/components/settings/AgentProviderTab.tsx`: new
  settings tab, wired into SettingsDialog + Header
- `frontend/src/components/graph/GraphToolbar.tsx`: Agent button
- `frontend/src/data/nodes/manifest.json`: UI node-type catalog
  so the agent can compose arbitrary node graphs without hardcoding

Signed-off-by: Hunter Hillman <hthillman@gmail.com>
…mpact tool UI

list_pipelines and get_pipeline_schema were reading .get("schemas", {})
from /api/v1/pipelines/schemas, but the endpoint returns {"pipelines": {...}}.
This made every pipeline (including plugin pipelines like ltx2/helios)
invisible to the agent, which then probed repeatedly and hit the round cap.

- Read from "pipelines" key; include name/description/supported_modes in the summary.
- On unknown pipeline_id, return the list of available ids so the agent can recover.
- Raise MAX_TOOL_ROUNDS 12→40 (a real workflow build legitimately chains many calls).
- Compact tool-call UI: shared subtle container, lighter chrome, smaller icons
  so many tool calls no longer feel heavy while staying visible.
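
A minimal sketch of the corrected read path, with the response shape taken from the description above and helper names assumed for illustration:

```python
def summarize_pipelines(schemas_response: dict) -> list[dict]:
    # The endpoint returns {"pipelines": {...}}; reading "schemas" was
    # silently yielding an empty dict and hiding every pipeline.
    pipelines = schemas_response.get("pipelines", {})
    return [
        {"id": pid, "name": meta.get("name"),
         "description": meta.get("description")}
        for pid, meta in pipelines.items()
    ]

def get_pipeline_schema(schemas_response: dict, pipeline_id: str) -> dict:
    pipelines = schemas_response.get("pipelines", {})
    if pipeline_id not in pipelines:
        # Surface the available ids so the agent can recover in one step
        # instead of probing until it hits the round cap.
        return {"error": f"unknown pipeline '{pipeline_id}'",
                "available": sorted(pipelines)}
    return pipelines[pipeline_id]
```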

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Hunter Hillman <hthillman@gmail.com>
On approval, the agent now writes the proposed graph into the React
Flow canvas via a registered importer; the user presses Play to start.
Previously apply_workflow POSTed to /pipeline/load + /session/start,
which (a) never surfaced the graph in the UI and (b) failed in cloud
mode because the backend tried to load pipelines on the local instance.

- AgentContext: registerGraphImporter() pattern; decideProposal writes
  to canvas before notifying backend, toasts "Press Play to start".
- GraphEditor: expose loadGraphConfig via useImperativeHandle
  (delegates to existing loadGraphFromParsed).
- StreamPage: register importer that routes to the GraphEditor ref.
- apply_workflow: strip pipeline-load / session-start side effects;
  just validate hash, clear pending proposal, return pipelines list.
- System prompt: "approval writes the graph to the canvas automatically
  — apply_workflow only confirms it. Never call start_session,
  load_pipeline, or any session-starting tool." Continuation message
  updated to match.
- .claude/launch.json: add scope-cloud dev entry for preview testing
  against the Livepeer cloud relay.

Signed-off-by: Hunter Hillman <hthillman@gmail.com>
The agent kept concluding "propose_workflow only accepts backend nodes,
UI nodes must be added manually." That's wrong — GraphConfig silently
ignores extra fields, and the original dict (with ui_state intact) is
what gets stored on the proposal and sent to the frontend. The frontend
already handles ui_state via graphConfigToFlow. So UI nodes CAN round-trip through propose_workflow; the agent just didn't know the shape.

Verified with a Python repro: {nodes, edges, ui_state: {nodes: [trigger,
slider], edges: [...]}} passes GraphConfig(**graph).validate_structure()
cleanly and the original dict keeps ui_state.
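
The claim can be mimicked without the project models. This stand-in only mirrors the extra-field-ignoring behavior (GraphConfig itself is a project Pydantic model; `BACKEND_FIELDS` and `backend_view` are hypothetical):

```python
BACKEND_FIELDS = {"nodes", "edges"}

def backend_view(graph: dict) -> dict:
    """What an extra-ignoring model effectively validates:
    only the declared backend fields."""
    return {k: v for k, v in graph.items() if k in BACKEND_FIELDS}

proposal = {
    "nodes": [{"id": "pipe", "type": "pipeline"}],
    "edges": [],
    "ui_state": {"nodes": [{"id": "trigger"}, {"id": "slider"}],
                 "edges": []},
}

# Validation only considers top-level nodes/edges...
assert "ui_state" not in backend_view(proposal)
# ...while the original dict — what is stored on the proposal and sent
# to the frontend — keeps ui_state intact, so UI nodes round-trip.
assert "ui_state" in proposal
```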

- System prompt GRAPH SHAPE section: explicit split between top-level
  nodes/edges (backend: source|pipeline|sink|record) and ui_state.nodes/
  ui_state.edges (everything else). Note blueprint grafting goes under
  ui_state, not top-level.
- propose_workflow tool description: spell out both parts; warn that UI
  nodes in top-level nodes will fail validation; point at
  get_current_graph for a concrete example.
- get_blueprint tool description: flag that its nodes/edges are UI-typed
  and need to land in ui_state when grafted.

Signed-off-by: Hunter Hillman <hthillman@gmail.com>
Workflows proposed by the agent were often missing wires (no VACE node,
no prompt connection) or used invalid handle prefixes ('parameter:'),
forcing the user to patch the graph manually. Fix that end-to-end:

- Add _validate_proposal() invoked by propose_workflow. Checks handle
  format ('param:'/'stream:' only), edge source/target existence,
  pipeline-handle presence, subgraph internal consistency, and emits
  warnings for likely-missing wires (VACE → pipeline, prompt → pipeline).
  Errors bounce back to the agent with actionable messages so it can
  iterate; warnings flow through on success.
- Add get_pipeline_handles(pipeline_id) tool. Returns the exact
  stream_inputs / stream_outputs / param_inputs a ui_state edge may
  target, including aggregate handles (param:__prompt / __vace /
  __loras) that depend on pipeline capability flags.
- Update node manifest: document VACE's real input handles
  (ref_image, first_frame, last_frame, context_scale), clarify subgraph
  dynamic ports via data.subgraphInputs/Outputs, describe control switch
  mode, and add a $handle_convention key.
- Append a WIRING cheat sheet to SYSTEM_PROMPT with canonical slider /
  prompt / VACE patterns so the agent has a concrete template to copy.
- Rewrite propose_workflow tool description to spell out the param: vs
  stream: convention and require a get_pipeline_handles call before
  wiring to a pipeline.
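
The handle-format part of that validator reduces to a small check — a sketch only; the real _validate_proposal also verifies edge endpoints, pipeline handles, and subgraph consistency, and the edge-key names here are assumptions:

```python
VALID_PREFIXES = ("param:", "stream:")

def check_handle_format(edges: list[dict]) -> list[str]:
    """Return an actionable error for every handle that does not use the
    param:/stream: convention (e.g. the invalid 'parameter:' prefix)."""
    errors = []
    for edge in edges:
        for key in ("sourceHandle", "targetHandle"):
            handle = edge.get(key)
            if handle and not handle.startswith(VALID_PREFIXES):
                errors.append(
                    f"{key} '{handle}' is invalid: handles must start "
                    f"with 'param:' or 'stream:'"
                )
    return errors
```

Bouncing these strings back to the agent, rather than failing silently, is what lets it iterate toward a valid graph.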

Tests: tests/test_agent_tool_impls.py covers the validator's reject
paths (bad prefix, missing target, unknown pipeline handle, subgraph
inconsistency, external port mismatch), its accept path, its VACE /
prompt warnings, and the handle deriver's aggregate inclusion.

Signed-off-by: Hunter Hillman <hthillman@gmail.com>
Adds evals/ package that drives the real agent endpoint in-process
(via httpx.ASGITransport + asgi-lifespan) and grades proposals with
deterministic structural checks. Three starter-workflow cases are
prepopulated (mythical-creature, dissolving-sunflower, ltx-text-to-video);
authoring more is dropping a YAML into evals/cases/.

Excluded from default pytest (addopts -m 'not eval') and from PR CI —
a manual-dispatch workflow runs it on demand so we can measure pass-rate
without burning API budget on every push.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Hunter Hillman <hthillman@gmail.com>
Adds two cases spanning the specificity range real users send:
- complex-krea-prompt-switch-record: precise multi-concept prompt
  (krea pipeline + VACE reference image + prompt_list with >=5 items
  driven by trigger + record node). Exercises the full new check surface.
- vague-capture-moments: deliberately vague "play with my webcam and
  capture anything cool" — tests the agent's ability to fill gaps, with
  graders that only assert clearly-implied structure.

New grader checks:
- pipelines_count_at_least — min pipeline count without pinning ids
- node_present — asserts N UI nodes of a type exist; optional min_items
  for prompt_list
- wire_present { kind: pipeline_to_record } — pipeline stream output
  into a record node
- wire_present { kind: prompt_list_to_pipeline } — prompt_list to
  pipeline's param:__prompt
- wire_present { kind: trigger_to_prompt_list } — value source into
  a prompt_list's param:trigger / param:cycle

README documents the precise-vs-vague authoring pattern and the new
check reference.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Hunter Hillman <hthillman@gmail.com>
Driven by the Krea eval failure: `complex-krea-prompt-switch-record`
failed 9 of 9 structural checks, and analysis split the failures into
real agent mistakes (ignored pipeline name, invented handles, skipped
prompt_list) plus two infrastructure bugs masking correct behavior.

Agent improvements (src/scope/server/agent_loop.py):
- "Honor the user's pipeline name" rule (CORE PRINCIPLES 1a). If the
  user names a pipeline, match against list_pipelines id+name and
  use it; never silently substitute.
- "Never fabricate a parameter name" rule (WIRING). Reference-image
  conditioning always goes through VACE; forbid invented handles like
  param:i2v_image.
- Promote prompt_list to the canonical "switch between N prompts with
  a button press" pattern; demote subgraph switcher to fallback for
  non-button-driven cases.
- Completeness checklist gains items for recording (add record node +
  stream edge when user says save/capture) and prompt-list switching.

Manifest fix (frontend/src/data/nodes/manifest.json):
- prompt_list entry was lying about its handle names. Output was
  documented as "prompts" but the real React Flow handle built by
  PromptListNode.tsx is `param:prompt` (singular). The trigger and
  cycle input handles were missing entirely, as was the
  data.promptListItems data field. Corrected all three.

Grader fixes (evals/grader.py):
- node_present now searches top-level graph.nodes for
  source/pipeline/sink/record types; still searches ui_state.nodes
  for UI-only types (slider, vace, prompt_list, ...).
- wire_present kind=pipeline_to_record accepts top-level stream edges
  (the canonical form a record node is wired with), in addition to
  the existing ui_state form.
- Both bugs were false-negating the Krea run's correctly-wired
  top-level record node.

New eval cases (evals/cases/):
- complex-pipeline-name-respect — user names "krea"; agent must pick
  krea-realtime-video, not substitute.
- complex-reference-image-no-invented-handles — user asks for
  reference-image conditioning on longlive; grader asserts the
  image → vace → pipeline path, closing the invented-handle loophole.
- vague-save-the-output — "saves whatever I make" is phrasing for a
  record node; exercises the new completeness item.

Verification: regrading the saved Krea artifact flips 2 failures to
pass (record detection) and leaves 7 real agent failures. Live smoke
on vague-save-the-output (passthrough) passes 1/1. Regression on
starter-ltx-text-to-video is 3/3. The two GPU-gated cases
(complex-pipeline-name-respect, complex-reference-image-no-invented-handles)
correctly trigger the agent's "can't do what you asked, ask first"
behavior when the named pipeline isn't registered locally; they're
authored for GPU production CI.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Hunter Hillman <hthillman@gmail.com>
User reported the agent produced a working workflow with two sink
nodes, one of which had no incoming edge and shouldn't have been
there. Validation passed (disconnected sinks are legal at the schema
level) so only a grader check can guard against this.

Adds a new `orphan_sinks` check that, for each top-level sink, looks
for at least one top-level stream edge targeting it. Any sink without
one is flagged. Wired into the forbid list of all 8 existing cases so
it's a universal regression guard rather than dependent on a single
reproducer prompt.
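
The check itself is small — a sketch under the node/edge shapes described above:

```python
def orphan_sinks(graph: dict) -> list[str]:
    """Ids of top-level sink nodes that no top-level stream edge targets."""
    sink_ids = {n["id"] for n in graph.get("nodes", [])
                if n.get("type") == "sink"}
    wired = {e.get("target") for e in graph.get("edges", [])}
    return sorted(sink_ids - wired)
```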

Verified the check against the saved r01 artifacts: all pass (the
previous Krea run had one correctly-wired sink, not an orphan).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Hunter Hillman <hthillman@gmail.com>
@hthillman hthillman force-pushed the claude/laughing-beaver-78a34d branch from 8f80cf7 to 33db077 Compare April 21, 2026 17:49
hthillman and others added 2 commits April 21, 2026 13:10
…i_node reflow

Drawer was a full-height fixed overlay that obscured the graph; now it's a
flex sibling of StreamPage, so opening it resizes the canvas instead of
covering it. API-key / provider changes now dispatch a
`scope:agent-config-changed` window event that AgentContext listens on, so
the "no API key configured" banner clears without a reload.

SYSTEM_PROMPT tightened: propose_workflow is explicitly structural-only,
runtime-tweakable params must go through update_parameters (not a new
proposal), and the STYLE block bans meta-narration phrases the model was
leaking into chat ("Let me...", "Hmm...", "The field is X (labeled Y)...").
Added a LAYOUT section telling the agent to keep UI-state nodes LEFT of
x=0 so they don't collide with the frontend's top-level auto-layout strip.

As a safety net, propose_workflow now runs _reflow_ui_nodes after
validation: AABB-test every UI node against every other UI node and
against the predicted top-level column rectangles; if anything overlaps,
reassign all UI nodes into three deterministic columns at x=-320/-620/-920
by role (sliders/primitives/triggers/math closest in, prompt lists middle,
image/vace/lora/subgraph outermost). No-op when the agent's layout is
already clean. Verified on the known-bad complex-krea-prompt-switch-record
proposal.
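
The AABB test at the heart of that pass might look like this — box sizes are illustrative and equal for every node, whereas the real reflow uses per-type dimensions and also tests against the predicted top-level column rectangles:

```python
def boxes_overlap(a: dict, b: dict, w: int = 240, h: int = 140) -> bool:
    """Axis-aligned bounding-box intersection for two equally sized
    nodes positioned at their top-left corners."""
    return abs(a["x"] - b["x"]) < w and abs(a["y"] - b["y"]) < h

def needs_reflow(ui_nodes: list[dict]) -> bool:
    """True if any pair of UI nodes collides — the trigger for
    reassigning them into the deterministic columns."""
    return any(
        boxes_overlap(ui_nodes[i], ui_nodes[j])
        for i in range(len(ui_nodes))
        for j in range(i + 1, len(ui_nodes))
    )
```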

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Hunter Hillman <hthillman@gmail.com>
…eak cases

Two new regression guards:

overlapping_nodes (forbid check): catches positioning bugs that render
as visually stacked nodes even when edges are correct. Mirrors the
reflow's bbox math — 240x140 for UI nodes (280 tall for
image/vace/subgraph), 200x60 for top-level nodes at x=50/350/650/950.
Added to all 8 existing cases and to the new layout-nodes-spaced case.

forbid_proposal (Case field): set true on cases where the agent must
NOT emit workflow_proposal because update_parameters is the right tool.
Runner treats presence of a proposal as a failure instead of the usual
"no proposal means fail" short-circuit.

Two new cases:
- layout-nodes-spaced: longlive + 2 sliders + prompt_list with triggers,
  exercising the many-UI-nodes-alongside-multiple-top-level-nodes
  surface where overlaps were observed.
- runtime-tweak-no-repropose: user asks to change noise_scale on a
  running graph; any workflow_proposal fails the case.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Hunter Hillman <hthillman@gmail.com>