proof of concept: agentic workflow builder #971

Draft
hthillman wants to merge 11 commits into main from claude/laughing-beaver-78a34d

Conversation

@hthillman
Collaborator

Summary

Adds an in-app agent that translates natural-language intent into working Scope graphs and live parameter tweaks. Accessible from a new Agent button next to Graph in the toolbar; a right-side resizable drawer hosts the conversation, streaming replies, tool-call blocks, and workflow-proposal cards.

  • Autonomy model — structural changes (new graph, pipeline load) require explicit Approve; runtime params (prompts, noise scale, LoRA weights) auto-apply with an inline undo.
  • Multi-provider BYOK — Anthropic (default), any OpenAI-compatible endpoint (OpenAI, OpenRouter, Groq, together.ai, Fireworks), and self-hosted (Ollama, vLLM, LM Studio). Configured under Settings → Agent; API keys live in the existing API Keys tab (extended for anthropic, openai, llm_custom).
  • Extensibility — everything the agent knows about Scope is discovered at runtime via introspection tools (pipeline registry, Pydantic schema metadata, blueprints, LoRAs, assets, node-type manifest). New pipelines and nodes work on day 0 without touching agent code.
  • MCP parity — tool bodies are pure-Python in agent_tool_impls.py and shared between the MCP server and the in-app agent, so behavior stays identical.
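
The shared-surface idea can be sketched as a plain registry of pure functions that both callers dispatch through — a minimal illustration only, with hypothetical names (the real bodies live in agent_tool_impls.py and are much richer):

```python
from typing import Any, Callable

# Hypothetical registry of pure-Python tool bodies. Because each body is
# a plain function, the MCP server and the in-app agent loop can both
# register and invoke the identical implementation.
TOOL_REGISTRY: dict[str, Callable[..., Any]] = {}

def tool(name: str) -> Callable:
    """Register a tool body under a stable name."""
    def wrap(fn: Callable) -> Callable:
        TOOL_REGISTRY[name] = fn
        return fn
    return wrap

@tool("list_pipelines")
def list_pipelines() -> list[dict]:
    # The real implementation introspects the pipeline registry at runtime;
    # this static return value is illustrative.
    return [{"id": "longlive", "name": "LongLive"}]

# Either caller dispatches by name through the same registry:
result = TOOL_REGISTRY["list_pipelines"]()
```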

What's inside

Backend (src/scope/server/)

  • agent_tool_impls.py — shared tool surface (list/load pipelines, get schemas, capture frame, propose/apply workflow, update params, recording, logs, hardware, etc.)
  • agent_state.py — session store, provider config, proposal handshake with graph hash
  • agent_providers.py — Anthropic + OpenAI-compatible + self-hosted providers behind a single internal event protocol
  • agent_loop.py — SSE turn runner; streams text deltas, tool calls, tool results, proposals
  • app.py — /api/v1/agent/chat (SSE), /agent/decision, /agent/config, /agent/sessions, /agent/node-catalog; keys endpoint extended
  • anthropic>=0.40 added to pyproject.toml
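
For illustration, the frames such an SSE turn runner emits might be serialized like this — the event names here are assumptions, not the actual wire protocol:

```python
import json

def sse_event(event: str, data: dict) -> str:
    """Serialize one Server-Sent Events frame: an event-name line,
    a JSON data line, and the blank-line terminator."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"

# A turn would interleave frames like these on the /agent/chat stream
# (event names hypothetical):
frames = [
    sse_event("text_delta", {"text": "Picking a pipeline..."}),
    sse_event("tool_call", {"name": "list_pipelines", "args": {}}),
]
```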

Frontend (frontend/src/)

  • components/agent/AgentDrawer, ChatTranscript, MessageBubble, ToolCallBlock, Composer, WorkflowProposalCard
  • contexts/AgentContext.tsx + lib/agentClient.ts — React state + SSE-over-POST reader (fetch + ReadableStream, since EventSource can't POST)
  • components/settings/AgentProviderTab.tsx — new settings tab, wired into SettingsDialog + Header
  • components/graph/GraphToolbar.tsx — Agent button next to Graph
  • data/nodes/manifest.json — UI node-type catalog so the agent can compose arbitrary graphs without hardcoding node types

Target flows this unlocks

  1. "Hyperrealistic scene with 3–5 switchable prompts" → agent picks a pipeline (e.g. longlive), grafts in the manual-prompt-switcher blueprint, preloads prompts, proposes the graph. User approves → applied.
  2. "It's not recognizing depth" → agent calls capture_frame, reasons over the JPEG, calls get_pipeline_schema for the current pipeline, and tunes the relevant knob (e.g. vace_context_scale). Auto-applied.
  3. "Help me record it" → agent adds a record node to the current graph (proposed) or starts headless recording, then returns the download URL.

Test plan

  • uv run ruff check src/ — clean
  • uv run ruff format --check src/ — clean
  • npm run build — builds
  • npm run format:check — clean
  • npm run lint — 0 errors (49 pre-existing warnings, no new ones)
  • uv run daydream-scope starts clean; log shows Agent session store initialized
  • Agent endpoints respond: GET /api/v1/agent/config, GET /api/v1/agent/node-catalog, GET /api/v1/agent/sessions, GET /api/v1/keys (includes anthropic/openai/llm_custom)
  • Drawer opens; provider badge shows CLAUDE • CLAUDE-SONNET-4-6; missing-key banner renders
  • Settings → Agent tab renders all three providers with correct defaults
  • Exercise end-to-end with an Anthropic key: run one of the three example flows, verify proposal card → approve → graph applied; verify capture_frame + update_parameters auto-apply
  • Exercise with an OpenAI-compatible endpoint
  • Exercise with a local self-hosted endpoint (Ollama / LM Studio)

Out of MVP

  • Chat history does not persist across server restart (in-memory only)
  • No multi-user session scoping
  • No streaming tool-call argument rendering (tool call fires on complete)
  • System prompt is not user-editable

🤖 Generated with Claude Code

@coderabbitai

coderabbitai Bot commented Apr 20, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.


@hthillman hthillman changed the title feat: agentic workflow builder proof of concept: agentic workflow builder Apr 20, 2026
@hthillman hthillman marked this pull request as draft April 20, 2026 22:40
@github-actions
Contributor

github-actions Bot commented Apr 20, 2026

🚀 fal.ai Preview Deployment

App ID daydream/scope-pr-971--preview
WebSocket wss://fal.run/daydream/scope-pr-971--preview/ws
Commit e101514

Livepeer Runner

App ID daydream/scope-livepeer-pr-971--preview
WebSocket wss://fal.run/daydream/scope-livepeer-pr-971--preview/ws
Auth private

Testing Livepeer Mode

SCOPE_CLOUD_MODE=livepeer SCOPE_CLOUD_APP_ID="daydream/scope-livepeer-pr-971--preview/ws" uv run daydream-scope

hthillman and others added 9 commits April 21, 2026 10:32
Add an in-app agent that translates natural-language intent into
working Scope graphs and runtime parameter tweaks. Accessible from
a new Agent button next to Graph in the toolbar; a right-side
resizable drawer hosts the conversation, streaming responses, and
workflow-proposal cards. Structural changes (new graph / pipeline
load) require an explicit approve step; runtime params
(prompts, noise, LoRA weights) auto-apply.

Multi-provider BYOK: Anthropic (default), any OpenAI-compatible
endpoint (OpenAI, OpenRouter, Groq, together.ai, Fireworks), and
self-hosted (Ollama, vLLM, LM Studio). Provider + model are
configured under a new Settings → Agent tab; API keys continue to
live in the API Keys tab (extended to cover anthropic, openai,
llm_custom).

Agent tools are pure-Python in `agent_tool_impls.py` so both the
MCP server and the in-app agent share the same surface. Everything
the agent knows about Scope is discovered at runtime through
introspection tools (pipeline registry, schema metadata,
blueprints, LoRAs, assets, node-type manifest), so new pipelines
and nodes work on day 0 without touching agent code.

Backend
- `src/scope/server/agent_tool_impls.py`: shared tool surface
- `src/scope/server/agent_state.py`: session store + provider config
- `src/scope/server/agent_providers.py`: Anthropic + OpenAI-compat
  + self-hosted providers behind a single event protocol
- `src/scope/server/agent_loop.py`: SSE turn runner with
  propose→approve handshake and vision feedback
- `src/scope/server/app.py`: `/api/v1/agent/chat` (SSE),
  `/agent/decision`, `/agent/config`, `/agent/sessions`,
  `/agent/node-catalog`; `/api/v1/keys/*` extended
- `anthropic>=0.40` added to pyproject

Frontend
- `frontend/src/components/agent/`: AgentDrawer, ChatTranscript,
  MessageBubble, ToolCallBlock, Composer, WorkflowProposalCard
- `frontend/src/contexts/AgentContext.tsx` + `lib/agentClient.ts`:
  state + SSE-over-POST reader
- `frontend/src/components/settings/AgentProviderTab.tsx`: new
  settings tab, wired into SettingsDialog + Header
- `frontend/src/components/graph/GraphToolbar.tsx`: Agent button
- `frontend/src/data/nodes/manifest.json`: UI node-type catalog
  so the agent can compose arbitrary node graphs without hardcoding

Signed-off-by: Hunter Hillman <hthillman@gmail.com>
…mpact tool UI

list_pipelines and get_pipeline_schema were reading .get("schemas", {})
from /api/v1/pipelines/schemas, but the endpoint returns {"pipelines": {...}}.
This made every pipeline (including plugin pipelines like ltx2/helios)
invisible to the agent, which then probed repeatedly and hit the round cap.

- Read from "pipelines" key; include name/description/supported_modes in the summary.
- On unknown pipeline_id, return the list of available ids so the agent can recover.
- Raise MAX_TOOL_ROUNDS 12→40 (a real workflow build legitimately chains many calls).
- Compact tool-call UI: shared subtle container, lighter chrome, smaller icons
  so many tool calls no longer feel heavy while staying visible.
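
A minimal sketch of the corrected read path, with the response shape taken from the description above and helper names assumed for illustration:

```python
def summarize_pipelines(schemas_response: dict) -> list[dict]:
    # The endpoint returns {"pipelines": {...}}; reading "schemas" was
    # silently yielding an empty dict and hiding every pipeline.
    pipelines = schemas_response.get("pipelines", {})
    return [
        {"id": pid, "name": meta.get("name"),
         "description": meta.get("description")}
        for pid, meta in pipelines.items()
    ]

def get_pipeline_schema(schemas_response: dict, pipeline_id: str) -> dict:
    pipelines = schemas_response.get("pipelines", {})
    if pipeline_id not in pipelines:
        # Surface the available ids so the agent can recover in one step
        # instead of probing until it hits the round cap.
        return {"error": f"unknown pipeline '{pipeline_id}'",
                "available": sorted(pipelines)}
    return pipelines[pipeline_id]
```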

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Hunter Hillman <hthillman@gmail.com>
On approval, the agent now writes the proposed graph into the React
Flow canvas via a registered importer; the user presses Play to start.
Previously apply_workflow POSTed to /pipeline/load + /session/start,
which (a) never surfaced the graph in the UI and (b) failed in cloud
mode because the backend tried to load pipelines on the local instance.

- AgentContext: registerGraphImporter() pattern; decideProposal writes
  to canvas before notifying backend, toasts "Press Play to start".
- GraphEditor: expose loadGraphConfig via useImperativeHandle
  (delegates to existing loadGraphFromParsed).
- StreamPage: register importer that routes to the GraphEditor ref.
- apply_workflow: strip pipeline-load / session-start side effects;
  just validate hash, clear pending proposal, return pipelines list.
- System prompt: "approval writes the graph to the canvas automatically
  — apply_workflow only confirms it. Never call start_session,
  load_pipeline, or any session-starting tool." Continuation message
  updated to match.
- .claude/launch.json: add scope-cloud dev entry for preview testing
  against the Livepeer cloud relay.

Signed-off-by: Hunter Hillman <hthillman@gmail.com>
The agent kept concluding "propose_workflow only accepts backend nodes,
UI nodes must be added manually." That's wrong — GraphConfig silently
ignores extra fields, and the original dict (with ui_state intact) is
what gets stored on the proposal and sent to the frontend. The frontend
already handles ui_state via graphConfigToFlow. So UI nodes CAN round-trip through propose_workflow; the agent just didn't know the shape.

Verified with a Python repro: {nodes, edges, ui_state: {nodes: [trigger,
slider], edges: [...]}} passes GraphConfig(**graph).validate_structure()
cleanly and the original dict keeps ui_state.
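
The claim can be mimicked without the project models. This stand-in only mirrors the extra-field-ignoring behavior (GraphConfig itself is a project Pydantic model; `BACKEND_FIELDS` and `backend_view` are hypothetical):

```python
BACKEND_FIELDS = {"nodes", "edges"}

def backend_view(graph: dict) -> dict:
    """What an extra-ignoring model effectively validates:
    only the declared backend fields."""
    return {k: v for k, v in graph.items() if k in BACKEND_FIELDS}

proposal = {
    "nodes": [{"id": "pipe", "type": "pipeline"}],
    "edges": [],
    "ui_state": {"nodes": [{"id": "trigger"}, {"id": "slider"}],
                 "edges": []},
}

# Validation only considers top-level nodes/edges...
assert "ui_state" not in backend_view(proposal)
# ...while the original dict — what is stored on the proposal and sent
# to the frontend — keeps ui_state intact, so UI nodes round-trip.
assert "ui_state" in proposal
```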

- System prompt GRAPH SHAPE section: explicit split between top-level
  nodes/edges (backend: source|pipeline|sink|record) and ui_state.nodes/
  ui_state.edges (everything else). Note blueprint grafting goes under
  ui_state, not top-level.
- propose_workflow tool description: spell out both parts; warn that UI
  nodes in top-level nodes will fail validation; point at
  get_current_graph for a concrete example.
- get_blueprint tool description: flag that its nodes/edges are UI-typed
  and need to land in ui_state when grafted.

Signed-off-by: Hunter Hillman <hthillman@gmail.com>
Workflows proposed by the agent were often missing wires (no VACE node,
no prompt connection) or used invalid handle prefixes ('parameter:'),
forcing the user to patch the graph manually. Fix that end-to-end:

- Add _validate_proposal() invoked by propose_workflow. Checks handle
  format ('param:'/'stream:' only), edge source/target existence,
  pipeline-handle presence, subgraph internal consistency, and emits
  warnings for likely-missing wires (VACE → pipeline, prompt → pipeline).
  Errors bounce back to the agent with actionable messages so it can
  iterate; warnings flow through on success.
- Add get_pipeline_handles(pipeline_id) tool. Returns the exact
  stream_inputs / stream_outputs / param_inputs a ui_state edge may
  target, including aggregate handles (param:__prompt / __vace /
  __loras) that depend on pipeline capability flags.
- Update node manifest: document VACE's real input handles
  (ref_image, first_frame, last_frame, context_scale), clarify subgraph
  dynamic ports via data.subgraphInputs/Outputs, describe control switch
  mode, and add a $handle_convention key.
- Append a WIRING cheat sheet to SYSTEM_PROMPT with canonical slider /
  prompt / VACE patterns so the agent has a concrete template to copy.
- Rewrite propose_workflow tool description to spell out the param: vs
  stream: convention and require a get_pipeline_handles call before
  wiring to a pipeline.
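
The handle-format part of that validator reduces to a small check — a sketch only; the real _validate_proposal also verifies edge endpoints, pipeline handles, and subgraph consistency, and the edge-key names here are assumptions:

```python
VALID_PREFIXES = ("param:", "stream:")

def check_handle_format(edges: list[dict]) -> list[str]:
    """Return an actionable error for every handle that does not use the
    param:/stream: convention (e.g. the invalid 'parameter:' prefix)."""
    errors = []
    for edge in edges:
        for key in ("sourceHandle", "targetHandle"):
            handle = edge.get(key)
            if handle and not handle.startswith(VALID_PREFIXES):
                errors.append(
                    f"{key} '{handle}' is invalid: handles must start "
                    f"with 'param:' or 'stream:'"
                )
    return errors
```

Bouncing these strings back to the agent, rather than failing silently, is what lets it iterate toward a valid graph.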

Tests: tests/test_agent_tool_impls.py covers the validator's reject
paths (bad prefix, missing target, unknown pipeline handle, subgraph
inconsistency, external port mismatch), its accept path, its VACE /
prompt warnings, and the handle deriver's aggregate inclusion.

Signed-off-by: Hunter Hillman <hthillman@gmail.com>
Adds evals/ package that drives the real agent endpoint in-process
(via httpx.ASGITransport + asgi-lifespan) and grades proposals with
deterministic structural checks. Three starter-workflow cases are
prepopulated (mythical-creature, dissolving-sunflower, ltx-text-to-video);
authoring more is dropping a YAML into evals/cases/.

Excluded from default pytest (addopts -m 'not eval') and from PR CI —
a manual-dispatch workflow runs it on demand so we can measure pass-rate
without burning API budget on every push.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Hunter Hillman <hthillman@gmail.com>
Adds two cases spanning the specificity range real users send:
- complex-krea-prompt-switch-record: precise multi-concept prompt
  (krea pipeline + VACE reference image + prompt_list with >=5 items
  driven by trigger + record node). Exercises the full new check surface.
- vague-capture-moments: deliberately vague "play with my webcam and
  capture anything cool" — tests the agent's ability to fill gaps, with
  graders that only assert clearly-implied structure.

New grader checks:
- pipelines_count_at_least — min pipeline count without pinning ids
- node_present — asserts N UI nodes of a type exist; optional min_items
  for prompt_list
- wire_present { kind: pipeline_to_record } — pipeline stream output
  into a record node
- wire_present { kind: prompt_list_to_pipeline } — prompt_list to
  pipeline's param:__prompt
- wire_present { kind: trigger_to_prompt_list } — value source into
  a prompt_list's param:trigger / param:cycle

README documents the precise-vs-vague authoring pattern and the new
check reference.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Hunter Hillman <hthillman@gmail.com>
Driven by the Krea eval failure: `complex-krea-prompt-switch-record`
failed 9 of 9 structural checks, and analysis split the failures into
real agent mistakes (ignored pipeline name, invented handles, skipped
prompt_list) plus two infrastructure bugs masking correct behavior.

Agent improvements (src/scope/server/agent_loop.py):
- "Honor the user's pipeline name" rule (CORE PRINCIPLES 1a). If the
  user names a pipeline, match against list_pipelines id+name and
  use it; never silently substitute.
- "Never fabricate a parameter name" rule (WIRING). Reference-image
  conditioning always goes through VACE; forbid invented handles like
  param:i2v_image.
- Promote prompt_list to the canonical "switch between N prompts with
  a button press" pattern; demote subgraph switcher to fallback for
  non-button-driven cases.
- Completeness checklist gains items for recording (add record node +
  stream edge when user says save/capture) and prompt-list switching.

Manifest fix (frontend/src/data/nodes/manifest.json):
- prompt_list entry was lying about its handle names. Output was
  documented as "prompts" but the real React Flow handle built by
  PromptListNode.tsx is `param:prompt` (singular). The trigger and
  cycle input handles were missing entirely, as was the
  data.promptListItems data field. Corrected all three.

Grader fixes (evals/grader.py):
- node_present now searches top-level graph.nodes for
  source/pipeline/sink/record types; still searches ui_state.nodes
  for UI-only types (slider, vace, prompt_list, ...).
- wire_present kind=pipeline_to_record accepts top-level stream edges
  (the canonical form a record node is wired with), in addition to
  the existing ui_state form.
- Both bugs were false-negating the Krea run's correctly-wired
  top-level record node.

New eval cases (evals/cases/):
- complex-pipeline-name-respect — user names "krea"; agent must pick
  krea-realtime-video, not substitute.
- complex-reference-image-no-invented-handles — user asks for
  reference-image conditioning on longlive; grader asserts the
  image → vace → pipeline path, closing the invented-handle loophole.
- vague-save-the-output — "saves whatever I make" is phrasing for a
  record node; exercises the new completeness item.

Verification: regrading the saved Krea artifact flips 2 failures to
pass (record detection) and leaves 7 real agent failures. Live smoke
on vague-save-the-output (passthrough) passes 1/1. Regression on
starter-ltx-text-to-video is 3/3. The two GPU-gated cases
(complex-pipeline-name-respect, complex-reference-image-no-invented-handles)
correctly trigger the agent's "can't do what you asked, ask first"
behavior when the named pipeline isn't registered locally; they're
authored for GPU production CI.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Hunter Hillman <hthillman@gmail.com>
User reported the agent produced a working workflow with two sink
nodes, one of which had no incoming edge and shouldn't have been
there. Validation passed (disconnected sinks are legal at the schema
level) so only a grader check can guard against this.

Adds a new `orphan_sinks` check that, for each top-level sink, looks
for at least one top-level stream edge targeting it. Any sink without
one is flagged. Wired into the forbid list of all 8 existing cases so
it's a universal regression guard rather than dependent on a single
reproducer prompt.
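
The check itself is small — a sketch under the node/edge shapes described above:

```python
def orphan_sinks(graph: dict) -> list[str]:
    """Ids of top-level sink nodes that no top-level stream edge targets."""
    sink_ids = {n["id"] for n in graph.get("nodes", [])
                if n.get("type") == "sink"}
    wired = {e.get("target") for e in graph.get("edges", [])}
    return sorted(sink_ids - wired)
```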

Verified the check against the saved r01 artifacts: all pass (the
previous Krea run had one correctly-wired sink, not an orphan).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Hunter Hillman <hthillman@gmail.com>
@hthillman hthillman force-pushed the claude/laughing-beaver-78a34d branch from 8f80cf7 to 33db077 Compare April 21, 2026 17:49
hthillman and others added 2 commits April 21, 2026 13:10
…i_node reflow

Drawer was a full-height fixed overlay that obscured the graph; now it's a
flex sibling of StreamPage, so opening it resizes the canvas instead of
covering it. API-key / provider changes now dispatch a
`scope:agent-config-changed` window event that AgentContext listens on, so
the "no API key configured" banner clears without a reload.

SYSTEM_PROMPT tightened: propose_workflow is explicitly structural-only,
runtime-tweakable params must go through update_parameters (not a new
proposal), and the STYLE block bans meta-narration phrases the model was
leaking into chat ("Let me...", "Hmm...", "The field is X (labeled Y)...").
Added a LAYOUT section telling the agent to keep UI-state nodes LEFT of
x=0 so they don't collide with the frontend's top-level auto-layout strip.

As a safety net, propose_workflow now runs _reflow_ui_nodes after
validation: AABB-test every UI node against every other UI node and
against the predicted top-level column rectangles; if anything overlaps,
reassign all UI nodes into three deterministic columns at x=-320/-620/-920
by role (sliders/primitives/triggers/math closest in, prompt lists middle,
image/vace/lora/subgraph outermost). No-op when the agent's layout is
already clean. Verified on the known-bad complex-krea-prompt-switch-record
proposal.
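
The AABB test at the heart of that pass might look like this — box sizes are illustrative and equal for every node, whereas the real reflow uses per-type dimensions and also tests against the predicted top-level column rectangles:

```python
def boxes_overlap(a: dict, b: dict, w: int = 240, h: int = 140) -> bool:
    """Axis-aligned bounding-box intersection for two equally sized
    nodes positioned at their top-left corners."""
    return abs(a["x"] - b["x"]) < w and abs(a["y"] - b["y"]) < h

def needs_reflow(ui_nodes: list[dict]) -> bool:
    """True if any pair of UI nodes collides — the trigger for
    reassigning them into the deterministic columns."""
    return any(
        boxes_overlap(ui_nodes[i], ui_nodes[j])
        for i in range(len(ui_nodes))
        for j in range(i + 1, len(ui_nodes))
    )
```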

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Hunter Hillman <hthillman@gmail.com>
…eak cases

Two new regression guards:

overlapping_nodes (forbid check): catches positioning bugs that render
as visually stacked nodes even when edges are correct. Mirrors the
reflow's bbox math — 240x140 for UI nodes (280 tall for
image/vace/subgraph), 200x60 for top-level nodes at x=50/350/650/950.
Added to all 8 existing cases and to the new layout-nodes-spaced case.

forbid_proposal (Case field): set true on cases where the agent must
NOT emit workflow_proposal because update_parameters is the right tool.
Runner treats presence of a proposal as a failure instead of the usual
"no proposal means fail" short-circuit.

Two new cases:
- layout-nodes-spaced: longlive + 2 sliders + prompt_list with triggers,
  exercising the many-UI-nodes-alongside-multiple-top-level-nodes
  surface where overlaps were observed.
- runtime-tweak-no-repropose: user asks to change noise_scale on a
  running graph; any workflow_proposal fails the case.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Hunter Hillman <hthillman@gmail.com>