feat: end-to-end cloud-connect test harness + Playwright-led skill #962

Open
emranemran wants to merge 7 commits into main from feat/test-cloud-connect-tooling
@emranemran (Contributor) commented Apr 20, 2026

Summary

Ships everything a new contributor (or Claude Code agent) needs to test the deployed Livepeer cloud path end-to-end. Combines the bash tooling (the original scope of #962), the Playwright e2e updates (originally #970, now folded in), and the discoverability, routing, and deploy-orchestration work, so that "test cloud" reliably deploys the user's current code and runs the real UI flow.

Since Livepeer mode is the only supported cloud path going forward (the old direct / fal_app.py + CloudConnectionManager mode is being deprecated), this PR establishes the single cloud-testing entry point.

What "test cloud" now does

An agent matching the skill's trigger phrases ("test cloud", "verify cloud streaming", cloud-connect error, edits to livepeer_fal_app.py / livepeer_app.py, etc.) walks through:

  1. Ask the user which fal app name + env to deploy to (defaulted from .env.local if set).
  2. Derive SCOPE_CLOUD_APP_ID from the confirmed app+env following fal's URL convention (main → no suffix; other envs → --<env> suffix).
  3. Sanity-check SCOPE_CLOUD_API_KEY / SCOPE_USER_ID; stop + ask if missing.
  4. Free port 8000 so run-app.sh owns it.
  5. Deploy via SCOPE_FAL_APP_NAME=… SCOPE_FAL_ENV=… ./deploy-staging.sh. Abort on failure so Playwright never runs against stale code.
  6. Start scope with the derived URL.
  7. Run Playwright (cd e2e && npx playwright test). Full round-trip through livepeer trickle to the fal runner and back.
  8. Report result — on success, every session-lifecycle Kafka event has fired and is ready to verify in ClickHouse.
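
Steps 2 and 3 above can be sketched in shell. This is a minimal sketch, not the skill's actual code: the function names `derive_cloud_app_id` and `require_env` are hypothetical, and the app is assumed to be given as `<owner>/<name>`. The sketch does not append the trailing `/ws` that appears in the preview-deployment example, because the text here does not say whether the skill adds it.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical, not taken from the repo.

# derive_cloud_app_id <owner/app> [env]
# fal URL convention as described above: "main" gets no suffix,
# any other env gets a "--<env>" suffix.
derive_cloud_app_id() {
  local app="$1"
  local fal_env="${2:-main}"
  if [ "$fal_env" = "main" ]; then
    printf '%s\n' "$app"
  else
    printf '%s--%s\n' "$app" "$fal_env"
  fi
}

# require_env <NAME>...
# Returns non-zero and names every variable that is unset or empty, so the
# caller can stop and ask the user. Names must be trusted identifiers,
# because they are passed through eval.
require_env() {
  local missing=""
  local name value
  for name in "$@"; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      missing="$missing $name"
    fi
  done
  if [ -n "$missing" ]; then
    printf 'missing required env vars:%s\n' "$missing" >&2
    return 1
  fi
}
```

For example, `derive_cloud_app_id daydream/scope-livepeer-pr-962 preview` prints `daydream/scope-livepeer-pr-962--preview`, which matches the Livepeer Runner App ID in the preview-deployment comment.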

Two test paths, one skill

  1. Playwright e2e (primary) — drives the real Perform-mode UI with a synthetic camera (--use-fake-device-for-media-stream), verifies the output video comes back from the cloud. Produces every session-lifecycle Kafka event: pipeline_load_start, pipeline_loaded, session_created, stream_started, stream_heartbeat, session_closed. ~2–8 min depending on cold/warm.
  2. test-cloud-connect.sh (secondary) — POSTs /api/v1/cloud/connect and polls status. Only verifies the wrapper-layer websocket_connected / websocket_disconnected pair. Bisect-friendly exit codes (0/1/2/3). Good for git bisect run or "did the fal container come up?".
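
For context on "bisect-friendly": `git bisect run` treats exit 0 as good, 125 as skip, any other code from 1 to 127 as bad, and aborts on anything higher. The sketch below shows a bounded status poll of the kind the secondary path describes. The function name `poll_until`, its return codes, and the `check_connected` probe in the usage comment are illustrative assumptions; what each exit code of test-cloud-connect.sh means is not specified here.

```shell
#!/usr/bin/env bash
# Sketch only: poll_until is a hypothetical helper.

# poll_until <attempts> <delay_seconds> <command> [args...]
# Runs the command until it succeeds, making at least one attempt and at
# most <attempts>. Returns 0 on success and 1 once attempts are exhausted.
poll_until() {
  local attempts="$1"
  local delay="$2"
  local i=0
  shift 2
  while :; do
    if "$@"; then
      return 0
    fi
    i=$((i + 1))
    if [ "$i" -ge "$attempts" ]; then
      return 1
    fi
    sleep "$delay"
  done
}

# Usage (check_connected is a hypothetical probe that succeeds once the
# status response reports CONNECTED):
#   poll_until 60 5 check_connected || exit 1
```

Keeping every failure code in the 1 to 124 range lets the same script serve both as a manual smoke test and as the command handed to `git bisect run`.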

Files

  • run-app.sh — launches daydream-scope in livepeer cloud mode, sources secrets from .env.local. (Fixed a bash-quoting bug that broke under word-splitting.)
  • deploy-staging.sh — now parametrized on SCOPE_FAL_APP_NAME / SCOPE_FAL_ENV / SCOPE_FAL_AUTH. Works from any cwd. Ships tracked so any contributor can run it.
  • test-cloud-connect.sh — HTTP-only orchestration script (push → CI wait → deploy-staging → connect → status poll).
  • .env.example — required/optional env vars grouped into client-side (SCOPE_CLOUD_APP_ID, SCOPE_CLOUD_API_KEY, SCOPE_USER_ID) and deploy-side (SCOPE_FAL_APP_NAME, SCOPE_FAL_ENV, SCOPE_FAL_AUTH).
  • e2e/playwright.config.ts — camera permission + fake-device launch args so headless Chromium completes browser↔local-scope WebRTC and actually delivers frames.
  • e2e/tests/cloud-streaming.spec.ts — new-UI selectors (Workflow/Perform toggle), Camera input, output-video assertion.
  • e2e/README.md — rewritten. Old version documented stale env vars that don't exist anymore; new one points at the skill for canonical setup.
  • .agents/skills/testing-livepeer-fal-deploy/SKILL.md — leads with Playwright, prescribes the ask-user → deploy → run flow above.
  • CLAUDE.md — new "Cloud testing — use this skill" section routes "test cloud" prompts to the skill. Legacy Local Cloud Testing and MCP Server Testing with Local Cloud Dev sections get deprecation markers pointing at the new path.
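
The client-side / deploy-side grouping described for .env.example might look roughly like the fragment below. Every value is a placeholder chosen for illustration (the app ID reuses the shape from the preview-deployment comment), and the comments are assumptions about layout rather than a copy of the file in the PR.

```shell
# --- client-side: read by run-app.sh when starting scope ---
SCOPE_CLOUD_APP_ID=daydream/scope-livepeer-pr-962--preview/ws
SCOPE_CLOUD_API_KEY=replace-me
SCOPE_USER_ID=replace-me

# --- deploy-side: read by deploy-staging.sh ---
SCOPE_FAL_APP_NAME=your-app-name
SCOPE_FAL_ENV=main
# Optional (placeholder value):
SCOPE_FAL_AUTH=public
```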

Verified

  • npx playwright test passes against scope-livepeer-emran with the passthrough pipeline (~2.8 min cold, ~17 s warm). Output video plays; heartbeats fire.
  • When combined with the runner-side changes in PR #969 (feat: bring livepeer runner Kafka events to parity with cloud-relay; open and deployed to scope-livepeer-emran for verification, but not merged), the full event set lands in ClickHouse correlated by user_id and connection_id = manifest_id.

Test plan

  • Fresh clone: .env.example documents every required env var.
  • Set .env.local values; ./deploy-staging.sh picks up SCOPE_FAL_APP_NAME + SCOPE_FAL_ENV.
  • ./run-app.sh starts scope on :8000.
  • cd e2e && npm install && npx playwright install chromium then npx playwright test → passes.
  • ./test-cloud-connect.sh --skip-push --skip-build-wait --skip-deploy exits 0 with CONNECTED.
  • Asking Claude Code "test cloud" in this repo routes to the skill and the agent walks the ask-user → deploy → Playwright flow.

Supersedes / related

🤖 Generated with Claude Code

feat: add end-to-end cloud-connect test harness and skill

Ship the scripts and skill we developed while debugging the livepeer
fal deploy path, so other contributors can run the same test loop
from their own Claude Code session.

- test-cloud-connect.sh orchestrates push → CI build-cloud wait →
  deploy-staging → local scope start → /cloud/connect → status poll
  with bisect-friendly exit codes (0/1/2/3/4). Supports --skip-push,
  --skip-build-wait, --skip-deploy, --full-session, --keep-scope.
- run-app.sh launches daydream-scope in livepeer cloud mode, sourcing
  secrets from .env.local (gitignored).
- .env.example documents the required SCOPE_CLOUD_APP_ID /
  SCOPE_CLOUD_API_KEY / SCOPE_USER_ID env vars plus optional
  LIVEPEER_DEBUG.
- .agents/skills/testing-livepeer-fal-deploy/SKILL.md teaches future
  agents when to use this loop, common failure signatures (ACCESS_DENIED,
  All orchestrators failed, did not receive ready message), and the
  known gap around /api/v1/session/start not being livepeer-compatible.

Users need to supply their own deploy-staging.sh (a thin wrapper around
`fal deploy src/scope/cloud/livepeer_fal_app.py --app <name> --auth
public --env main`); the test script errors out with a clear message
if it's missing.
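
A minimal sketch of the flag handling and the missing-script check described in this commit. The flag names come from the message above; the variable names, function names, and return codes are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: variable and function names are hypothetical.

# parse_flags [flags...]: record which stages to skip or extend.
parse_flags() {
  SKIP_PUSH=0 SKIP_BUILD_WAIT=0 SKIP_DEPLOY=0 FULL_SESSION=0 KEEP_SCOPE=0
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --skip-push)       SKIP_PUSH=1 ;;
      --skip-build-wait) SKIP_BUILD_WAIT=1 ;;
      --skip-deploy)     SKIP_DEPLOY=1 ;;
      --full-session)    FULL_SESSION=1 ;;
      --keep-scope)      KEEP_SCOPE=1 ;;
      *) printf 'unknown flag: %s\n' "$1" >&2; return 2 ;;
    esac
    shift
  done
}

# require_deploy_script <path>: fail early with a clear message when the
# user-supplied deploy script is absent or not executable.
require_deploy_script() {
  if [ ! -x "$1" ]; then
    printf 'error: %s not found or not executable; create a wrapper around "fal deploy" first\n' "$1" >&2
    return 1
  fi
}
```

A caller would typically run `parse_flags "$@"` first and call `require_deploy_script ./deploy-staging.sh` only when `SKIP_DEPLOY` is 0.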

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: emranemran <emran.mah@gmail.com>
coderabbitai Bot commented Apr 20, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.


github-actions Bot (Contributor) commented Apr 20, 2026

🚀 fal.ai Preview Deployment

App ID: daydream/scope-pr-962--preview
WebSocket: wss://fal.run/daydream/scope-pr-962--preview/ws
Commit: 88be832

Livepeer Runner

App ID: daydream/scope-livepeer-pr-962--preview
WebSocket: wss://fal.run/daydream/scope-livepeer-pr-962--preview/ws
Auth: private

Testing Livepeer Mode

SCOPE_CLOUD_MODE=livepeer SCOPE_CLOUD_APP_ID="daydream/scope-livepeer-pr-962--preview/ws" uv run daydream-scope

emranemran and others added 3 commits April 20, 2026 15:56
fix(e2e): update cloud-streaming test for graph-mode UI redesign

The redesign in #886 replaced the streaming-first landing with a
Workflow/Perform toggle and removed the "Daydream Scope" heading the
test was polling for. The test has been dead since then.

Updates:
- Wait on the Perform-mode toggle appearing instead of the missing heading
- Explicitly switch to Perform mode before the cloud/pipeline/stream
  steps — default is now Workflow (graph mode) where those controls
  aren't rendered
- Find the cloud button by title attribute (covers all three states:
  "Connect to cloud", "Connecting to cloud...", "Cloud connected")
- Bump the cloud-connect timeout to 180s so fal cold-starts have room
- Verify frame flow by polling any <video> element rather than locating
  the old "Video Output" card wrapper
- Stop uses the start-stream-button toggle (PlayOverlay changes role
  when streaming) with a text-based fallback

Verified locally: full flow passes in ~3 minutes against a warm fal
deploy with the passthrough pipeline.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: emranemran <emran.mah@gmail.com>

fix(e2e): actually exercise the livepeer trickle path

The previous iteration of this test passed as a false positive. It polled
any <video> for playback, which always finds the local input preview
playing even when the browser↔local-scope WebRTC never completes and
no frames ever reach the cloud. The result: ClickHouse saw only
websocket_connected / pipeline_loaded / websocket_disconnected —
nothing that requires a real round-trip through the livepeer runner.

Two fixes:

1. Feed the browser a synthetic camera via
   --use-fake-device-for-media-stream (plus the Camera input toggle in
   the UI). This lets getUserMedia() succeed and a real WebRTC peer
   connection between browser and local scope complete end to end,
   which triggers CloudTrack._start() → LivepeerClient.start_media()
   and the "start_stream" trickle control message the runner needs.

2. Assert on the video inside the "Video Output" card, not any
   <video>. That element only renders when a remoteStream is set,
   so waiting on its visibility and currentTime > 0 is a true
   round-trip signal. After frames start flowing, idle 15s so
   stream_heartbeat events (~every 10s on the runner side) have
   a chance to fire.

Verified locally: test passes in ~2.8 min against scope-livepeer-emran
with passthrough. Full event set lands in ClickHouse when paired with
the parity PR (#969).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: emranemran <emran.mah@gmail.com>

feat: lead SKILL with Playwright + fix run-app.sh env var quoting

Rewrite the testing-livepeer-fal-deploy SKILL so the primary
recommended path is the Playwright test (folded in from #970 via
cherry-pick). It's the only path that exercises the full livepeer
trickle round-trip and produces every session-lifecycle Kafka event.
Keep test-cloud-connect.sh as a secondary, bash-only smoke test for
"did the fal container come up?" / bisecting cloud-connect
regressions.

Also fix run-app.sh: the previous form tried to inline-prefix env
vars via ${VAR:+VAR=$VAR} on a backslash-continued command. Bash
recognizes assignment prefixes before it expands parameters, so the
expanded word was run as a command ("SCOPE_CLOUD_API_KEY=sk_... command
not found"). Switch to `export` + `exec uv run`.
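
A minimal sketch of the `export` + run shape, under stated assumptions: the function name `run_with_cloud_env` is hypothetical, and the real run-app.sh also sources .env.local first. The inline form fails because the shell decides which words are assignment prefixes before it performs parameter expansion, so text produced by `${VAR:+VAR=$VAR}` ends up as the command name.

```shell
#!/usr/bin/env bash
# Broken shape, shown as a comment only:
#   ${SCOPE_CLOUD_API_KEY:+SCOPE_CLOUD_API_KEY=$SCOPE_CLOUD_API_KEY} uv run daydream-scope
# The expansion result is not re-parsed as an assignment, so the shell tries
# to execute "SCOPE_CLOUD_API_KEY=..." as a command.

# run_with_cloud_env <command> [args...]  (hypothetical helper)
# Export whichever cloud variables are set and non-empty, then run the command.
run_with_cloud_env() {
  export SCOPE_CLOUD_MODE=livepeer
  if [ -n "${SCOPE_CLOUD_APP_ID:-}" ]; then export SCOPE_CLOUD_APP_ID; fi
  if [ -n "${SCOPE_CLOUD_API_KEY:-}" ]; then export SCOPE_CLOUD_API_KEY; fi
  if [ -n "${SCOPE_USER_ID:-}" ]; then export SCOPE_USER_ID; fi
  "$@"
}

# In a launcher script the final line would replace the shell instead:
#   exec uv run daydream-scope
```

Exporting first and then using `exec` keeps quoting trivial, since no variable value is ever re-split into words on the command line.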

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: emranemran <emran.mah@gmail.com>
@emranemran changed the title from "feat: add end-to-end cloud-connect test harness and skill" to "feat: end-to-end cloud-connect test harness + Playwright-led skill" on Apr 20, 2026
emranemran and others added 3 commits April 20, 2026 16:48
docs: make the testing-livepeer-fal-deploy skill discoverable

Three small changes so contributors and Claude Code agents actually
find the skill instead of reinventing the test flow.

- Broaden the skill's `description` with more trigger phrases so
  agents match on common prompts like "test the fal deploy", "run
  playwright", "verify kafka events", "diagnose fal", and the various
  observed error strings.
- Add a "Testing the deployed Livepeer fal path" section to
  CLAUDE.md (right above the MCP testing sections) pointing at the
  skill and distinguishing the Playwright e2e from the bash smoke
  test. CLAUDE.md is auto-loaded, so agents see this on startup.
- Rewrite `e2e/README.md` — the old version referenced stale env vars
  (`DAYDREAM_TEST_EMAIL` / `DAYDREAM_TEST_PASSWORD`, and an outdated
  `FAL_WS_URL`) and a flow that no longer matches. New README points
  at the skill for the canonical setup and gives a quick-reference
  block for the current `VITE_DAYDREAM_API_KEY` + `.env.local` flow.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: emranemran <emran.mah@gmail.com>

docs: route all "test cloud" prompts to the livepeer skill

Livepeer cloud mode is the only supported cloud path going forward —
the old direct/cloud-relay mode (fal_app.py + CloudConnectionManager
+ SCOPE_CLOUD_MODE=direct) is being deprecated. So "test cloud" from
a user should no longer be ambiguous.

- Add "test cloud" as an explicit trigger in the skill's description.
- Rewrite the CLAUDE.md testing section to be the single cloud
  entry point with a clear routing directive: any "test cloud",
  "verify cloud streaming", or cloud-connect error → this skill.
- Mark the legacy "Local Cloud Testing" and "MCP Server Testing with
  Local Cloud Dev" sections as DEPRECATED so agents don't accidentally
  land on them, while keeping the content for anyone unblocking
  in-flight work on the old path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: emranemran <emran.mah@gmail.com>

feat: skill asks for fal app+env, deploys, then runs Playwright

"test cloud" should actually test the user's current working tree,
not whatever code happens to be deployed. Previously the skill
documented running Playwright directly against scope-livepeer-emran,
which could silently false-pass against a stale deploy.

Three changes make this work:

- Parametrize `deploy-staging.sh` on SCOPE_FAL_APP_NAME +
  SCOPE_FAL_ENV (+ optional SCOPE_FAL_AUTH), defaulted from
  .env.local. Also make it track its own HERE path so it works when
  called from any cwd.
- Document both vars in .env.example, grouped into client-side and
  deploy-side sections. SCOPE_CLOUD_APP_ID stays — but we now note
  that the skill derives it from the app+env the user confirms at
  test time (fal's URL convention: main has no suffix, other envs
  get --<env>).
- Rewrite the SKILL's "Running the Playwright test" section with an
  explicit flow: ask for app+env → sanity-check secrets → free port
  8000 → deploy → start scope with derived URL → run Playwright.
  This is what agents should follow when a user says "test cloud".

Also add `deploy-staging.sh` to the repo (it was previously an
untracked per-user file); needed so any contributor following the
skill can actually run the deploy step.
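
The parametrization can be sketched as follows. This is an assumption-laden illustration, not the tracked deploy-staging.sh: `build_deploy_cmd` is a hypothetical helper that only prints the command, the `main` and `public` defaults are guesses based on the `fal deploy` invocation quoted in the first commit, and the real script may resolve paths differently.

```shell
#!/usr/bin/env bash
# Sketch only: build_deploy_cmd is a hypothetical helper that prints the
# deploy command instead of running it.

# build_deploy_cmd <repo_root>
build_deploy_cmd() {
  local root="$1"
  local app="${SCOPE_FAL_APP_NAME:-}"
  local fal_env="${SCOPE_FAL_ENV:-main}"   # assumed default
  local auth="${SCOPE_FAL_AUTH:-public}"   # assumed default
  if [ -z "$app" ]; then
    printf 'SCOPE_FAL_APP_NAME is not set\n' >&2
    return 1
  fi
  printf 'fal deploy %s/src/scope/cloud/livepeer_fal_app.py --app %s --auth %s --env %s\n' \
    "$root" "$app" "$auth" "$fal_env"
}

# Resolving the script's own directory is one way to make it work from any cwd:
#   HERE="$(cd "$(dirname "$0")" && pwd)"
#   build_deploy_cmd "$HERE"
```

Printing the command before running it also gives the agent something concrete to show the user when confirming the app and env.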

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: emranemran <emran.mah@gmail.com>
emranemran added a commit that referenced this pull request Apr 21, 2026
Squash of feat/test-cloud-connect-tooling (PR #962) onto this
branch so we can exercise the parity changes end-to-end via
Playwright + skill-driven "test cloud" flow.

This commit is a throwaway for verification — once the parity code
is signed off, revert this single commit before opening PR #969 for
review so the diff stays focused.

Squashed from:
- feat: add end-to-end cloud-connect test harness and skill
- fix(e2e): update cloud-streaming test for graph-mode UI redesign
- fix(e2e): actually exercise the livepeer trickle path
- feat: lead SKILL with Playwright + fix run-app.sh env var quoting
- docs: make the testing-livepeer-fal-deploy skill discoverable
- docs: route all "test cloud" prompts to the livepeer skill
- feat: skill asks for fal app+env, deploys, then runs Playwright

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: emranemran <emran.mah@gmail.com>
@j0sh (Contributor) left a comment

Very cool! 🔥 🔥 Let's wait until #972 is merged then some of the info here can be updated since cloud relay mode won't exist anymore

hthillman added a commit that referenced this pull request Apr 28, 2026
Per Emran's blessing in chat, absorbing PR #962 ("end-to-end cloud-connect
test harness + Playwright-led skill") into this PR so the two systems ship
as one cohesive story instead of two PRs with overlapping concerns.

The two surfaces stay invokable separately, as Emran requested:

- `testing-livepeer-fal-deploy` skill — triggered by "test cloud", "verify
  cloud streaming", "run the e2e test", cloud-connect errors. Engineer-
  driven ad-hoc verification: ask user → deploy → run Playwright → report.
  Drives e2e/tests/cloud-streaming.spec.ts via npx playwright.
- product-tests/ — automated CI gating, every PR, scenarios + chaos +
  regression + multimodal. Drives pytest + the @Scenario harness.

Two different questions ("did my deploy work?" vs "is the product broken?")
get two different tools. CLAUDE.md routing makes the distinction explicit.

Files folded in (verbatim from PR #962, authored by emranemran):
- .agents/skills/testing-livepeer-fal-deploy/SKILL.md
- .env.example
- deploy-staging.sh
- run-app.sh
- test-cloud-connect.sh
- e2e/playwright.config.ts (camera permission + fake-device launch args)
- e2e/tests/cloud-streaming.spec.ts (Perform-mode + camera + output video)
- e2e/README.md (rewritten to point at the skill)

CLAUDE.md merged: adds Emran's "Cloud testing — use this skill" routing
section, with a note distinguishing his ad-hoc skill from the product-tests
CI gate. Deprecation markers on the legacy "Local Cloud Testing" section
preserved.

Closes #962 once this lands.

Co-Authored-By: Emran M <emranemran@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Hunter Hillman <hthillman@gmail.com>
hthillman added a commit that referenced this pull request Apr 28, 2026
The CLAUDE.md "Cloud testing — use this skill" section that should
have landed in 8fe40ed didn't get staged before the commit. Adding
it now: routes "test cloud" / "verify cloud streaming" / cloud-connect
errors to the testing-livepeer-fal-deploy skill, with a note
distinguishing it from the product-tests CI gate. Deprecation markers
on legacy "Local Cloud Testing" section preserved.

Co-Authored-By: Emran M <emranemran@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Hunter Hillman <hthillman@gmail.com>