feat(ci): add nightly canary and stable release workflows #1525

Closed
ashish921998 wants to merge 9 commits into main from session/ao-152

Conversation

@ashish921998
Collaborator

Summary

  • packages/web/package.json: add "private": true so changeset publish never tries to publish the Next.js dashboard to npm
  • .changeset/config.json: remove @aoagents/ao-web from the linked group, add it to ignore alongside ao-integration-tests, and add a snapshot.prereleaseTemplate for deterministic 0.x.y-nightly-<sha> version strings
  • .github/workflows/release.yml: stable @latest publishing via changesets/action@v1 — opens/updates a "Version Packages" PR when changesets are pending; publishes on merge
  • .github/workflows/canary.yml: triggered by workflow_run on CI success on main; includes stale-run guard (skips if SHA isn't current tip of main) and snapshot gap fix (creates a temp changeset when none exist so the snapshot step always has something to version); posts the install command as a comment on the merged PR
  • README.md: canary install callout after the stable npm install line
  • CONTRIBUTING.md: "Testing your changes" section after Development Setup

Test plan

  • Verify pnpm changeset publish skips @aoagents/ao-web (private package)
  • Merge a change to main with no pending changesets — canary workflow should auto-create temp changeset and publish @nightly
  • Merge a "Version Packages" PR — release workflow should publish @latest
  • Verify canary bot comments on merged PRs with the exact install command
  • Run npm install -g @aoagents/ao@nightly after a canary publish to confirm it works

Closes #152

🤖 Generated with Claude Code

- Add private:true to ao-web package.json so changeset publish skips it
- Update .changeset/config.json: move ao-web to ignore, add snapshot
  prereleaseTemplate for deterministic nightly version strings
- Add release.yml: changesets/action@v1 handles stable @latest publishing
  via "Version Packages" PR cycle
- Add canary.yml: publishes @nightly on every CI-green main push with
  stale-run guard and snapshot gap fix (creates temp changeset when none exist)
- Add canary install note to README.md
- Add "Testing your changes" section to CONTRIBUTING.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions
Contributor

Test Coverage Report

No TypeScript source files changed in this PR.

@greptile-apps
Contributor

greptile-apps Bot commented Apr 27, 2026

Greptile Summary

This PR wires up nightly canary and stable release CI pipelines using Changesets, marks @aoagents/ao-web as private so it is never accidentally published, and adds developer docs for testing canary builds.

  • P1 (canary.yml lines 98–104): The gh api comment body inherits YAML block-scalar indentation, placing 10 spaces before the ``` fence markers. CommonMark only recognises a fenced code block with 0–3 spaces of leading indentation, so the code block will render as literal text in the PR comment.
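
To illustrate the indentation problem, here is a minimal TypeScript sketch. It is not code from this PR: `dedent`, `leadingSpaces`, and `fencesWillRender` are hypothetical helpers. It strips the indentation shared by every non-blank line of a comment body and then checks that each fence marker has at most three leading spaces, the CommonMark limit quoted above.

```typescript
// Illustrative sketch only: these helpers are hypothetical and not part of canary.yml.

// Build the three-backtick fence marker without writing it literally.
const FENCE = ["`", "`", "`"].join("");

/** Count the leading space characters of a line. */
function leadingSpaces(line: string): number {
  let n = 0;
  while (n < line.length && line[n] === " ") n++;
  return n;
}

/** Strip the indentation shared by all non-blank lines (e.g. indentation inherited from YAML). */
function dedent(body: string): string {
  const lines = body.split("\n");
  const indents = lines
    .filter((line) => line.trim() !== "")
    .map((line) => leadingSpaces(line));
  const common = indents.length > 0 ? Math.min(...indents) : 0;
  return lines
    .map((line) => (line.trim() === "" ? "" : line.slice(common)))
    .join("\n");
}

/** True when every fence line has at most 3 leading spaces (the CommonMark limit). */
function fencesWillRender(body: string): boolean {
  return body
    .split("\n")
    .filter((line) => line.trim().indexOf(FENCE) === 0)
    .every((line) => leadingSpaces(line) <= 3);
}
```

A body whose fence lines carry 10 leading spaces fails `fencesWillRender`; after `dedent` the same body passes. The heredoc fix adopted later in this PR achieves the same result in shell by placing the fence lines at column 0.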

Confidence Score: 4/5

Safe to merge after fixing the comment-body indentation so the code fence renders correctly in PR comments.

One P1 finding: the YAML block-scalar indentation in the gh api call will cause the fenced code block in canary PR comments to render as plain text. The rest of the changes (release workflow, changeset config, private flag, docs) are correct and well-structured.

.github/workflows/canary.yml — comment body indentation (lines 98–104)

Important Files Changed

| Filename | Overview |
| --- | --- |
| .github/workflows/canary.yml | New nightly canary workflow; stale-run guard and snapshot gap fix are sound, but the PR comment body has indentation that will break the fenced code block rendering (P1), and there is a minor dead short_sha output. |
| .github/workflows/release.yml | Standard changesets/action@v1 stable release workflow; permissions, fetch-depth, and build filter look correct. |
| .changeset/config.json | Adds snapshot prereleaseTemplate, removes @aoagents/ao-web from linked group, and adds it to ignore list; all changes are consistent with the private-package goal. |
| packages/web/package.json | Adds "private": true to prevent accidental npm publishing of the Next.js dashboard. |
| CONTRIBUTING.md | Adds canary testing instructions; example version is illustrative and acceptable. |
| README.md | Adds canary install callout after the stable install line; straightforward documentation change. |

Sequence Diagram

sequenceDiagram
    participant Dev as Developer
    participant GH as GitHub (main push)
    participant CI as CI Workflow
    participant Canary as Canary Workflow
    participant Release as Release Workflow
    participant NPM as npm Registry
    participant PR as Merged PR Comment

    Dev->>GH: Push / merge PR to main
    GH->>CI: Trigger CI (lint, typecheck, test)
    GH->>Release: Trigger release.yml (push to main)
    Release->>Release: pnpm install + build
    Release->>Release: changesets/action@v1
    alt Changesets pending
        Release->>GH: Open/update Version Packages PR
    else Version Packages PR merged
        Release->>NPM: pnpm changeset publish at latest
    end

    CI-->>Canary: workflow_run completed (success)
    Canary->>GH: Check if SHA == tip of main
    alt SHA is stale
        Canary->>Canary: Skip (exit early)
    else SHA is current
        Canary->>Canary: pnpm install + build
        alt No changesets exist
            Canary->>Canary: Create canary-temp.md
        end
        Canary->>Canary: pnpm changeset version --snapshot nightly
        Canary->>NPM: pnpm changeset publish --tag nightly
        Canary->>PR: Comment with install command
    end

Reviews (1): Last reviewed commit: "feat(ci): add nightly canary and stable ..."

Comment thread .github/workflows/canary.yml
Comment thread .github/workflows/canary.yml Outdated
Comment thread .github/workflows/canary.yml
- Remove @aoagents/ao-web from ignore list (ao-cli depends on it, causing
  ValidationError on all changeset commands); private:true in package.json
  is sufficient to prevent publishing
- Use case-insensitive grep -vi for README filter in snapshot gap check
- Remove dead short_sha step output from publish step
- Use heredoc for PR comment body so backtick fences render correctly
  (shell indentation was breaking CommonMark code block detection)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Collaborator Author

@ashish921998 ashish921998 left a comment

Fixed in the latest push. Three issues addressed:

  1. [P1] Code fence indentation — switched to a heredoc so the backtick lines sit at column 0 and render as a proper code block in GitHub markdown.
  2. [P2] Dead short_sha output — removed the two lines.
  3. [P2] Case-sensitive README grep — changed to grep -vi so readme.md is also filtered out.

Also fixed a separate Codex-found [P1]: removed @aoagents/ao-web from the changeset ignore list — @aoagents/ao-cli depends on it, which caused a ValidationError on every changeset command. The private: true flag in packages/web/package.json is sufficient to prevent publishing.

ashish921998 and others added 2 commits April 27, 2026 19:40
Prevents posting a canary install comment if the publish step fails.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- canary comment step now has continue-on-error:true so a failed gh api
  call (rate limit, permissions) doesn't mark a successful publish as failed
- document NPM_TOKEN secret requirement in CONTRIBUTING.md so maintainers
  know what to configure before the workflows can publish

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@i-trytoohard
Collaborator

PR Review: Canary + Stable Release Workflows

Overall this is a solid PR — the architecture is correct (changesets for stable, snapshots for canary, private flag for web). The Greptile findings were already addressed in commits 2-4. A few things to consider:


🔴 Worth Addressing

1. release.yml and canary.yml both trigger on every push to main

release.yml triggers directly on push; canary.yml triggers via workflow_run on CI completing. Both do pnpm -r --filter '!@aoagents/ao-web' build redundantly. If no changesets exist, release is a no-op — fine. But consider adding a path filter to release.yml so it only runs when .changeset/ files change, or at minimum document the interaction. Not a blocker, but wastes CI minutes.

2. id-token: write permission declared but unused

Both workflows declare id-token: write but neither uses OIDC federation. Remove it — least-privilege.


🟡 Suggestions

3. CANARY_VERSION extraction is hardcoded

CANARY_VERSION=$(node -p "require('./packages/ao/package.json').version")

This hardcodes the path. If the package is ever renamed/restructured, it silently breaks. Consider pnpm changeset status --output=json or at least add a comment noting the dependency.

4. PR lookup for commenting may miss squash-merged PRs

workflow_run.head_sha is the merge commit on main, not the PR head SHA. The commits/{sha}/pulls API should map merge commits to their PRs, but worth verifying with a real squash-merge. If it doesn't work, you'd need to parse the commit message for (#NNN).
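
As a rough illustration of the fallback mentioned above, the following TypeScript sketch extracts a PR number from a commit subject. It assumes GitHub's default squash-merge subject format, `Title (#NNN)`; `prNumberFromCommitMessage` is a hypothetical helper, not something in this PR.

```typescript
// Illustrative sketch only: a hypothetical fallback, not code from canary.yml.

/**
 * Extract the PR number from a squash-merge subject such as "feat: x (#123)".
 * Only the first line is inspected; returns null when no trailing "(#NNN)" is found.
 */
function prNumberFromCommitMessage(message: string): number | null {
  const subject = message.split("\n")[0];
  const match = /\(#(\d+)\)\s*$/.exec(subject);
  return match ? Number(match[1]) : null;
}
```

Because it looks only at a trailing `(#NNN)` on the subject line, this would not match regular merge commits ("Merge pull request #NNN from ...") or direct pushes, so a caller would still need to treat `null` as "no PR to comment on".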

5. HUSKY=0 set after pnpm install

If the install script ever triggers a husky hook (unlikely in CI but possible), it would fail. Set it before install:

- run: echo "HUSKY=0" >> $GITHUB_ENV
- run: pnpm install --frozen-lockfile

📝 Nits

  • CONTRIBUTING.md example version 0.2.5-nightly-abc1234 will go stale. Consider x.y.z-nightly-<sha> as placeholder.
  • No cleanup of canary-temp.md — fine in CI (fresh checkout), but worth a comment.

Verdict

Approve with nits. The id-token removal and HUSKY ordering are trivial fixes. The release/canary redundancy and path filter can be a follow-up. Ship it.

- Remove id-token:write from both workflows (unused — no OIDC/provenance)
- Move HUSKY=0 before pnpm install in both workflows so install hooks
  can't fire before the env var is set
- Add comments on canary-temp.md lifecycle and CANARY_VERSION path
- Update stale version placeholder in CONTRIBUTING.md to x.y.z-nightly-<sha>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@ashish921998
Collaborator Author

Thanks for the thorough review. Addressed in the latest commit:

Fixed:

  • Removed id-token: write from both workflows — you're right, neither uses OIDC/provenance
  • Moved HUSKY=0 before pnpm install in both workflows
  • Added comment on the hardcoded CANARY_VERSION path noting the dependency
  • Updated the CONTRIBUTING.md version placeholder to x.y.z-nightly-<sha>
  • Added a comment clarifying that canary-temp.md is consumed and deleted by changeset version --snapshot — no explicit cleanup needed

On path filter for release.yml: Left it triggering on every push intentionally. changesets/action needs to run on every merge to keep the "Version Packages" PR up to date — if a push doesn't add a new changeset but the PR already exists, the action updates it with the latest commit. Filtering to .changeset/ would break that.

On squash-merge PR lookup: The commits/{sha}/pulls API maps merge commits to their associated PRs and does work for squash merges in practice. It's a known edge case if a commit lands via a method GitHub doesn't index (e.g. direct push with no PR), but in that case the comment step just silently no-ops — publish still succeeds.

Comment thread .github/workflows/canary.yml
Comment thread packages/web/package.json
Comment thread .changeset/config.json Outdated
@ashish921998
Collaborator Author

Here are the consolidated comments, ready to paste on PR #1525. Grouped by severity, each with file:line.


🔴 P1 — Blocking

1. @aoagents/ao-cli depends on private @aoagents/ao-web — published installs will break

File: packages/cli/package.json:57 + packages/web/package.json:5

packages/web was just marked "private": true, but packages/cli/package.json:57 still declares "@aoagents/ao-web": "workspace:*" as a runtime dependency. Both workflows also skip building web (pnpm -r --filter '!@aoagents/ao-web' build). When @aoagents/ao-cli publishes to npm, the ao-web dependency can't be resolved — it's private and never published. End users running npm install -g @aoagents/ao will hit a missing dependency or get a stale/broken dashboard.

Fix options:

  • Move @aoagents/ao-web to devDependencies and load it dynamically at runtime, or
  • Bundle the built dashboard (.next/, dist-server/) into the @aoagents/ao-cli package's files and drop the workspace dep, or
  • Don't make ao-web private — publish it alongside.

2. @aoagents/ao-web removed from linked but not added to ignore

File: .changeset/config.json:24-37

The PR description says: "add it to ignore alongside ao-integration-tests" — but the actual ignore array only contains @aoagents/ao-integration-tests. ao-web is now neither linked nor ignored. changeset version may still try to bump it when a referenced changeset exists. Either add it to ignore, or confirm private: true alone is sufficient and update the PR description.


🟠 P2 — Should fix before merging

3. Temp changeset only bumps @aoagents/ao — misses 6 publishable packages

File: .github/workflows/canary.yml:69-70

printf -- '---\n"@aoagents/ao": patch\n---\n\nchore: canary build\n' > .changeset/canary-temp.md

@aoagents/ao is in the linked group, so all linked packages bump together. But these publishable packages are outside the linked group and won't get a nightly version on the no-changesets path:

  • composio-ao-plugin
  • @aoagents/ao-plugin-agent-cursor
  • @aoagents/ao-plugin-notifier-discord
  • @aoagents/ao-plugin-notifier-openclaw
  • @aoagents/ao-plugin-scm-gitlab
  • @aoagents/ao-plugin-tracker-gitlab

Users running npm install <pkg>@nightly for any of these will get a stale version. Fix: either add them to the linked group, or write a wildcard changeset that bumps all publishable workspaces.
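
One way to picture the "bump all publishable workspaces" option is the TypeScript sketch below. It is an assumption-laden illustration, not the workflow's implementation: `WorkspacePackage` and `buildCanaryChangeset` are hypothetical names, and the package list is passed in rather than discovered from the workspace. The output mirrors the frontmatter format produced by the printf line above.

```typescript
// Illustrative sketch only: hypothetical helper, not part of canary.yml.

interface WorkspacePackage {
  name: string;
  private?: boolean; // private packages are never published, so they are skipped
}

/** Build the text of a temporary changeset that patch-bumps every non-private package. */
function buildCanaryChangeset(
  packages: WorkspacePackage[],
  summary: string = "chore: canary build",
): string {
  const bumps = packages
    .filter((pkg) => !pkg.private)
    .map((pkg) => '"' + pkg.name + '": patch');
  // Frontmatter block, blank line, summary, trailing newline.
  return ["---"].concat(bumps, ["---", "", summary, ""]).join("\n");
}
```

Writing the returned string to `.changeset/canary-temp.md` would then cover packages outside the linked group as well, at the cost of keeping the package list in sync with the workspace.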


4. release.yml runs on every push to main with no dependency on CI passing

File: .github/workflows/release.yml:3-5

canary.yml correctly gates on workflow_run: [CI] + conclusion == 'success'. release.yml triggers directly on push: branches: [main] with no such gate. In practice changesets/action@v1 only publishes when a Version Packages PR is merged, but a broken build can still slip through. Mirror the canary pattern: workflow_run on CI success, or add a gating step that runs pnpm test / pnpm typecheck before publish.


5. Canary only waits for CI workflow — ignores Integration Tests and Security

File: .github/workflows/canary.yml:4-7

workflow_run: workflows: [CI] only listens to one workflow. If you have separate Integration Tests or Security workflows running on main, canaries can publish while those are failing. Either consolidate into the CI workflow, or add the others to the workflow_run.workflows list.


6. TOCTOU race in SHA-tip-of-main check

File: .github/workflows/canary.yml:30-38

Between the SHA check and the publish step, a new commit can land on main. The publish would then go out for an older (no-longer-tip) SHA. Combined with concurrency: cancel-in-progress: false, two canaries can serialize and publish in either order. Acceptable risk for nightly cadence, but worth either re-checking inside the publish step or relying on the nightly npm tag as the authoritative pointer.


🟡 P3 — Polish

7. Hardcoded ./packages/ao/package.json path

File: .github/workflows/canary.yml:82

CANARY_VERSION=$(node -p "require('./packages/ao/package.json').version")

If the package directory is ever renamed, this silently breaks (or picks the wrong version). Use module resolution instead: node -p "require('@aoagents/ao/package.json').version" — works through workspace symlinks regardless of directory name.


8. Inconsistent commit message styling between PR description and config

File: .github/workflows/release.yml:36-37

PR description says "Version Packages", the config uses "chore: version packages". Pick one — both work, but the inconsistency is a small confusion source for reviewers.


That's the full set. #1 is the actual blocker — the published-cli → private-web dependency will surface as broken npm install for end users the moment release.yml ships.

ashish921998 and others added 2 commits April 28, 2026 22:13
- add missing opening --- delimiter to two changeset frontmatter files
  (.changeset/fix-start-restore-and-codex-resume.md and
  .changeset/fixed-orchestrator-identity.md) — changeset version would
  have rejected them on the first canary/release run
- set useCalculatedVersionForSnapshots: true so snapshot versions are
  x.y.z-nightly-<sha> as documented, not 0.0.0-nightly-<sha>
- add bundledDependencies: ["@aoagents/ao-web"] to ao-cli and build
  ao-web in both canary/release workflows — ensures the dashboard is
  always shipped at the correct version with the CLI instead of drifting
  toward a stale npm-published ao-web

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- add 5 publishable packages to linked group (ao-plugin-agent-cursor,
  ao-plugin-notifier-discord, ao-plugin-notifier-openclaw,
  ao-plugin-scm-gitlab, ao-plugin-tracker-gitlab) — temp changeset was
  missing them so nightly would have published stale versions for those
- gate release.yml on workflow_run CI success instead of bare push —
  mirrors the canary pattern so broken builds can't slip into stable
- add Integration Tests to canary workflow_run trigger so canary
  doesn't publish while integration tests are failing
- use module resolution for CANARY_VERSION
  (require('@aoagents/ao/package.json')) instead of hardcoded path
- add comment in release.yml clarifying "chore: version packages" is
  the PR title changesets creates (addresses title inconsistency note)
- ao-web is intentionally not in ignore: private:true is the canonical
  mechanism; ignore caused a ValidationError since ao-cli depends on it

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Collaborator

@i-trytoohard i-trytoohard left a comment

Review: feat(ci): add nightly canary and stable release workflows

Verdict: Solid — approved with one concern to address or document.

✅ What landed well

  • Canary workflow is well-structured: stale-SHA guard prevents queued runs from publishing old builds, temp changeset creation when none exist is the right workaround for the "Version Packages merge leaves no changesets" gap, and continue-on-error on the PR comment is sensible.
  • bundledDependencies for ao-web — the right fix for the private-publish drift problem. ao-web gets inlined into the CLI tarball, so end users always get the matching dashboard. Nice catch addressing illegalcall's concern.
  • Changeset frontmatter fix — adding the opening --- to the two existing files prevents the first canary run from exploding. Critical fix.
  • useCalculatedVersionForSnapshots: true — correct. Without it, snapshots would be 0.0.0-nightly-<sha> instead of the documented x.y.z-nightly-<sha>.
  • New plugins added to linked group — ao-plugin-agent-cursor, notifier-discord, notifier-openclaw, scm-gitlab, tracker-gitlab. Good catch that the temp changeset would have missed these.
  • Release workflow gated on CI — not just bare push, mirrors the canary pattern. Smart.
  • Integration Tests in canary trigger — prevents canary from publishing while integration tests are failing.

⚠️ One concern: race between release.yml and canary.yml

Both workflows trigger on workflow_run from CI completing on main. If a Version Packages merge triggers both, canary and release could race on pnpm changeset version. The concurrency groups are separate (canary vs release), so they won't cancel each other.

Recommendation: Either document the expected ordering (e.g. "release always runs first because changesets/action consumes pending changesets, canary's temp-changeset path handles the no-changeset case"), or add a job-level dependency / status check so canary waits for release to complete. This isn't a merge blocker since the temp-changeset path handles the overlap gracefully — but worth a comment in the workflow files.

⚠️ Minor (non-blocking)

  • bundledDependencies: npm's package.json docs use bundleDependencies (no 'd') as the primary spelling. npm still supports both, but worth noting for future-proofing.
  • Tarball size — Next.js builds can be chunky. Worth checking npm pack --dry-run to see the CLI tarball size post-bundle.
  • GITHUB_TOKEN permissions: contents: write is set in release.yml, but the default GITHUB_TOKEN in public repos is often read-only. Make sure the repo settings allow the token write access for creating the Version Packages PR.

Overall: illegalcall's concerns are all addressed. The core approach is sound. Approved pending the race condition documentation.

@ashish921998
Collaborator Author

ashish921998 commented May 1, 2026

npm rolled out OIDC trusted publishing in 2025: no token is stored anywhere, and GitHub vouches for the workflow in real time. That solves your secrets concern better than the VPS cron would, with no infra to babysit.
As discussed with @AgentWrapper.

ashish921998 and others added 2 commits May 4, 2026 20:16
# Conflicts:
#	.changeset/fix-start-restore-and-codex-resume.md
#	.changeset/fixed-orchestrator-identity.md
…ig to all packages

- Gate release and canary workflows behind `environment: release` so only
  main-branch runs can publish to npm
- Add `id-token: write` + `NPM_CONFIG_PROVENANCE=true` to both workflows so
  every published package carries a signed provenance attestation linking
  back to the workflow run
- Add `publishConfig: { access: "public", provenance: true }` to all 26
  public packages (scoped packages default to restricted access without this)

To complete the setup: create a `release` environment in GitHub Settings,
restrict it to branch `main`, and move NPM_TOKEN there as an environment secret.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@ashish921998
Collaborator Author

Step 1: New npm token (npmjs.com)

  1. Go to npmjs.com → click your avatar → Access Tokens.
  2. Click Generate New Token → choose Granular Access Token.
  3. Fill in:
       • Token name: agent-orchestrator-publish
       • Expiration: 90 days
       • Packages and scopes: Select packages → search @aoagents → select all of them → Permission: Read and write
       • Everything else: leave default
  4. Click Generate Token and copy it immediately; you only see it once.
  5. Go back to the token list, find the old Automation or Publish token, and delete it.

Step 2: GitHub environment (github.com)

  1. Go to your repo → Settings → Environments (left sidebar) → New environment.
  2. Name it exactly release (must match what's in the workflow files).
  3. On the environment page:
       • Under Deployment branches and tags → click Add deployment branch or tag rule → choose Branch → type main.
       • Under Environment secrets → click Add secret → Name: NPM_TOKEN, Value: paste the token from Step 1.
  4. Save.

What happens after you merge

Every push to main that passes CI triggers canary.yml. When it reaches the publish step, GitHub checks: "is this job running from the release environment? Is the branch main?" — yes and yes, so it injects the NPM_TOKEN secret and the job runs.

If someone pushed from a branch that isn't main, or from a fork, GitHub refuses to inject the secret and the job fails before it can publish anything.

The NPM_TOKEN itself is now scoped to @aoagents/* only — even if it somehow leaked, it can't touch any other packages on npm.

Then go merge the PR.

suraj-markup added a commit that referenced this pull request May 10, 2026
… banner, cron

Implements the full release pipeline described in release-process.html
(supersedes #1525, which only had the workflow scaffolding).

A. Release infrastructure — .github/workflows/canary.yml triggers on a cron
   ('30 17 * * 5,6,0,1,2', i.e. 23:00 IST Fri–Tue) plus workflow_dispatch,
   without the stale-SHA guard or the merged-PR-comment step from #1525
   (cron has no merged-PR context). release.yml uses changesets/action.
   .changeset/config.json adds the snapshot template and moves the private
   @aoagents/ao-web to ignore[].

B. Channel awareness (packages/cli/src/lib/update-check.ts) — new
   updateChannel field in the global-config Zod schema (stable | nightly
   | manual; defaults to manual so existing users see no surprise installs).
   fetchLatestVersion now reads dist-tags[channel] from the registry;
   isVersionOutdated compares prerelease segments numerically + lexically
   so SHA-suffixed nightlies sort correctly. maybeShowUpdateNotice and
   scheduleBackgroundRefresh skip entirely on manual.

C. Active-session guard (packages/cli/src/commands/update.ts) — before
   any handle*Update proceeds, sm.list() filters for working/idle/
   needs_input/stuck and refuses with `N session(s) active. Run
   `ao stop` first.` instead of auto-stopping (per the design doc:
   surprise-killing user work is worse than refusing).

D. Soft auto-install + onboarding — handleNpmUpdate skips the confirm
   prompt on stable/nightly. New packages/cli/src/lib/update-channel-
   onboarding.ts prompts once on the first `ao start` after this lands;
   ask-once gate keyed on the absence of updateChannel in the global
   config; dismissal persists `manual`. New `ao config set updateChannel
   <value>` command (also handles installMethod).

E. Dashboard banner — packages/web/src/app/api/version/route.ts reads
   the same cache file the CLI writes (~/.cache/ao/update-check.json,
   XDG-aware) and rejects cache entries from a different channel.
   packages/web/src/app/api/update/route.ts duplicates the active-session
   guard so the dashboard can return a structured 409. New UpdateBanner
   component wired into Dashboard.tsx — Tailwind only, var(--color-*)
   tokens, dismissible per-version via localStorage, deferred fetch so
   it doesn't shift the call order in existing dashboard tests.

F. Bun + Homebrew detection (update-check.ts) — new classifiers for
   ~/.bun/install/global/ (auto-installs `bun add -g @aoagents/ao@<channel>`)
   and /Cellar/ao/ (notice-only — `brew upgrade ao`, never auto-install
   because brew owns the symlinks). New installMethod override field in
   the global config to pin detection when path heuristics fail.

Tests: +155 (B/C/F unit, onboarding ask-once gate, /api/version + /api/update,
UpdateBanner visibility/dismiss/click). pnpm test, pnpm typecheck, pnpm lint
all green for the changes; the same 10 pre-existing test failures observed
on main are still present (all environment-dependent: ~/.cache/ao state, codex
binary install, /private path canonicalization on macOS).

Closes #1525

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
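
The prerelease comparison described in item B of the commit message above can be sketched as follows. This is a minimal TypeScript illustration that follows the SemVer 2.0.0 precedence rules (numeric identifiers compare numerically, others lexically, and a release outranks its own prereleases). All names here (`parseVersion`, `compareVersions`, `isOutdated`) are hypothetical; this is not the project's `isVersionOutdated`.

```typescript
// Illustrative sketch following SemVer 2.0.0 precedence; hypothetical names,
// not the project's isVersionOutdated implementation.

interface ParsedVersion {
  core: number[]; // [major, minor, patch]
  prerelease: string[]; // dot-separated identifiers after the first "-"
}

function sign(n: number): number {
  return n < 0 ? -1 : n > 0 ? 1 : 0;
}

function parseVersion(version: string): ParsedVersion {
  const withoutBuild = version.split("+")[0]; // build metadata does not affect precedence
  const dash = withoutBuild.indexOf("-");
  const corePart = dash === -1 ? withoutBuild : withoutBuild.slice(0, dash);
  const prePart = dash === -1 ? "" : withoutBuild.slice(dash + 1);
  return {
    core: corePart.split(".").map(Number),
    prerelease: prePart === "" ? [] : prePart.split("."),
  };
}

function compareIdentifiers(a: string, b: string): number {
  const aNum = /^\d+$/.test(a);
  const bNum = /^\d+$/.test(b);
  if (aNum && bNum) return sign(Number(a) - Number(b)); // numeric vs numeric
  if (aNum) return -1; // numeric identifiers sort below alphanumeric ones
  if (bNum) return 1;
  return a < b ? -1 : a > b ? 1 : 0; // lexical (ASCII) order
}

/** Returns -1, 0, or 1 when a is lower than, equal to, or higher than b. */
function compareVersions(a: string, b: string): number {
  const va = parseVersion(a);
  const vb = parseVersion(b);
  for (let i = 0; i < 3; i++) {
    const diff = sign((va.core[i] || 0) - (vb.core[i] || 0));
    if (diff !== 0) return diff;
  }
  // A release (no prerelease) outranks any prerelease of the same core version.
  if (va.prerelease.length === 0 && vb.prerelease.length === 0) return 0;
  if (va.prerelease.length === 0) return 1;
  if (vb.prerelease.length === 0) return -1;
  const shared = Math.min(va.prerelease.length, vb.prerelease.length);
  for (let i = 0; i < shared; i++) {
    const diff = compareIdentifiers(va.prerelease[i], vb.prerelease[i]);
    if (diff !== 0) return diff;
  }
  return sign(va.prerelease.length - vb.prerelease.length);
}

function isOutdated(current: string, latest: string): boolean {
  return compareVersions(current, latest) < 0;
}
```

Note that under these rules a version like `0.2.5-nightly-abc1234` has a single prerelease identifier, `nightly-abc1234`, so two nightlies of the same core version are ordered by lexical comparison of the SHA. That ordering is deterministic but not chronological.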
yyovil pushed a commit to yyovil/agent-orchestrator that referenced this pull request May 12, 2026
… banner, cron (ComposioHQ#1781)

* feat(release): weekly release train — channels, onboarding, dashboard banner, cron

Implements the full release pipeline described in release-process.html
(supersedes ComposioHQ#1525, which only had the workflow scaffolding).

A. Release infrastructure — .github/workflows/canary.yml triggers on a cron
   ('30 17 * * 5,6,0,1,2', i.e. 23:00 IST Fri–Tue) plus workflow_dispatch,
   without the stale-SHA guard or the merged-PR-comment step from ComposioHQ#1525
   (cron has no merged-PR context). release.yml uses changesets/action.
   .changeset/config.json adds the snapshot template and moves the private
   @aoagents/ao-web to ignore[].

B. Channel awareness (packages/cli/src/lib/update-check.ts) — new
   updateChannel field in the global-config Zod schema (stable | nightly
   | manual; defaults to manual so existing users see no surprise installs).
   fetchLatestVersion now reads dist-tags[channel] from the registry;
   isVersionOutdated compares prerelease segments numerically + lexically
   so SHA-suffixed nightlies sort correctly. maybeShowUpdateNotice and
   scheduleBackgroundRefresh skip entirely on manual.

C. Active-session guard (packages/cli/src/commands/update.ts) — before
   any handle*Update proceeds, sm.list() filters for working/idle/
   needs_input/stuck and refuses with `N session(s) active. Run
   `ao stop` first.` instead of auto-stopping (per the design doc:
   surprise-killing user work is worse than refusing).

D. Soft auto-install + onboarding — handleNpmUpdate skips the confirm
   prompt on stable/nightly. New packages/cli/src/lib/update-channel-
   onboarding.ts prompts once on the first `ao start` after this lands;
   ask-once gate keyed on the absence of updateChannel in the global
   config; dismissal persists `manual`. New `ao config set updateChannel
   <value>` command (also handles installMethod).

E. Dashboard banner — packages/web/src/app/api/version/route.ts reads
   the same cache file the CLI writes (~/.cache/ao/update-check.json,
   XDG-aware) and rejects cache entries from a different channel.
   packages/web/src/app/api/update/route.ts duplicates the active-session
   guard so the dashboard can return a structured 409. New UpdateBanner
   component wired into Dashboard.tsx — Tailwind only, var(--color-*)
   tokens, dismissible per-version via localStorage, deferred fetch so
   it doesn't shift the call order in existing dashboard tests.

F. Bun + Homebrew detection (update-check.ts) — new classifiers for
   ~/.bun/install/global/ (auto-installs `bun add -g @aoagents/ao@<channel>`)
   and /Cellar/ao/ (notice-only — `brew upgrade ao`, never auto-install
   because brew owns the symlinks). New installMethod override field in
   the global config to pin detection when path heuristics fail.

Tests: +155 (B/C/F unit, onboarding ask-once gate, /api/version + /api/update,
UpdateBanner visibility/dismiss/click). pnpm test, pnpm typecheck, pnpm lint
all green for the changes; the same 10 pre-existing test failures observed
on main are still present (all environment-dependent: ~/.cache/ao state, codex
binary install, /private path canonicalization on macOS).

Closes ComposioHQ#1525

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(release-train): CI failures + Greptile review feedback

CI fixes:
- Web /api/update spawn ENOENT — attach `child.on("error", ...)` so the
  asynchronous spawn-error event from a missing `ao` binary doesn't bubble
  up as an unhandled error and crash vitest. The route already returns 202
  before the error fires; on real installs the user sees "no version change"
  if the install fails.
- start.test.ts pollution — runStartup calls `maybePromptForUpdateChannel`,
  which (with isHumanCaller mocked to true) writes to the real
  ~/.agent-orchestrator/config.yaml on the CI runner via persistUpdateChannel.
  Subsequent tests then load that newly-created (empty-projects) config and
  report "No projects configured" instead of the expected "project not found".
  Fix: stub `update-channel-onboarding.js` in start.test.ts so runStartup
  is a no-op for the channel prompt.

Review feedback:
- (P1) `runtime: "tmux"` hardcoded default in `persistUpdateChannel` and
  `loadOrInit` would lock Windows users into a non-functional tmux config
  when they dismiss the channel prompt. Both now use `getDefaultRuntime()`,
  matching `makeEmptyGlobalConfig` in core's global-config.ts.
- (P2) `hasChosenUpdateChannel` JSDoc inverted — the second "True when"
  bullet actually described the False case. Rewritten with separate
  True/False sections that match the implementation.
- (P2) `isVersionOutdated` was duplicated between the CLI and the dashboard
  /api/version route. Moved to a new shared module
  `packages/core/src/version-compare.ts`, exported from `@aoagents/ao-core`,
  consumed by both CLI (re-exports as `isVersionOutdated`) and the web route
  directly. Added 14 unit tests in core for the canonical implementation.

Defensive: `maybePromptForUpdateChannel` now validates the prompt result via
`UpdateChannelSchema.safeParse` before persisting — never writes `undefined`
or an unrecognized string to disk.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(release-train): Windows spawn + dismiss-while-blocked review feedback

- (P1) `ao update` silently never ran on Windows because `spawn("ao", ...)`
  doesn't consult PATHEXT, so npm's `ao.cmd` shim wasn't found and the
  async ENOENT was swallowed by the error handler. Add `shell: isWindows()`
  + `windowsHide: true` per the cross-platform guide.
- (P1) Dismiss button was inert when the banner was in the `blocked` (409)
  or `error` phase — `setDismissedFor` set the localStorage flag but the
  hide condition required `phase === "idle"`, so the banner stayed pinned
  until reload. `handleDismiss` now resets phase to idle (and clears the
  error message) so the existing condition fires. Added a regression test
  covering dismiss from the 409 path.
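
The dismiss fix above can be sketched as a small state transition. This is only an illustration: `BannerState`, `isHidden`, and `dismiss` are hypothetical names, and the real component may track more phases than the three named in this commit.

```typescript
// Hypothetical phase names; the commit only mentions "idle", "blocked" and "error".
type Phase = "idle" | "blocked" | "error";

interface BannerState {
  phase: Phase;
  dismissed: boolean;
  errorMessage: string | null;
}

// Hide condition as described above: the dismissed flag alone is not enough,
// the phase must also be "idle".
function isHidden(state: BannerState): boolean {
  return state.dismissed && state.phase === "idle";
}

// Dismissing resets the phase and clears the error so isHidden() becomes true.
function dismiss(state: BannerState): BannerState {
  return { phase: "idle", dismissed: true, errorMessage: null };
}
```

Setting only the `dismissed` flag while the phase is `blocked` leaves `isHidden` false, which matches the reported bug.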

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(release-train): runNpmInstall on Windows — shell:true so PATHEXT resolves npm.cmd

(P1) The dashboard /api/update spawn got `shell: isWindows()` + `windowsHide:
true` in 9f29131, but `runNpmInstall` in the CLI's `ao update` command was
still missing the same fix. On Windows, `spawn("npm", ...)` without a shell
wrapper doesn't consult PATHEXT, so npm/pnpm/bun's `*.cmd` shims never
resolve and the install silently ENOENTs.

Mirror the fix into runNpmInstall — it's the single spawn site behind every
non-git, non-homebrew install path (npm-global, pnpm-global, bun-global,
unknown), so this one change covers all four install methods.
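
A minimal sketch of the option set described above, assuming a helper named `buildNpmSpawnOptions` (hypothetical; the real `runNpmInstall` may build its options inline). The `NpmSpawnOptions` interface is a trimmed stand-in for Node's `SpawnOptions`.

```typescript
// Trimmed, hypothetical stand-in for Node's SpawnOptions.
interface NpmSpawnOptions {
  shell: boolean;
  windowsHide: boolean;
  stdio: "inherit" | "ignore";
}

// On Windows, a shell wrapper lets PATHEXT resolve shims such as npm.cmd.
// windowsHide has no effect on other platforms, so it is set unconditionally.
function buildNpmSpawnOptions(isWindows: boolean): NpmSpawnOptions {
  return { shell: isWindows, windowsHide: true, stdio: "inherit" };
}
```

The result would be passed as the third argument to `spawn` from `node:child_process`, with a `child.on("error", ...)` listener attached so a missing binary surfaces as a handled error.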

Tests:
- Mock `isWindows` from @aoagents/ao-core so the spawn options can be
  inspected per-platform.
- Assert `shell: true, windowsHide: true, stdio: "inherit"` on Windows.
- Assert `shell: false` on macOS / Linux.
- Parametrize over pnpm-global / bun-global to confirm the same options flow
  through every npm-style install command.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(release-train): /api/version reads cached.isOutdated for git installs

(P1) The dashboard banner never appeared for git-installed users because
`/api/version` ran `isVersionOutdated(current, "origin/main")`, and
`parseVersion("origin/main")` produces NaN parts that the early-exit guard
catches with `return false`. Git installs cache `latestVersion` as a git
ref (not a semver) and a precomputed `isOutdated` flag from `git fetch +
merge-base`; the CLI special-cases this in `update-check.ts`. Mirror the
same pattern here:

  cached.installMethod === "git"
    ? cached.isOutdated === true
    : isVersionOutdated(current, latest)

Also extend the local CacheData with `installMethod?: string` and
`isOutdated?: boolean` so the new branch type-checks. Kept as `string`
rather than importing the CLI's `InstallMethod` type — the literal "git"
compare is the only thing that matters here, and the web package shouldn't
take a dep on @aoagents/ao-cli.

Two new tests cover the git-install path: one asserts isOutdated=true is
trusted from the cache, the other asserts isOutdated=false (current with
origin) is trusted too.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(release-train): must-fix #3+#4 — global-config layout + git-only flag guard

#3 — ensureNoActiveSessions now consults loadGlobalConfig() first as a quick
"any projects registered?" check, then routes through loadConfig(globalPath)
only when the registry actually has projects (loadConfig dispatches to
buildEffectiveConfigFromGlobalConfigPath when given the canonical global path
— see packages/core/src/config.ts). Defends against AO_GLOBAL_CONFIG override
to a non-canonical path. Three new tests cover: registered-projects path
fires the guard correctly; empty registry returns early without building a
SessionManager; missing global file returns early without even reading it.

#4 — Restored the rejection of git-only flags on non-git installs. Users
copy/pasting `ao update --skip-smoke` from older docs would silently no-op
on npm/pnpm/bun installs. Now exits non-zero with:
"--skip-smoke only applies to git installs (current install: npm-global)."
Tested via `it.each` across npm/pnpm/bun/homebrew/unknown, plus a positive
test that git installs still accept the flag.
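
The guard in #4 might look roughly like the sketch below. `rejectGitOnlyFlag` and the `InstallMethod` literals are assumptions for illustration (only `--skip-smoke` and the error text come from this commit).

```typescript
// Assumed install-method literals; the real union lives in the CLI.
type InstallMethod =
  | "git"
  | "npm-global"
  | "pnpm-global"
  | "bun-global"
  | "homebrew"
  | "unknown";

// Returns an error message when a git-only flag is used on a non-git
// install, or null when the flag is acceptable.
function rejectGitOnlyFlag(flag: string, installMethod: InstallMethod): string | null {
  const gitOnlyFlags = ["--skip-smoke"]; // assumption: other git-only flags may exist
  if (installMethod === "git" || gitOnlyFlags.indexOf(flag) === -1) {
    return null;
  }
  return `${flag} only applies to git installs (current install: ${installMethod}).`;
}
```

A caller would print the message to stderr and exit non-zero when the result is not null.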

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(release-train): #2 should-fix — channel-switch prompt

When a stable user runs `ao config set updateChannel nightly` and then
`ao update`, isVersionOutdated(0.5.0, 0.5.0-nightly-abc) returns false (per
semver, prerelease < stable on equal base). The old code printed "Already
on latest nightly" and exited without installing — confusing, because the
install command we'd run is genuinely a different dist-tag.

Fix: snapshot the previously-cached channel BEFORE forcing a refresh, then
detect a switch via `previousChannel !== activeChannel && !info.isOutdated`.
On switch:
  - Don't take the "already on latest" early-return.
  - Print a yellow "Channel switch detected: was X, now Y." notice.
  - Force a confirm prompt regardless of stable/nightly soft-install,
    defaulting to "no" (channel-switch should be explicit). Manual users
    still see their normal prompt.

Onboarding copy now includes one line about channel switches: "switching
later prompts before installing the other channel's build."

4 new tests: explicit switch fires the prompt + installs on yes; declines
on no; same-channel doesn't fire (back to "Already on latest"); first-ever
update with no previous cache doesn't fire either.
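
As a rough sketch, the switch detection reduces to one predicate. The `isChannelSwitch` name and the `UpdateInfo` shape are hypothetical; the explicit `undefined` check reflects the "no previous cache doesn't fire" test described above.

```typescript
// Hypothetical, trimmed shape of the update-check result.
interface UpdateInfo {
  isOutdated: boolean;
}

function isChannelSwitch(
  previousChannel: string | undefined,
  activeChannel: string,
  info: UpdateInfo,
): boolean {
  // With no previous cache there is nothing to switch from.
  if (previousChannel === undefined) return false;
  // An outdated install takes the normal update path, so it is not a "switch".
  return previousChannel !== activeChannel && !info.isOutdated;
}
```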

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(release-train): polish — drop setTimeout, dedup defaults, share cache, dedup export

#5: UpdateBanner no longer wraps its mount fetch in setTimeout(0). Production
code shouldn't bend to test mock ordering. Instead, the two brittle Dashboard
tests that relied on `mockImplementationOnce` queue ordering now route by URL
via `mockImplementation`, and the cadence test asserts "no other endpoints
were touched" instead of "no fetch was touched at all". Also added a
deliberate "no interval / re-fetch" comment per #6.

#7: Promoted core's `makeEmptyGlobalConfig` to the public
`createDefaultGlobalConfig` (kept the internal alias for back-compat). Both
the CLI's `persistUpdateChannel` and `loadOrInit` (in `ao config`) now call
it instead of inlining the same defaults block. Single source of truth.

#8: New `packages/core/src/update-cache.ts` exports
`getUpdateCheckCachePath`, `readUpdateCheckCacheRaw`, and
`getInstalledAoVersion`. The CLI's `update-check.ts` keeps its richer
install-method/channel/git-rev validation but now delegates path resolution
and version lookup to core. The dashboard's `/api/version` route drops its
duplicated `getCachePath`/`readCache`/`getCurrentVersion` and consumes from
core directly. Cache layout is one file, not two.

#9: Removed the duplicate `export { isManualOnlyInstall }` from
`update.ts` (also dropped the unused import). The canonical export lives in
`update-check.ts`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(release-train): cosmetic — workflow rename note, design tokens, changeset trim

#1  release.yml: added a comment above `workflows: [CI]` warning that GitHub
    matches by name (not filename) and silently no-ops on mismatch — so a
    rename of ci.yml's `name:` field would mean releases stop triggering.

#10 UpdateBanner: replaced text-[13px] / text-[12px] with text-sm / text-xs
    to match the dashboard's chrome scale.

#6  Banner refresh: noted in the existing useEffect comment that we don't
    re-fetch — re-evaluate if "user kept tab open for days, missed an
    update" becomes a real complaint.

#11 .changeset/release-train.md: dropped @aoagents/ao-web from the version
    bump list. The package is `private: true` and in changeset's ignore[],
    so listing it was cosmetic and would just clutter the eventual release
    notes with a non-published artifact.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(release-train): illegalcall review — global guard, channel scoping, publishable web

(#1, PRRT_kwDORPZUAc6BHFDf) — `ensureNoActiveSessions` now ALWAYS loads
from the canonical global config, never from project-local. The previous
code preferred `loadConfig()` (local search-upward) when run inside a repo,
which made `sm.list()` enumerate only that project's sessions — active work
in other registered projects would be missed and the install would proceed.
New regression test asserts that a session in `other-project` blocks the
update even when invoked from `this-project`'s cwd. Existing global-config
tests retained.

(#2, PRRT_kwDORPZUAc6BHFOl) + (#4, PRRT_kwDORPZUAc6BHFon) — Reverted the
`private: true` on @aoagents/ao-web. Because @aoagents/ao-cli has a
workspace:* runtime dep on it (for `findWebDir()`/dashboard files), pnpm
rewrites the dep on publish to a literal version — keeping ao-web private
would make `npm install -g @aoagents/ao` fail. Restored ao-web to the
changeset linked group, removed it from `ignore[]`, restored the release-
train changeset entry, added publishConfig + repository metadata.

New `scripts/check-publishable-deps.mjs` walks every package and asserts
that no publishable package has a workspace:* runtime dep on a `private:
true` package. Wired into both release.yml and canary.yml before the
publish step so any future regression is caught at CI rather than at the
user's `npm install`. Verified the script catches the inverse condition.
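
The core check in such a script can be sketched as a pure function over parsed manifests. This is not the contents of `scripts/check-publishable-deps.mjs`; `Manifest` and `findPrivateRuntimeDeps` are hypothetical names, and matching any `workspace:` prefix is an assumption.

```typescript
// Trimmed, hypothetical view of a package.json.
interface Manifest {
  name: string;
  private?: boolean;
  dependencies?: Record<string, string>;
}

// Lists "publishable -> private" workspace runtime dependencies.
function findPrivateRuntimeDeps(manifests: Manifest[]): string[] {
  const privateNames: Record<string, boolean> = {};
  for (const m of manifests) {
    if (m.private === true) privateNames[m.name] = true;
  }
  const violations: string[] = [];
  for (const m of manifests) {
    if (m.private === true) continue; // private packages are never published
    const deps = m.dependencies || {};
    for (const dep of Object.keys(deps)) {
      const isWorkspaceDep = deps[dep].indexOf("workspace:") === 0;
      if (isWorkspaceDep && privateNames[dep] === true) {
        violations.push(`${m.name} -> ${dep}`);
      }
    }
  }
  return violations;
}
```

A CI step would fail the job when the returned list is non-empty. Only `dependencies` is inspected here, since dev dependencies are not installed by consumers.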

(#3, PRRT_kwDORPZUAc6BHFaV) — `readCachedUpdateInfo` now treats a missing
`data.channel` as a miss when an explicit channel is provided. Previous
logic only rejected when both data.channel and channel were set AND
differed, so a legacy cache entry (pre-channel-scoping) could keep returning
stale stable state to a user who had since switched to nightly until the
24h TTL expired. Existing fixtures bumped to include `channel` where the
test exercises the checkForUpdate / maybeShowUpdateNotice path; new
regression test exercises the legacy-no-channel case.
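
The difference between the old and new cache checks can be illustrated with two predicates. Both function names and the `CachedEntry` shape are hypothetical, and accepting any entry when no channel is requested is an assumption about the unscoped case.

```typescript
// Hypothetical, trimmed cache entry; legacy entries have no channel.
interface CachedEntry {
  channel?: string;
  latestVersion: string;
}

// Old logic: reject only when both sides are set AND differ.
function cacheMatchesOld(data: CachedEntry, channel?: string): boolean {
  const bothSet = data.channel !== undefined && channel !== undefined;
  return !(bothSet && data.channel !== channel);
}

// New logic: an explicit channel requires an exact match, so a legacy
// entry without a channel counts as a miss.
function cacheMatchesNew(data: CachedEntry, channel?: string): boolean {
  if (channel === undefined) return true; // assumption: unscoped reads accept any entry
  return data.channel === channel;
}
```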

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(release-train): SHA-suffix nightly compare — never miss a banner

(P1, PRRT_kwDORPZUAc6BJLrW) Git SHAs are uniformly-random hex, so the old
`comparePrereleaseSegments` lexical fallback gave the wrong answer ~50% of
the time on snapshot tags. Concretely: user installs `0.5.0-nightly-f00d123`,
CI publishes `0.5.0-nightly-0dead01`, and `'f' < '0'` returns false → banner
never shows.

Fix: when two prerelease segments are both non-numeric and differ, treat the
left side as older (return -1). The cache layer always carries the registry's
CURRENT dist-tag, so any non-numeric mismatch on the same base means the
installed copy is behind by construction. Numeric ordering (`rc.1 < rc.2`)
and numeric-vs-non-numeric (`0.5.0-1 < 0.5.0-alpha`) are unchanged.

Tradeoff: a user who manually installed `0.5.0-beta` while the registry only
publishes `0.5.0-alpha` would see a spurious banner. AO's release pipeline
only emits SHA-suffixed nightly prereleases, so the scenario doesn't occur in
practice — documented in the function's JSDoc.

Updated two misleadingly-named tests ("orders SHA-suffixed nightlies
lexically") that had been asserting the buggy behavior; new tests cover the
specific case from the review (`nightly-f00d123` vs `nightly-0dead01`) and
preserve the numeric-ordering invariant.
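
A minimal sketch of the comparison rule described above, assuming dot-separated prerelease identifiers as in semver. `comparePrerelease` is a hypothetical name, not the actual `comparePrereleaseSegments` implementation.

```typescript
// Compares two prerelease strings (the part after the first "-").
// Returns -1 when `installed` is older, 1 when newer, 0 when identical.
function comparePrerelease(installed: string, latest: string): number {
  const left = installed.split(".");
  const right = latest.split(".");
  const shared = Math.min(left.length, right.length);
  for (let i = 0; i < shared; i++) {
    const l = left[i];
    const r = right[i];
    if (l === r) continue;
    const lNumeric = /^\d+$/.test(l);
    const rNumeric = /^\d+$/.test(r);
    if (lNumeric && rNumeric) {
      const diff = Number(l) - Number(r);
      if (diff !== 0) return diff < 0 ? -1 : 1;
      continue;
    }
    if (lNumeric) return -1; // numeric identifiers sort below non-numeric ones
    if (rNumeric) return 1;
    // Both non-numeric and different (e.g. SHA suffixes): treat the
    // installed side as older instead of comparing lexically.
    return -1;
  }
  if (left.length === right.length) return 0;
  return left.length < right.length ? -1 : 1; // fewer identifiers sorts lower
}
```

The last rule is deliberately asymmetric, so this is not a total order: swapping the arguments of two different SHA suffixes also yields -1. It relies on the right-hand side always being the registry's current dist-tag.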

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(release-train): multi-project active-session proof + changeset note

(#1, PRRT_kwDORPZUAc6BHFDf follow-up) Dhruv asked for proof — not a comment —
that loadConfig(globalPath) actually enumerates across all registered
projects, not just the cwd's. New test in update.test.ts seeds proj-a and
proj-b in the global config, places one active session in each (one
"working", one "needs_input"), and asserts the refusal stderr lists BOTH
session ids AND the total count says "2 sessions active". The test is
specifically named so it shows up in `vitest run -t "Dhruv proof"`.

Verified `pnpm changeset version` locally — @aoagents/ao-web, ao-cli,
and ao all bump to 0.7.0 together via the linked group, confirming the
install-404 class of bug is gone.

Also updated the release-train changeset to drop the stale "moves the
private @aoagents/ao-web to ignore" line — that contradicts the current
state (ao-web is publishable and in the linked group).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(release-train): Ashish P1+P2 — dashboard banner now actually installs

P1 — Dashboard banner click was a no-op for npm users.
POST /api/update spawns `ao update` with `stdio: "ignore"`, which makes
`isTTY()` return false in the child. The old handleNpmUpdate hit the
non-TTY branch ("Run: ...") and exited without installing. Banner returned
202 "started"; nothing actually happened.

Fix (Ashish's option c, with an env-var bridge):
- /api/update spawns with `AO_NON_INTERACTIVE_INSTALL=1` on the env.
- handleNpmUpdate computes `interactive = isTTY() && !isApiInvoked()`.
- Restructured so the early-return only fires for non-TTY + non-API
  (piped output): we still print "Run: ..." for that case, matching the
  old contract. The API-invoked path now actually runs runNpmInstall and
  skips the confirm prompt (which would hang the detached child forever).

Three new CLI tests:
- AO_NON_INTERACTIVE_INSTALL=1 → spawn invoked even with isTTY=false.
- The piped-output case (no env var, no TTY) still prints "Run: ...".
- Active-session guard still fires in the API-invoked path (defense in
  depth: the route's own guard isn't the single point of trust).
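
The three-way branching in the P1 fix can be sketched as follows. `decideNpmUpdateMode` and the mode labels are hypothetical; only the env-var name and the `isTTY() && !isApiInvoked()` rule come from this commit.

```typescript
type NpmUpdateMode = "interactive" | "install-without-prompt" | "print-command";

// The dashboard route sets this variable on the child's environment.
function isApiInvoked(env: Record<string, string | undefined>): boolean {
  return env["AO_NON_INTERACTIVE_INSTALL"] === "1";
}

function decideNpmUpdateMode(isTTY: boolean, apiInvoked: boolean): NpmUpdateMode {
  // API-invoked: install directly, since a prompt would never be answered.
  if (apiInvoked) return "install-without-prompt";
  // Interactive terminal: the normal flow, including any confirm prompt.
  if (isTTY) return "interactive";
  // Piped output without the env var: keep the old "Run: ..." contract.
  return "print-command";
}
```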

P2 — First nightly opt-in stuck on "Already on latest stable".
Repro: user on stable 0.5.0, runs `ao config set updateChannel nightly`,
runs `ao update`. previousChannel was undefined, isOutdated was false
(semver: prerelease < stable on equal base), so the early return fired
and the install never ran.

Fix: new `isFirstChannelOptIn` branch — `previousChannel === undefined
&& info.currentVersion !== info.latestVersion && !info.isOutdated`. Force
the same prompt path the channel-switch case uses (default=no, explicit
consent). Confirmed install path covered by a new test that mirrors the
repro exactly.

The pre-existing "no previous cache → no prompt" test asserted the OLD
buggy behavior; rewritten to assert the canonical case (no prior cache
AND versions match → still "Already on latest", no prompt).
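
The first-opt-in condition can be written out as a small predicate. The `isFirstChannelOptIn` name comes from this commit, while the signature and the `UpdateInfo` shape here are illustrative assumptions.

```typescript
// Hypothetical, trimmed shape of the update-check result.
interface UpdateInfo {
  currentVersion: string;
  latestVersion: string;
  isOutdated: boolean;
}

function isFirstChannelOptIn(previousChannel: string | undefined, info: UpdateInfo): boolean {
  return (
    previousChannel === undefined &&
    info.currentVersion !== info.latestVersion &&
    !info.isOutdated
  );
}
```

With the repro above (current `0.5.0`, latest `0.5.0-nightly-abc`, no previous channel) the predicate is true, so the prompt path runs. When the versions match it stays false, and the "Already on latest" message is still correct.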

P2 — Dashboard /api/version legacy cache.
Same class as Dhruv #3, this time on the web side. Old code:
  const cacheMatchesChannel = !cache?.channel || cache.channel === channel;
A legacy entry without `channel` would short-circuit `!cache?.channel`
and serve stale latestVersion. Fixed to require `cache.channel === channel`
explicitly. New regression test seeds a no-channel entry and asserts
{ latest: null, isOutdated: false, checkedAt: null }.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(release-train): Dhruv edge-case — running.json is the live source of truth

(PRRT_kwDORPZUAc6BUIUK) The active-session guard previously short-circuited
on empty global config, which missed the case where:

  - User runs `ao start` from a repo with a local agent-orchestrator.yaml
    and no global registration.
  - running.json lists that project as currently being polled.
  - Sessions live on disk under ~/.agent-orchestrator/{hash}-{projectId}/.

In that state, `loadGlobalConfig().projects` is empty so the old early
return fired and `ao update` would proceed while a daemon was actively
supervising the user's in-flight work.

Fix: consult `getRunning()` BEFORE falling back to the global registry.
When running.json reports projects, trust its configPath (could be a local
project yaml OR the canonical global path — `loadConfig` dispatches on
shape) and build the SessionManager from there. The global fallback is now
the no-daemon-running case, where on-disk sessions get reconciled by
SessionManager enrichment.

Three new tests in update.test.ts:
- `refuses when sessions exist in a locally-registered project not in
  global config (Dhruv edge-case)` — seeds running.json with a local-only
  project + working session, asserts refusal + that loadConfig was called
  with running.configPath (NOT the global path).
- `returns true (allows update) when running.json is gone and global is
  empty` — covers the genuinely-safe case.
- `trusts running.json over an inconsistent global config` — when both
  signals exist, the live one (running.json) wins and loadGlobalConfig is
  never consulted.
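
The lookup order described above might be sketched like this. `pickGuardSource`, `RunningState`, and `GuardSource` are hypothetical names; the global lookup is passed as a callback so it is not evaluated when running.json already answers the question.

```typescript
// Hypothetical, trimmed view of running.json.
interface RunningState {
  configPath: string;
  projects: string[];
}

type GuardSource =
  | { kind: "running"; configPath: string }
  | { kind: "global" }
  | { kind: "none" };

function pickGuardSource(
  running: RunningState | null,
  countGlobalProjects: () => number,
): GuardSource {
  // A live daemon is the source of truth: trust its configPath.
  if (running !== null && running.projects.length > 0) {
    return { kind: "running", configPath: running.configPath };
  }
  // No daemon: fall back to the global registry.
  if (countGlobalProjects() > 0) return { kind: "global" };
  // Nothing running and nothing registered: the update may proceed.
  return { kind: "none" };
}
```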

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(release-train): shift canary cron 23:00 → 23:30 IST

Schedule moves to 23:30 IST = 18:00 UTC. Cron expression changes from
`30 17 * * 5,6,0,1,2` to `0 18 * * 5,6,0,1,2`. Same DOW window
(Fri,Sat,Sun,Mon,Tue) so the bake window (Wed–Thu) is unaffected.
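
The IST to UTC arithmetic behind the two cron expressions can be checked with a small helper. `istToUtcCronFields` is a hypothetical name used only for this illustration.

```typescript
// Converts an IST wall-clock time to the "minute hour" fields of a UTC cron
// expression. IST is UTC+05:30. Note: for IST times before 05:30 the UTC day
// is the previous one, so day-of-week fields would also need shifting; that
// case is not handled here.
function istToUtcCronFields(hourIst: number, minuteIst: number): string {
  const minutesPerDay = 24 * 60;
  const istOffsetMinutes = 5 * 60 + 30;
  const totalUtc =
    (hourIst * 60 + minuteIst - istOffsetMinutes + minutesPerDay) % minutesPerDay;
  const hour = Math.floor(totalUtc / 60);
  const minute = totalUtc % 60;
  return `${minute} ${hour}`;
}
```

For 23:00 IST this yields `30 17`, and for 23:30 IST it yields `0 18`. Both times fall on the same UTC calendar day as in IST, which is why the day-of-week list is unchanged.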

Files touched (all consistent):
- .github/workflows/canary.yml — cron expression + comment block
- .changeset/release-train.md — schedule string in feature description
- CONTRIBUTING.md — "Testing your changes" callout

Verified `grep -rn '23:00 IST|17:30 UTC|"30 17'` returns zero matches.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>