feat(ci): migrate npm publishing to OIDC Trusted Publishers#1592
feat(ci): migrate npm publishing to OIDC Trusted Publishers#1592alinarublea wants to merge 12 commits into
Conversation
Eliminates the long-lived ADOBE_BOT_NPM_TOKEN by switching to npm OIDC Trusted Publishers (npm >= 11.5.1). Adds a one-time setup script to configure all 23 packages on npmjs.com. - Upgrade npm CLI from 10.9.4 to 11 in workflow and setup-node-npm action - Remove NPM_TOKEN from Semantic Release and dry-run steps - Add scripts/setup-npm-trusted-publishers.sh for one-time npm trust setup Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Scope id-token:write to per-job permissions (test + release) instead of workflow-level; each job documents exactly what it needs - Add NPM_CONFIG_PROVENANCE=true to release job for sigstore attestations - Add workflow-rename warning comment (trusted publisher binds to filename) - Fix stale step name "Set up Node.js 22" -> "Set up Node.js" - Fix misleading "Update NPM (only if cache miss)" -> "Update NPM" in action - Remove @adobe/spacecat-shared-example from trusted publishers list (template package, not intended for automated publishing) - Add npm version preflight check (>= 11.10.0) to setup script - Add npm whoami preflight (must run as adobe-bot account) - Replace set -e abort with per-package failure tracking; FAILED array collects all errors and script exits non-zero with full list - Add note that npm trust is idempotent (safe to re-run after partial failure) - Add comment explaining --file takes filename only, not full path - Add note on new-package workflow in script header Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Fix package count in workflow comment: 23 -> 22 (example excluded) - Add branch-protection note: trust binding security depends on it - Add npm registry preflight (must be registry.npmjs.org, not a mirror) - Add workflow file existence preflight (catches WORKFLOW constant drift) - Capture npm trust output explicitly with rc check; print indented for clarity instead of relying on shell short-circuit semantics Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
This PR will trigger a minor release when merged. |
…ured @semantic-release/npm 13.x tries OIDC exchange during verifyConditions, even in dry-run mode. Without trust bindings registered on npmjs.com (setup-npm-trusted-publishers.sh not yet run), the exchange returns 404 "package not found", then falls back to NPM_TOKEN — which has been removed. This breaks CI on all non-main branches. Add SR_NO_NPM_AUTH env var to the dry-run workflow step and guard the @semantic-release/npm plugin in all .releaserc.cjs files so dry-run skips npm entirely. Release job on main is unaffected (no SR_NO_NPM_AUTH). Once trusted publishers are configured and OIDC publishing confirmed working, this guard can be removed and the dry-run can exercise the full npm plugin path. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
solaris007
left a comment
There was a problem hiding this comment.
Hey @alinarublea,
Thanks for the thorough setup script and rollback plan. The migration approach is sound and the security improvement is real, but two issues block this PR from working as designed: a typo in the npm CLI subcommand will cause the setup script to fail outright, and the workflow's documented branch-protection trust ceiling does not exist on the repo today.
Strengths
- Workflow-level
permissions: {}with per-jobid-token: writescoping (.github/workflows/main.yaml:10,:18-22,:50-54) is the correct least-privilege posture. Previously workflow-levelid-token: writemade the token mintable from every job. persist-credentials: 'false'on both checkout steps (main.yaml:33,:62) preventsGITHUB_TOKENfrom being persisted into git config.- Workflow header comment (
main.yaml:3-9) puts the rename hazard and branch-protection dependency at the point of risk where editors will see them. - Setup script preflight is rich (
scripts/setup-npm-trusted-publishers.sh:30-67): npm version, registry URL,whoami == adobe-bot, and workflow-file existence all checked before any mutation. - Per-package failure tracking (
setup-npm-trusted-publishers.sh:96-117) withset -uo pipefail(no-e) plus theFAILEDarray is the right shape for an idempotent batch operation. NPM_CONFIG_PROVENANCE: 'true'(main.yaml:65) ties sigstore attestations to every publish - real supply-chain integrity gain for downstream consumers.NPM_TOKENfully removed from workflow env (verified across the diff: zero remaining references in.github/workflows/or composite actions).- Dry-run is correctly gated by
if: github.ref != 'refs/heads/main'ANDSR_NO_NPM_AUTH=true- removes the original concern that PR branches could trigger token-bearing dry-runs. - Rollback path is concrete and reversible: keep
ADOBE_BOT_NPM_TOKENfor 2 cycles, revert restores token-based publishing.
Issues
Critical (Must Fix)
1. scripts/setup-npm-trusted-publishers.sh:99 - wrong npm trust subcommand; script will fail at runtime.
The script invokes npm trust github-actions "${pkg}" --repository --file --yes. The actual npm CLI subcommand is npm trust github, not github-actions. Verified against npm/cli source at lib/commands/trust/index.js:
static subcommands = {
github: require('./github.js'),
gitlab: require('./gitlab.js'),
circleci: require('./circleci.js'),
list: require('./list.js'),
revoke: require('./revoke.js'),
}There is no github-actions alias. Running this script as-written will produce Unknown subcommand: github-actions for every package, populate FAILED with all 22, and exit non-zero before configuring any trust binding.
This is the entire purpose of the PR. The "prerequisites before merging" step in the PR body cannot be satisfied as written. If merged before someone notices, the first release after merge will fail with no way to recover except by reverting (NPM_TOKEN already gone from workflow).
Fix: change line 99 from npm trust github-actions "${pkg}" \ to npm trust github "${pkg}" \. Update the comment in the script header that references the same subcommand. Run the corrected script end-to-end against at least one package and capture the output before merging.
2. .github/workflows/main.yaml:7-9 - branch protection asserted but does not exist; trust ceiling is missing.
The workflow comment states Trust binding security relies on branch protection: do NOT weaken protection on main, or any contributor with push access could mint a publish-scoped npm token via OIDC. Verified empirically:
$ gh api repos/adobe/spacecat-shared/branches/main/protection
{"message":"Branch not protected","status":"404"}
$ gh api repos/adobe/spacecat-shared/rules/branches/main
[]There is no branch protection AND no rulesets on main. Combined with the structure of npm trust bindings (verified in npm/cli lib/commands/trust/github.js - bindings store { repository, workflow_ref.file } only, no branch or ref constraint), this means:
- Anyone with write access to
adobe/spacecat-sharedcan push directly tomainor merge a PR that modifiesmain.yamlto bypass theif: github.ref == 'refs/heads/main'gate. The OIDC token from any branch's run of this workflow will satisfy the trust binding's claims. - The workflow's
if: github.ref == 'refs/heads/main'is the only ref constraint, and a single-line PR can remove it.
Mitigations, in order of strength:
- (Strongest, code-fixable in this PR) Bind a GitHub Environment (e.g.
npm-publish) to the release job, configure its deployment branch policy to allow onlymain, and add--environment npm-publishto thenpm trust githubinvocations in the setup script. This makes the environment claim part of the trust binding so a forked workflow without it cannot mint publish tokens. - (Minimum, repo-admin) Configure branch protection or a ruleset on
main: disallow direct push, require PR review, disallow force-push. The workflow's stated assumption then actually holds.
This should land before any trust bindings are configured. Right now, even after the typo fix, the security model the PR describes is not the security model the repo enforces.
Important (Should Fix)
1. packages/spacecat-shared-example is excluded from trust bindings but still has the SR_NO_NPM_AUTH shim - latent break on next release.
Three signals disagree:
setup-npm-trusted-publishers.sh:69-72excludes@adobe/spacecat-shared-examplefrom PACKAGES with the comment "template package, not intended for automated publishing".packages/spacecat-shared-example/.releaserc.cjs:13was updated with the SR_NO_NPM_AUTH shim, so it loads@semantic-release/npmwhenever the env var is unset.packages/spacecat-shared-example/package.jsonis not marked"private": trueand currently carries a real version.
In the release job (where SR_NO_NPM_AUTH is NOT set), if conventional-commit history ever includes a feat: or fix: scoped to the example package, semantic-release will attempt to publish via OIDC, find no trust binding, and the release job will fail. Fix options (either is fine):
- Set
"private": trueinpackages/spacecat-shared-example/package.jsonand let@semantic-release/npmskip private packages. - Drop
.releaserc.cjsfrompackages/spacecat-shared-example/entirely (semantic-release-monorepo skips packages without a config).
Marking it private is the more idiomatic fix.
2. process.env.SR_NO_NPM_AUTH ? [] : [...] truthy check is a footgun across 24 files.
In JavaScript, the strings "false", "0", "no" are all truthy. If a future maintainer sets SR_NO_NPM_AUTH=false thinking it re-enables the npm plugin, npm publishing is still skipped. The workflow sets 'true' explicitly, so the current behavior is fine; the fragility is in the contract the 24 .releaserc.cjs files now embed.
Fix: change to process.env.SR_NO_NPM_AUTH === 'true' ? [] : [...] in all 24 files. One search-and-replace. Even better: a small node test that loads any one .releaserc.cjs under three env values (undefined, 'true', 'false') and asserts the plugin presence would lock the contract.
3. Dry-run never exercises the OIDC publish path - first main release is the first real test.
main.yaml:46-50 gates the dry-run on if: github.ref != 'refs/heads/main' AND sets SR_NO_NPM_AUTH=true, which removes @semantic-release/npm entirely. On the merge commit to main, only the actual release job runs. There is no PR build and no main build prior to the release step that validates OIDC publish works end-to-end. If trust bindings are misconfigured (wrong workflow filename, missing package, npm-side outage during setup), the failure mode is a broken release on main with revert as the only recovery.
The PR description acknowledges "publish --dry-run --provenance smoke-test on PRs" as deferred, but the cutover risk warrants at minimum: capture the output of running the corrected setup-npm-trusted-publishers.sh against all 22 packages (showing the npm trust list result for each) and attach to the PR or the linked SITES-42702 ticket. Better: a controlled canary publish on a release branch before the workflow change lands on main.
Minor (Nice to Have)
-
No shellcheck or actionlint on the new artifacts. A 130-line bash script and a workflow modification with no static analysis. Add a CI step
shellcheck scripts/*.shandactionlint .github/workflows/*.yaml- both run in seconds and catch quoting bugs, missing pinned versions, deprecated action references. -
Workflow rename has no CI enforcement. The header comment warns about renames, but a renamer 6 months from now will not necessarily read it. A small step in the release job that fails early if
.github/workflows/main.yamlis not at the expected path converts a silent publish failure into a loud CI failure. -
PACKAGES array drift. A new
packages/spacecat-shared-newthing/added without updating PACKAGES will fail OIDC silently on its first release. ~10 lines at the start of the setup script that diff PACKAGES againstfind packages -name package.json | xargs jq -r .name(excludingspacecat-shared-example) catches this before any mutation. -
SR_NO_NPM_AUTHlifecycle is not pinned. The shim is documented as removable "once OIDC publishing confirmed working" but there's no follow-up issue, no TODO marker, and no log line that would tell anyone whether the shim is still load-bearing. File a follow-up issue with a concrete trigger ("after 2 successful OIDC releases on main, remove SR_NO_NPM_AUTH from all 24.releaserc.cjsand from main.yaml line 50") and reference it from the workflow comment. -
Onboarding documentation for the post-rotation world is missing. The script comment instructs operators to do a "token-based first publish" for new packages, but after the documented rotation removes
ADOBE_BOT_NPM_TOKENfrom secrets, the token won't exist. Update the comment to reflect the actual workflow: an interactivenpm publishfrom a developer machine logged in as adobe-bot, OR pre-create the empty package on npmjs.com. -
Action SHA pinning deferred.
actions/checkout@v6,actions/setup-node@v6, andactions/cache@v5are floating tags. Acknowledged out-of-scope per PR description, but worth tracking explicitly: a tag-overwrite or compromised-action incident on any of these would inject code into the publish job (which hasid-token: writeandcontents: write). The OIDC migration improves credential security but not action supply-chain security. -
No alerting on release failures. If the first release fails for any reason (OIDC mint failure, sigstore down, registry 5xx), the only signal is a red workflow run on the default branch. A
if: failure()step in the release job that posts to a known Slack channel or opens a GitHub issue would convert an opt-in watcher dependency into an actionable signal. -
engines.npmfloor doesn't match the OIDC requirement. Rootpackage.jsonengines.npmpermits>=10.9.0 <12.0.0, the workflow installs11.13.0, the script requires>=11.10.0, OIDC requires>=11.5.1. Four different floors. Bumpengines.npmto>=11.5.1 <12.0.0sonpm installenforces what the publish path requires.
Recommendations
- Add
--environment npm-publishtonpm trust githubinvocations once the GitHub Environment is configured. Pairs with the branch-protection fix (Critical 2) and gives defense-in-depth that survives a momentary lapse in branch protection. The npm trust binding will require the environment claim on the OIDC token, which only main-policy environments can mint. - Add CODEOWNERS for
.github/workflows/main.yamlandscripts/setup-npm-trusted-publishers.sh. Once branch protection requires code-owner review, this constrains who can land workflow-modifying PRs that affect publish trust. - Add a 2-second sleep between
npm trustcalls in the loop. Per the npm v11 bulk usage docs, a 2-second delay is recommended to avoid rate limiting; without it, registering 22 packages in quick succession could hit registry throttling. - Track the
ADOBE_BOT_NPM_TOKENrotation explicitly. Open a follow-up issue with a date target ("rotate by 2026-MM-DD if 2 OIDC releases succeed") owned by a person; without a forcing function, retained-but-unused secrets tend to live indefinitely. - Document the sigstore runtime dependency in the on-call runbook.
NPM_CONFIG_PROVENANCE: 'true'makes sigstore reachability a release prerequisite. Pre-decide whether to disable provenance temporarily during a sigstore outage or to wait it out. - Add a node-based config-sync test for the 24
.releaserc.cjsfiles so future single-file edits cannot silently diverge from the rest. ~30 lines, prevents the most likely class of release-config drift. - Audit trail for the setup script. Have the script print a final summary line with
git rev-parse HEAD, operator's npm whoami, and timestamp, instructing the operator to paste it into the SITES-42702 ticket. Cheap, makes the trust-establishment ceremony auditable. - Verify provenance attestations after the first publish. Already in the PR test plan (
npm view @adobe/<pkg> dist.attestations); make it a checkbox item with a named reviewer rather than a generic "after merge" step.
Out of scope, worth tracking
- Replacing
ADOBE_BOT_GITHUB_TOKENPAT with a GitHub App installation token (acknowledged in PR description). - Restricting
on: [push]trigger to specific branches (acknowledged). - Periodic trust-drift detection cron on npmjs.com bindings (acknowledged).
- Auto-derive package list from workspace metadata (acknowledged).
- Pattern reuse across the ~10 other spacecat-* publishing repos belongs in a shared GitHub Action or
aem-sites-architectureskill rather than in each repo.
Assessment
Ready to merge? No - two Critical fixes required before this PR functions as designed.
Reasoning: The npm trust github-actions typo (Critical 1) is a definite functional bug; the script fails to register any trust binding, and the merge would leave publishing broken. The branch-protection assertion (Critical 2) is a security model gap: the workflow comments describe a control that does not exist on the repo, leaving the trust binding unconstrained by branch or environment until either branch protection is configured AND/OR the binding includes an --environment claim. Once those two are addressed, this is a clear net-positive for credential security. The Important and Minor items are real but tractable as follow-ups or quick fixes alongside the Critical changes.
Next Steps
- Fix the
npm trust github-actions->npm trust githubtypo, run the script end-to-end, and capture output as evidence. - Configure branch protection on
mainAND/OR add a GitHub Environment with main-only deployment policy +--environmentto the script. Re-run the setup script with the environment binding before merge. - Resolve the
spacecat-shared-exampleinconsistency (markprivate: trueis simplest). - Tighten the SR_NO_NPM_AUTH check to
=== 'true'across the 24 files. - Capture pre-merge evidence that all 22 packages have OIDC trust bindings registered against the correct workflow path/environment.
|
|
||
| for pkg in "${PACKAGES[@]}"; do | ||
| echo " Configuring: ${pkg}" | ||
| output=$(npm trust github-actions "${pkg}" \ |
There was a problem hiding this comment.
Critical: Wrong npm trust subcommand. The script will fail at runtime for every package.
Verified against npm/cli source (lib/commands/trust/index.js):
static subcommands = {
github: require('./github.js'),
gitlab: require('./gitlab.js'),
circleci: require('./circleci.js'),
list: require('./list.js'),
revoke: require('./revoke.js'),
}There is no github-actions alias. Running this as-written produces Unknown subcommand: github-actions for all 22 packages, populates FAILED, and exits non-zero before configuring any trust binding.
If merged with the typo, the first release after merge will fail because no bindings exist and NPM_TOKEN is already gone from the workflow.
Fix: change to npm trust github "${pkg}" \ (also update the script header comments referencing the same subcommand). Run end-to-end against at least one package and capture the output as evidence before merging.
| # to this file (adobe/spacecat-shared / .github/workflows/main.yaml). Renaming or moving | ||
| # this file breaks OIDC publishing — re-run scripts/setup-npm-trusted-publishers.sh after. | ||
| # | ||
| # Trust binding security relies on branch protection: do NOT weaken protection on `main`, |
There was a problem hiding this comment.
Critical: Branch protection is asserted in this comment but does not exist on the repo.
Verified empirically:
$ gh api repos/adobe/spacecat-shared/branches/main/protection
{"message":"Branch not protected","status":"404"}
$ gh api repos/adobe/spacecat-shared/rules/branches/main
[]No protection AND no rulesets on main. Combined with the structure of npm trust bindings (verified in npm/cli lib/commands/trust/github.js - bindings store { repository, workflow_ref.file } only, no branch or ref constraint), this means:
- Anyone with write access can push to
mainor merge a PR that modifiesmain.yamlto bypass theif: github.ref == 'refs/heads/main'gate. OIDC tokens from any branch satisfy the trust claims. - The
if:check is the only ref constraint, and a single-line PR can remove it.
Mitigations, in order of strength:
- (Strongest, code-fixable here) Bind a GitHub Environment (e.g.
npm-publish) to the release job withdeployment_branch_policy: main only, and add--environment npm-publishto thenpm trust githubcalls in the setup script. The OIDC token then carries theenvironmentclaim and only main-policy environments can mint it. - (Minimum, repo-admin) Configure branch protection or a ruleset on
main(disallow direct push, require PR review, disallow force-push). Then the workflow's stated assumption actually holds.
This should land before any trust bindings are configured. Right now, even after the typo fix, the security model this PR describes is not the security model the repo enforces.
| "changelogFile": "CHANGELOG.md", | ||
| }], | ||
| "@semantic-release/npm", | ||
| ...(process.env.SR_NO_NPM_AUTH ? [] : ["@semantic-release/npm"]), |
There was a problem hiding this comment.
Important: example package is excluded from trust bindings but still loads @semantic-release/npm - latent break on next release.
Three signals disagree:
scripts/setup-npm-trusted-publishers.shexcludes@adobe/spacecat-shared-examplefrom PACKAGES ("template package, not intended for automated publishing")- This file (
packages/spacecat-shared-example/.releaserc.cjs:13) was updated with the SR_NO_NPM_AUTH shim, so it loads@semantic-release/npmwhen the env var is unset (i.e. in the release job) packages/spacecat-shared-example/package.jsonis NOT marked"private": trueand currently carries a real version
In the release job (where SR_NO_NPM_AUTH is NOT set), if conventional-commit history ever includes a feat: or fix: scoped to this package, semantic-release will attempt to publish via OIDC, find no trust binding on npmjs.com, and the entire release job will fail.
Fix (either is fine, private is more idiomatic):
- Set
"private": trueinpackages/spacecat-shared-example/package.jsonand let@semantic-release/npmskip it - Drop
.releaserc.cjsfrom this directory entirely (semantic-release-monorepo skips packages without a config)
| "changelogFile": "CHANGELOG.md", | ||
| }], | ||
| "@semantic-release/npm", | ||
| ...(process.env.SR_NO_NPM_AUTH ? [] : ["@semantic-release/npm"]), |
There was a problem hiding this comment.
Important: SR_NO_NPM_AUTH truthy check is a footgun across all 24 .releaserc.cjs files.
In JavaScript, the strings "false", "0", "no" are all truthy. If a future maintainer sets SR_NO_NPM_AUTH=false thinking it re-enables the npm plugin, npm publishing is still skipped. The workflow sets 'true' explicitly so current behavior is fine, but the contract is fragile.
Fix: change to process.env.SR_NO_NPM_AUTH === 'true' ? [] : ["@semantic-release/npm"] in all 24 files. One search-and-replace.
Even better: a small node test that loads any one .releaserc.cjs under three env values (undefined, 'true', 'false') and asserts the plugin presence/absence. Locks the contract and detects any future inversion.
| env: | ||
| GITHUB_TOKEN: ${{ secrets.ADOBE_BOT_GITHUB_TOKEN }} | ||
| NPM_TOKEN: ${{ secrets.ADOBE_BOT_NPM_TOKEN }} | ||
| SR_NO_NPM_AUTH: 'true' |
There was a problem hiding this comment.
Important: Dry-run never exercises the OIDC publish path - first main release is the first real test.
This dry-run step sets SR_NO_NPM_AUTH: 'true', which removes @semantic-release/npm entirely from the plugin chain in all 24 .releaserc.cjs files. On the merge commit to main, only the actual release job runs. So no PR build and no main build prior to the release step validates OIDC publish works end-to-end.
If trust bindings are misconfigured (wrong workflow filename, missing package, npm-side outage during setup), the failure mode is a broken release on main with revert as the only recovery.
Mitigation in this PR (cheapest first):
- Capture the output of running the corrected
setup-npm-trusted-publishers.shagainst all 22 packages (andnpm trust listfor each) and attach to the PR or SITES-42702. Pre-merge evidence that bindings exist where they should. - (Better) A controlled canary publish on a release branch before the workflow change lands on main.
The PR description's deferred "publish --dry-run --provenance smoke-test on PRs" item would close this gap structurally; until then the pre-merge evidence is the minimum acceptable substitute.
- Fix `npm trust github-actions` typo → `npm trust github` (Critical: script would fail for all 22 packages) - Add GitHub Environment `npm-publish` to release job and `--environment` flag to trust script, replacing the non-existent branch protection as the security boundary - Mark `spacecat-shared-example/package.json` as `private: true` to prevent latent publish failure if a release commit touches the template package - Tighten `SR_NO_NPM_AUTH` check from truthy to `=== 'true'` across all 24 `.releaserc.cjs` files to avoid false-skip when set to `'false'` or `'0'` Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Bump root engines.npm to >=11.5.1 to match OIDC publishing requirement (workflow installs 11.13.0, setup script requires >=11.10.0) - Add PACKAGES drift check to setup-npm-trusted-publishers.sh — diffs PACKAGES array against publishable workspace packages so a newly added package surfaces in preflight instead of failing silently on first release - Sleep 2s between npm trust calls to avoid registry rate-limit throttling on bulk binding (per npm v11 bulk usage guidance) - Print audit-trail summary (git SHA, npm user, timestamp, env) on success to give the trust-establishment ceremony a paste-into-ticket artifact - Document SR_NO_NPM_AUTH removal trigger inline in the workflow so the shim does not silently outlive its purpose Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
solaris007
left a comment
There was a problem hiding this comment.
Hey @alinarublea,
Strengths
The round-1 fixes landed substantively with quality work:
npm trust githubsubcommand correctly applied (scripts/setup-npm-trusted-publishers.sh:147).SR_NO_NPM_AUTH === 'true'strict equality across all 24.releaserc.cjsfiles (verified consistent via grep across the tree).packages/spacecat-shared-example/package.jsoncorrectly marked"private": true.- PACKAGES drift detection via node-based workspace scan in the setup script.
- 2-second sleep between trust calls; per-package FAILED tracking with non-zero exit on any failure.
- Audit trail block at end of setup script (timestamp, git SHA, npm user/version, registry, repo, workflow, env, package count) is the right shape for SITES-42702.
engines.npm >=11.5.1 <12.0.0aligned with OIDC floor.- Workflow-level
permissions: {}with per-jobid-token: writescoping;persist-credentials: 'false'on both checkouts. NPM_CONFIG_PROVENANCE: 'true'on release job adds sigstore attestations on every publish.NPM_TOKENfully removed from workflow env (verified across diff).- SR_NO_NPM_AUTH lifecycle is documented inline in the workflow (
main.yaml:59-61). - Workflow rename warning comment is highly visible and points operators at the setup script.
- Prior Important issues (truthy check footgun, example package latent break) and Minor issues (PACKAGES drift, engines.npm floor, audit trail, pacing) all addressed cleanly.
- Pushed-back dry-run smoke test: counter holds, the setup-script audit trail is an acceptable substitute for a one-shot ceremony.
Issues
Critical (Must Fix)
-
The
npm-publishGitHub Environment does not exist server-side, andmainhas no branch protection. The trust-binding security model the PR describes is currently decorative, not enforced.Prior round's Critical 2 was marked addressed via GitHub Environment. The workflow correctly declares
environment: npm-publish(main.yaml:69) and the setup script correctly passes--environment npm-publishtonpm trust github. But the GitHub-side configuration was never done. Verified empirically:$ gh api repos/adobe/spacecat-shared/environments {"total_count":0,"environments":[]} $ gh api repos/adobe/spacecat-shared/branches/main/protection {"message":"Branch not protected","status":"404"} $ gh api repos/adobe/spacecat-shared/rulesets []When the release job first runs
environment: npm-publish, GitHub auto-creates the environment with no deployment_branch_policy and no required reviewers. The OIDC token will include theenvironment: npm-publishclaim regardless of which branch ran the job, because the claim is set by the workflow declaration, not validated by environment policy. The trust binding's--environmentfilter therefore enforces nothing beyond "the workflow declared this string."Attack chain: anyone with write access edits
main.yamlto remove theif: github.ref == 'refs/heads/main'guard, pushes to a feature branch. The auto-creatednpm-publishenvironment has no deployment_branch_policy, so the job runs. OIDC token mints with the rightrepo,workflow_ref, andenvironmentclaims. npm trust binding accepts. Publish succeeds with provenance attestation over the malicious build. End-to-end compromise from a single repo-write event.Required order of operations before merge:
- Create
npm-publishenvironment with a deployment_branch_policy restricting tomain, and configure at least one required reviewer. - Enable branch protection on
main: require PR review, dismiss stale approvals on push, disallow direct push, require status checks (thetestjob). - Verify both via the setup-script preflight check below.
- Re-run the setup script (npm trust is idempotent, so this is safe even if bindings were already registered).
- Create
Important (Should Fix)
-
Setup script must preflight the GitHub Environment policy and branch protection (
scripts/setup-npm-trusted-publishers.sh:36-72).Today's preflight verifies npm version, registry URL,
whoami == adobe-bot, and the workflow file's presence on disk - all things that would make the trust binding wrong. It does not verify the things that make the security model correct: thatnpm-publishexists on the repo with a main-only branch policy, and thatmainis branch-protected. Without these, the script will happily register 22 bindings whose security boundary doesn't exist (Critical 1 above).Add at the end of the preflight block:
ENV_JSON=$(gh api "repos/${REPO}/environments/${ENVIRONMENT}" 2>/dev/null) || { echo "ERROR: GitHub Environment '${ENVIRONMENT}' does not exist on ${REPO}." echo " Create it with a main-only deployment_branch_policy before running this script." exit 1 } gh api "repos/${REPO}/branches/main/protection" >/dev/null 2>&1 || { echo "ERROR: branch protection on 'main' is not configured." echo " Trust binding has no meaningful guard without it." exit 1 }
-
PACKAGES drift check silently passes if node fails (
scripts/setup-npm-trusted-publishers.sh:106-128).WORKSPACE_PACKAGES=$(node -e '...' 2>/dev/null)captures node's stdout but swallows stderr. Combined withset -uo pipefail(no-e) andpath.join(process.cwd(), "packages")instead ofprocess.env.REPO_ROOT, any failure inside node (wrong cwd, syntax error, malformedpackage.json) yields an emptyWORKSPACE_PACKAGES, an emptyDRIFTarray, and a silent pass that proceeds straight to trust registration. The drift check becomes a no-op precisely when the workspace state is unreadable - exactly when its safety net is needed.Fix: pass
REPO_ROOTexplicitly, capture exit code, and fail closed on empty output:if ! WORKSPACE_PACKAGES=$(REPO_ROOT="${REPO_ROOT}" node -e ' const fs = require("fs"); const path = require("path"); const dir = path.join(process.env.REPO_ROOT, "packages"); ...'); then echo "ERROR: drift check failed to enumerate workspace packages"; exit 1 fi [ -z "$WORKSPACE_PACKAGES" ] && { echo "ERROR: workspace package list empty"; exit 1; }
-
SR_NO_NPM_AUTH cleanup has no forcing function (
main.yaml:59-62+ 24.releaserc.cjsfiles).The "remove after 2 successful OIDC releases" instruction lives only in the workflow header comment and (implicitly) on the linked Jira ticket. Lifecycle markers in comments reliably outlive memory. After the trigger date the shim becomes dead supply-chain-adjacent code: 24 release configs coupled to an env var that should no longer exist, plus a one-line escape hatch that re-enables token-based publishing if
ADOBE_BOT_NPM_TOKENis ever re-added.Cheapest fix: drop a marker file (e.g.
.npm-oidc-bootstrap-completecontaining the date of the second successful release) and add a 5-line CI step that fails if the marker exists AND any.releaserc.cjsstill containsSR_NO_NPM_AUTH. Alternative: file a follow-up issue with a concrete removal date assigned to a person, referenced from the workflow comment. -
Release job lacks
timeout-minutesandconcurrencygroup (main.yaml:64).With OIDC token exchange, sigstore signing, and
npm publishnow in the critical path, the number of upstream hang points grew but the GitHub Actions default per-job timeout (6 hours) is unchanged. A hung release ties up a runner and pushes triage well past the failure. Separately, if two PRs land on main faster than the first release completes, two release jobs run in parallel - two simultaneous OIDC mints, two sigstore submissions, and semantic-release's tag-then-publish flow assumes serialized execution on main.Fix:
release: timeout-minutes: 15 concurrency: group: npm-publish-main cancel-in-progress: false
-
No CI lint enforces the SR_NO_NPM_AUTH guard across the 24
.releaserc.cjsfiles (workspace-wide).All 24 files are consistent today; consistency is preserved only by reviewer attention. A new package added by copy-pasting an older
.releaserc.cjsthat omits the guard, or a manual edit removing it from one file, would not be caught. WhileSR_NO_NPM_AUTH=trueis set in CI the regression is loud (dry-run fails). After the shim is removed (per the stated lifecycle) a missing guard becomes a silent no-op - the regression vector flips. Ten-line CI step removes the entire class of single-file drift:missing=$(grep -L "SR_NO_NPM_AUTH === 'true'" packages/*/.releaserc.cjs .releaserc.cjs) if [ -n "$missing" ]; then echo "FAIL: SR_NO_NPM_AUTH guard missing in: $missing"; exit 1; fi
This pairs with the SR_NO_NPM_AUTH tripwire from finding 4 - both guard the same future regression.
Minor (Nice to Have)
-
Drift check is one-directional (
scripts/setup-npm-trusted-publishers.sh:106-128). PACKAGES -> workspace coverage is verified (catches new package missing from PACKAGES) but not the inverse (a name in PACKAGES that no longer exists in the workspace - deleted, renamed, or flipped toprivate: true). Result over time: orphanednpm trustbindings on npmjs.com. Same node enumeration, no extra deps. -
Audit trail is stdout-only (
scripts/setup-npm-trusted-publishers.sh:175-192). Operators are asked to paste the trail into SITES-42702, but if the script is piped or redirected the trail is gone. One-liner fix: also write to a sibling file (npm-trust-audit-${TIMESTAMP}.log). -
Sigstore is now a release prerequisite, no break-glass documented (
main.yaml:76NPM_CONFIG_PROVENANCE: 'true'). status.sigstore.dev has had multi-hour outages. Add an inline comment naming the override (NPM_CONFIG_PROVENANCE: 'false'temporarily, loses attestation for that release, restore after sigstore recovers), or link to a runbook. -
Rollback path lives in PR description, not in repo. Post-merge an on-call responding to a release failure needs both failure-mode triage (sigstore vs OIDC mint vs trust binding vs registry 5xx) and the revert + secret-retention steps. Move to a short
RELEASE-RUNBOOK.mdreferenced from the workflow header. -
Workflow header warns about file rename; environment rename hazard is missing (
main.yaml:3-9). Renaming thenpm-publishGitHub Environment has the same silent-break failure mode: trust bindings include the environment claim, the new environment name will not be claimed by the OIDC token. Add one line to the existing comment. -
Stale "npm OIDC dry-run auth" comment on test job permissions (
main.yaml:26).SR_NO_NPM_AUTH: 'true'removes@semantic-release/npmfrom the plugin chain in dry-run, so the justification no longer holds.id-token: writeremains correct for ECR OIDC; trim the comment to avoid implying the test job participates in publish auth.
Recommendations
- Shared
.releaserc.base.cjsat workspace root: 24.releaserc.cjsfiles now differ only by their containing directory. Every future change (shim removal, plugin reorder, semantic-release upgrade) requires touching 24 files in lockstep. Amodule.exports = require('../../.releaserc.base.cjs')per package collapses this to one place and makes the eventual SR_NO_NPM_AUTH removal a one-line edit. - Encapsulate the OIDC pattern before it propagates: ~10 sibling spacecat-* / mysticat-* / helix-* repos publish to npm and will need this migration. A composite action (
adobe/spacecat-shared/.github/actions/npm-publish-oidc) or anaem-sites-architectureskill carries the workflow snippet, setup-script template, and prerequisites cleanly. Per workspace docs, the skill is the higher-leverage carrier for org-level standardization. - Periodic trust-binding audit cron: weekly job that runs
npm trust listper package and diffs against expected{ repo, workflow, environment }. Detects drift on the registry side, not just in the repo. Pairs with the existing PACKAGES check. - Failure alerting on the release job: today the only detection signal is "consumer service install fails downstream", which is slow. Minimum bar:
if: failure()step that posts to a Slack channel or opens a GitHub issue. Pairs with the timeout/concurrency findings. - Add a
--dry-runmode to the setup script: preflight + drift check + audit trail withoutnpm trustcalls. Useful for CI smoke-checking the script itself and for operators verifying local state before the real run. - Sigstore consumer verification policy: attestations are emitted but no Adobe-side consumer enforces them. Define what services importing
@adobe/spacecat-shared-*should do if attestation is missing/invalid.
Out of scope, worth tracking
- Deferred
npm publish --dry-run --provenancesmoke test on PRs (pushed-back from prior round; audit trail substitutes). - Action SHA pinning (
actions/checkout@v6,actions/setup-node@v6,actions/cache@v5) - acknowledged in PR description. on: [push]trigger narrowing to specific branches - acknowledged.ADOBE_BOT_GITHUB_TOKENPAT to GitHub App installation token - acknowledged.- ADOBE_BOT_NPM_TOKEN explicit rotation deadline: PR mentions "after 2 successful release cycles" without an owner or date target. Pin both in SITES-42702.
- shellcheck/actionlint on the new shell + workflow artifacts.
- Workflow rename CI enforcement: header-comment-only today.
- CODEOWNERS on
.github/workflows/main.yamlandscripts/setup-npm-trusted-publishers.sh.
Assessment
Ready to merge? No, with fixes.
Reasoning: One Critical finding blocks merge. The advertised security model relies on the npm-publish GitHub Environment having a main-only deployment branch policy AND main being branch-protected. Neither is configured today. The workflow + setup-script changes are correct as code, but they describe a security boundary that does not exist on the repo. Once the environment is created with the right policy, branch protection is enabled, and the setup script's preflight gates against both, this becomes a strong supply-chain improvement. The Important findings are all small, mechanical, and worth landing alongside the Critical fix to remove the remaining gaps between the asserted and enforceable security model.
Next Steps
- Create the
npm-publishGitHub Environment with a main-only deployment branch policy and required reviewers. - Enable branch protection on
main(PR review, dismiss stale approvals, disallow direct push, requireteststatus check). - Add the environment + branch-protection preflight to the setup script.
- Tighten the drift check to capture node exit code and fail closed on empty output.
- Add
timeout-minutes: 15and aconcurrencygroup to the release job. - Add the SR_NO_NPM_AUTH lifecycle tripwire (marker file + CI step), or file a follow-up issue with a concrete date and owner.
- Add the CI lint for the 24-file SR_NO_NPM_AUTH guard.
- Re-run the setup script after items 1-3 land (npm trust is idempotent).
- Minor items are optional but cheap; bundle into the same fix commit if pursuing.
| runs-on: ubuntu-latest | ||
| needs: test | ||
| if: github.ref == 'refs/heads/main' | ||
| environment: npm-publish |
There was a problem hiding this comment.
Critical: The npm-publish GitHub Environment does not exist server-side, and main has no branch protection. The trust-binding security model this PR describes is currently decorative, not enforced.
Verified empirically:
$ gh api repos/adobe/spacecat-shared/environments
{"total_count":0,"environments":[]}
$ gh api repos/adobe/spacecat-shared/branches/main/protection
{"message":"Branch not protected","status":"404"}
When this job first runs environment: npm-publish, GitHub auto-creates the environment with no deployment_branch_policy. The OIDC token will carry the environment: npm-publish claim regardless of branch (the claim is set by the workflow declaration, not validated by env policy). The trust binding --environment filter therefore enforces nothing beyond "the workflow declared this string." Combined with no branch protection on main, write access to the repo equals direct push to main equals OIDC publish capability.
Required order of operations before merge: create the npm-publish environment with a main-only deployment_branch_policy + required reviewer, enable branch protection on main, add the preflight check to the setup script, then re-run the setup script (npm trust is idempotent). See main review body for the full attack chain and required steps.
| if [ ! -f "${REPO_ROOT}/.github/workflows/${WORKFLOW}" ]; then | ||
| echo "ERROR: workflow file not found: .github/workflows/${WORKFLOW}" | ||
| echo " Update the WORKFLOW constant in this script if main.yaml was renamed." | ||
| exit 1 |
There was a problem hiding this comment.
Important: missing GitHub Environment + branch-protection preflight.
Today's preflight verifies npm version, registry URL, whoami == adobe-bot, and the workflow file's presence on disk. It does not verify the things that make the security model correct: that npm-publish exists on the repo with a main-only branch policy, and that main is branch-protected. Without these, the script registers 22 bindings whose security boundary does not exist (Critical 1).
Add at the end of the preflight block:
ENV_JSON=$(gh api "repos/${REPO}/environments/${ENVIRONMENT}" 2>/dev/null) || {
echo "ERROR: GitHub Environment '${ENVIRONMENT}' does not exist on ${REPO}."
echo " Create it with a main-only deployment_branch_policy before running this script."
exit 1
}
gh api "repos/${REPO}/branches/main/protection" >/dev/null 2>&1 || {
echo "ERROR: branch protection on 'main' is not configured."
exit 1
}| WORKSPACE_PACKAGES=$(node -e ' | ||
| const fs = require("fs"); | ||
| const path = require("path"); | ||
| const dir = path.join(process.cwd(), "packages"); |
There was a problem hiding this comment.
Important: PACKAGES drift check silently passes if node fails.
WORKSPACE_PACKAGES=$(node -e '...' 2>/dev/null) captures stdout but swallows stderr. Combined with set -uo pipefail (no -e) and path.join(process.cwd(), "packages") instead of process.env.REPO_ROOT, any failure inside node (wrong cwd, syntax error, malformed package.json) yields an empty WORKSPACE_PACKAGES, an empty DRIFT, and a silent pass that proceeds straight to trust registration. The drift check becomes a no-op precisely when the workspace state is unreadable - exactly when the safety net is needed.
Fix: pass REPO_ROOT explicitly, capture exit code, and fail closed on empty output:
if ! WORKSPACE_PACKAGES=$(REPO_ROOT="${REPO_ROOT}" node -e '
const dir = path.join(process.env.REPO_ROOT, "packages");
...'); then
echo "ERROR: drift check failed to enumerate workspace packages"; exit 1
fi
[ -z "$WORKSPACE_PACKAGES" ] && { echo "ERROR: workspace package list empty"; exit 1; }| # SR_NO_NPM_AUTH skips @semantic-release/npm in the plugin chain (see all .releaserc.cjs). | ||
| # Remove this env var AND the matching guard from every .releaserc.cjs after 2 successful | ||
| # OIDC releases on main confirm publishing works without a token (tracked in SITES-42702). | ||
| SR_NO_NPM_AUTH: 'true' |
There was a problem hiding this comment.
Important: SR_NO_NPM_AUTH cleanup has no forcing function.
The "remove after 2 successful OIDC releases" instruction lives only in this comment and on SITES-42702. Lifecycle markers in comments reliably outlive memory. After the trigger date the shim becomes dead supply-chain-adjacent code: 24 release configs coupled to an env var that should no longer exist, plus a one-line escape hatch that re-enables token-based publishing if ADOBE_BOT_NPM_TOKEN is ever re-added.
Cheapest fix: drop a marker file (e.g. .npm-oidc-bootstrap-complete containing the date of the second successful OIDC release) and add a 5-line CI step that fails if the marker exists AND any .releaserc.cjs still contains SR_NO_NPM_AUTH. Alternative: file a follow-up issue with a concrete removal date assigned to a person, referenced from this comment.
| # OIDC releases on main confirm publishing works without a token (tracked in SITES-42702). | ||
| SR_NO_NPM_AUTH: 'true' | ||
|
|
||
| release: |
There was a problem hiding this comment.
Important: release job lacks timeout-minutes and concurrency group.
With OIDC token exchange, sigstore signing, and npm publish now in the critical path, the number of upstream hang points grew but the GitHub Actions default per-job timeout (6 hours) is unchanged. A hung release ties up a runner and pushes triage well past the failure. Separately, if two PRs land on main faster than the first release completes, two release jobs run in parallel - two simultaneous OIDC mints, two sigstore submissions, and semantic-release's tag-then-publish flow assumes serialized execution on main.
Fix:
release:
timeout-minutes: 15
concurrency:
group: npm-publish-main
cancel-in-progress: falsemain.yaml: - Add timeout-minutes: 15 + concurrency group (npm-publish-main, no cancel) to release job so semantic-release tag-then-publish cannot race on parallel merges and a hung release cannot tie up a runner. - Add SR_NO_NPM_AUTH guard consistency check as a test-job step. Fails CI if any .releaserc.cjs is missing the === 'true' guard. Also includes a cleanup tripwire: if .npm-oidc-bootstrap-complete exists, fails CI until the shim is fully removed from all .releaserc.cjs and the workflow. - Trim stale 'npm OIDC dry-run auth' justification from test job id-token: write — SR_NO_NPM_AUTH=true removes @semantic-release/npm from the dry-run plugin chain entirely. - Add environment-rename hazard to workflow header (same silent-break mode as workflow-file rename). - Document sigstore break-glass next to NPM_CONFIG_PROVENANCE. setup-npm-trusted-publishers.sh: - Add GitHub Environment + branch-protection preflight before any npm trust call. Verifies env exists, has main-only deployment_branch_policy, and main has branch protection. Without these the trust binding's '--environment' filter is decorative. - Make drift check fail-closed: capture node exit code via if-let, pass REPO_ROOT explicitly (no cwd dependency), and abort on empty workspace enumeration. Previously a node failure silently passed. - Add reverse drift detection — entries in PACKAGES no longer in the workspace surface as a warning so orphaned trust bindings on npmjs.com are visible (non-blocking; they don't break the current run). - Write audit trail to a sibling log file (scripts/npm-trust-audit- TIMESTAMP.log) in addition to stdout, so a piped/redirected run still leaves an artifact on disk for SITES-42702. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
main.yaml — Redesign SR_NO_NPM_AUTH guard step to address: - Self-referential tripwire (HIGH): the previous marker-file branch grepped for 'SR_NO_NPM_AUTH' in main.yaml, which always matched the step's own bash, trapping cleanup PRs in unkillable red CI. - Fail-open glob (HIGH): bash without nullglob passed literal packages/*/.releaserc.cjs to grep, which suppressed stderr and produced empty output — silently passing when files moved or renamed. - Undocumented marker contract (MEDIUM): .npm-oidc-bootstrap-complete had no creation procedure documented anywhere. New design auto-detects phase via the YAML mapping pattern of the env-var declaration (regex anchored to indented YAML key, syntactically distinct from the JS strict-equality '=== true' the step checks for in configs). No marker file. Phase 1 (shim active) requires all configs to have the strict guard. Phase 2 (shim removed) requires no config to reference SR_NO_NPM_AUTH — partial-cleanup detection comes for free. Adds shopt -s nullglob + non-empty files-array assert to close fail-open. setup-npm-trusted-publishers.sh: - Branch-protection preflight now PARSES the policy with gh api --jq and asserts: required_approving_review_count >= 1, enforce_admins.enabled, no force-push, no deletion, 'Test' in required_status_checks.contexts. Closes a bypass where a repo admin could weaken protection to the minimum config that still returns 200 while leaving main effectively unprotected. - Audit trail no longer silently swallows write failures (set -uo pipefail without -e meant tee errors continued). Builds content in a variable, prints to stdout unconditionally, attempts file write as a non-blocking artifact, surfaces a WARNING if the file is empty/unwritten. - Audit filename gets $$ (PID) suffix to avoid same-second collision. Verified locally: - bash -n / yaml.safe_load: syntax OK - Phase-detection regex: matches only the env-var line (exactly 1 match in main.yaml at line 112); does not match the bash code in the same step - Phase 1 on real repo: passes (24 configs have strict guard) - Synthetic drift: correctly flags missing guard - Phase 2 full cleanup (in-memory sim): correctly detected - Phase 2 partial cleanup (in-memory sim): correctly flags leftover refs - nullglob regression: confirmed OLD code fail-open, NEW code closed - Branch-protection preflight against live repo state: 'OK' - Branch-protection preflight against 4 synthetic weak inputs: each flagged correctly (zero approvals / admins exempt / force-push allowed / no Test in contexts) - Audit trail success + failure paths Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
solaris007
left a comment
There was a problem hiding this comment.
Hey @alinarublea,
Round-3 changes resolve 11 of 12 round-2 findings cleanly and introduce some genuinely good design: the SR_NO_NPM_AUTH phase auto-detection (no marker file), the five-attribute branch protection jq check, and the decoupled audit-log write are all better than what they replace. Verified server-side: npm-publish environment exists with main-only deployment branch policy plus required-reviewer (spacecat-admins team); main branch protection enforces PR review, dismiss-stale, enforce-admins, no force-push, no deletion, and "Test" status check. The migration approach is sound and the security improvement is real.
Three Important items before merge: one is a small jq addition that closes a real attack chain in the protection-state check, one is moving the rollback procedures from PR description into the repo for 3am-page accessibility, and one is committing to extract this pattern before it propagates to the ~10 sibling repos that need the same migration.
Strengths
Prior round-2 findings now addressed:
- Critical 1 (env + branch protection don't exist server-side): both are configured. Environment created
2026-05-12T14:20:43Zwith main-onlydeployment_branch_policy; branch protection passes the script's own 5-attribute check. - Important 2 (script preflight for env + branch protection): added in commit 76842cb, hardened in ac8975f with detailed jq attribute checks and current-state-on-failure dump. The "what is actually configured" output on failure is the right operator affordance.
- Important 3 (drift check silent fail): fail-closed via explicit
REPO_ROOT,if !capture of node exit code, and empty-output rejection. - Important 4 (SR_NO_NPM_AUTH cleanup forcing function): auto-detection from the workflow file itself. Cleaner than the marker file. Phase-1 regex
^[[:space:]]+SR_NO_NPM_AUTH:[[:space:]]+'true'and the JS guard patternSR_NO_NPM_AUTH === 'true'are syntactically distinct, so the step cannot self-reference. - Important 5 (release job timeout + concurrency):
timeout-minutes: 15plusconcurrency: { group: npm-publish-main, cancel-in-progress: false }. The "let in-flight finish" comment captures the rationale exactly - cancellation here would be the dangerous setting given semantic-release's tag-then-publish flow. - Important 6 (CI lint for SR_NO_NPM_AUTH guard across 24 files): the same tripwire step.
- Minor 7 (drift one-directional): reverse drift detection added with explicit
npm trust revokeremediation line in the warning. - Minor 8 (audit trail stdout-only): sibling log file with PID suffix for collision safety; decoupled file write so a failed write does not lose the audit content from stdout.
- Minor 9 (sigstore break-glass): inline comment naming the
NPM_CONFIG_PROVENANCE: 'false'override. - Minor 11 (env rename hazard): header comment now covers both file rename and environment rename.
- Minor 12 (stale dry-run auth comment): trimmed.
Beyond addressed findings:
- Workflow-level
permissions: {}with per-job scoping is correct least-privilege.id-token: writeonly where needed. persist-credentials: 'false'on both checkout steps preventsGITHUB_TOKENpersisting into git config.NPM_CONFIG_PROVENANCE: 'true'ties sigstore attestations to every publish - real supply-chain integrity gain.- The setup script is well-engineered fail-closed code with idempotent re-run semantics.
Issues
Important (Should Fix)
1. Protection-state check misses dismiss_stale_reviews and bypass_pull_request_allowances (scripts/setup-npm-trusted-publishers.sh jq block - also posted inline).
Verified attack chain without dismiss_stale_reviews: true:
- Author opens PR with benign change, gets approval.
- Author force-pushes the PR branch (force-push to PR branches is allowed by default; only target-branch protection blocks it).
- PR retains the stale approval, satisfying the
required_approving_review_countcheck. - PR merges to main.
- Release job publishes attacker code via OIDC with a valid sigstore attestation.
dismiss_stale_reviews is currently true on main, but the script's preflight does not assert it - if a future admin disables it, the script still prints "preflight OK" and the trust boundary the binding depends on degrades silently.
bypass_pull_request_allowances.users currently lists adobe-bot (required for semantic-release commits - acknowledged as residual elsewhere). If anyone is added to that list later, the OIDC publish path becomes reachable from a non-PR push by the new actor. The script should at minimum detect unexpected additions.
Suggested additions to the jq chain, sibling to enforce_admins:
elif (.required_pull_request_reviews.dismiss_stale_reviews // false) != true then "weak: dismiss_stale_reviews disabled"
elif ((.required_pull_request_reviews.bypass_pull_request_allowances.users // []) +
(.required_pull_request_reviews.bypass_pull_request_allowances.teams // []) +
(.required_pull_request_reviews.bypass_pull_request_allowances.apps // []) | length) > 1 then "weak: unexpected bypass actors"
One-line concept, two-line fix. Worth landing in this PR.
2. Rollback procedures should live in the repo, not the PR description (new file: docs/RELEASE-RUNBOOK.md or RUNBOOK.md - also posted inline at the workflow header).
The PR description currently holds the only documented procedures for:
- First-release OIDC failure -> revert PR, NPM_TOKEN resumes
- 2-release dual-write window before token rotation
- Trust-binding break-glass (rename of
main.yamlornpm-publishenvironment) - Sigstore break-glass (flip
NPM_CONFIG_PROVENANCE, with the realistic timing: open PR -> get review -> Test passes -> merge -> re-trigger release, which is roughly 30-60 minutes minimum givenenforce_admins.enabled = true)
PR descriptions are hidden, edited, lost in archive search, and require GitHub auth to retrieve. For a 3am page, an in-repo runbook is greppable from the checkout the on-call already has open. The work is small (lift from the PR description, add the sigstore-timing reality, add a partial-publish recovery note for the case where the 15-minute timeout kills a release mid-way through 22 packages and leaves a git tag for an incomplete release). This was Minor in round 2 and remained unresolved; calling it out again as Important from the operational lens.
3. Extract before propagating to the sibling repos (PR description mentions this pattern will need to land in ~10 other repos).
The current code shape is monolithic per-repo: a ~320-line setup script with hardcoded REPO/WORKFLOW/PACKAGES/protection-schema plus a ~50-line workflow guard step with the same regex contract. Drift across 10 copies is supply-chain risk, not test-suite tedium - any of those copies regressing on the protection-state check or the phase-detection regex is a silent security weakening, not a build failure.
Either (a) extract the setup script into a parameterised tool (@adobe/oidc-trusted-publishers-setup or a workspace skill) and the workflow guard into a composite action (adobe-actions/verify-sr-no-npm-auth-guard@v1) before this lands in the second repo, or (b) commit to extraction explicitly with a tracking issue and a hard cap (e.g. "extract by repo 3"). Either is fine; the current shape with no plan to extract is what to avoid. Option (b) is acceptable for closing this finding - file the tracking issue and link it from the PR.
Minor (Nice to Have)
Phase-1 grep masks "file not found" exit code (.github/workflows/main.yaml SR_NO_NPM_AUTH check). missing=$(grep -L "..." "${files[@]}" || true) swallows exit code 2 (grep file-not-found) the same way it swallows exit 1 (all matched). If root .releaserc.cjs is deleted, grep writes No such file or directory to stderr but the tripwire reports OK. The tripwire's stated job is consistency assertion; a missing input file should fail loud. Either guard with [ -f "$f" ] per path, or capture grep's actual exit code separately.
Phase-detection regex brittle to YAML quoting variants (same step). SR_NO_NPM_AUTH: "true" (double quotes) and SR_NO_NPM_AUTH: true (unquoted YAML boolean) both produce the same env value at runtime but neither matches ^[[:space:]]+SR_NO_NPM_AUTH:[[:space:]]+'true'. Result: a Prettier/yamllint reformat that flips quote style trips the workflow into phase 2 and fails on "leftover references". Fails closed (CI fails), so not a security issue, but a confusing maintenance signal. Broaden the regex to accept any of 'true'|"true"|true, or pin quote style in the inline comment as documented contract.
JS guard pattern brittle to JS quote-style (same step). The phase-1 search for SR_NO_NPM_AUTH === 'true' would miss === "true" if a Prettier config flips JS string quotes. Tighten to === ['"]true['"], or pin quote style in lint config.
Audit log file write [ -s ] check passes on truncated writes (scripts/setup-npm-trusted-publishers.sh audit block). Disk-full mid-write leaves a partial file; [ -s ] returns true; the "Audit log also written to" message lies. Atomic mktemp + mv is safer:
TMPFILE=$(mktemp -t npm-trust-audit.XXXXXX) && \
printf '%s\n' "${AUDIT_CONTENT}" > "$TMPFILE" && \
mv "$TMPFILE" "${AUDIT_FILE}"Protection-state check reports only the first weakness. If main is missing approval requirement AND admin exemption AND Test status check, the operator fixes one, re-runs, sees the next, etc. UX cost only, no security change. The aggregator pattern would report all weaknesses at once.
npm version preflight fails open on non-numeric prefix (scripts/setup-npm-trusted-publishers.sh early preflight). CURRENT_NPM=v11.13.0 (some npm --version variants emit the v prefix) -> cut -d. -f1 yields v11 -> [ "$CURRENT_MAJOR" -lt 11 ] errors with "integer expression expected" -> conditional reads as false -> preflight passes. Add a regex sanity check on CURRENT_NPM before the integer comparison.
gh auth status not preflight-checked. The script verifies command -v gh (installed) but not auth state. A user with gh installed but not logged in hits cryptic gh api errors during environment + protection checks. One-line addition.
Audit log files in scripts/ are not gitignored. The audit log writes to ${SCRIPT_DIR}/npm-trust-audit-${TIMESTAMP}-$$.log, inside the repo. An operator who runs the script then does git add scripts/ could accidentally commit one. Add scripts/npm-trust-audit-*.log to .gitignore, or write to ${TMPDIR:-/tmp}/ instead.
Recommendations
- GH App token migration before propagation: the
adobe-botbypass on main branch protection is a residual hole in the post-OIDC threat model. Pre-OIDC: compromise NPM_TOKEN -> publish malicious code directly. Post-OIDC: compromise adobe-bot's GitHub token -> push to main bypassing PR review -> next release publishes via the trust binding with a valid sigstore attestation that says "this was built from main" (true but compromised). Not introduced by this PR; the PR materially reduces overall risk by removing the NPM_TOKEN path. Track as a separate hardening (options: PR-based version bumps with auto-merge, split adobe-bot into separate npm and GitHub identities, push-time signatures). Sequence ahead of the 10 sibling repos so they inherit the cleaner trust shape. - Periodic protection-drift detection cron: weekly job that runs the same 5-attribute jq check (now expanded per Important 1) against main's branch protection. Catches an admin tweaking protection rules and forgetting to revert - a real failure mode for long-lived security baselines.
--preflight-onlyflag on the setup script: the preflight portion is already complete enough to stand alone. A dry-run flag enables periodic cron re-verification and local iteration when modifying the script. Pairs with the cron above.- Extract the CI tripwire into a fixture-tested script (
scripts/verify-sr-no-npm-auth-guard.shwithtests/fixtures/covering phase-1-clean, phase-1-missing-guard, phase-2-clean, phase-2-leftover-guard). The tripwire is now the permanent regression guard for the shim cleanup; making it self-tested future-proofs it against the regex brittleness flagged in the Minor section. - Lift the protection-schema contract into the extracted tool, not into each repo's copy. When the org's branch-protection model evolves (CODEOWNERS, signed commits, ruleset migration), one update propagates instead of N.
Pushed-back from prior rounds (resolved)
- Dry-run never exercises OIDC publish path (round-1 Important 3): pushback held in round 2, audit-trail substitutes; not re-raising.
Assessment
Ready to merge? No, with fixes.
Reasoning: This is a clear net-positive supply-chain change. It eliminates the long-lived NPM_TOKEN compromise vector, narrows the publish path to a server-enforced main-only environment guarded by branch protection, adds sigstore provenance to every release, and makes the publish path observable. Round 3 cleanly addresses 11 of 12 round-2 findings, with the new auto-detection tripwire being a real improvement over the marker-file approach it replaced.
The three Important items above (protection-check additions, in-repo runbook, extraction commitment) are all small, mechanical, and worth landing in this PR or in fast-follows so the as-built configuration and the as-tested configuration stay coupled and so the migration to 10 sibling repos starts from a clean baseline.
Next Steps
- Add
dismiss_stale_reviewsandbypass_pull_request_allowanceschecks to the protection-state jq chain. - Lift the rollback procedures into
docs/RELEASE-RUNBOOK.md(include sigstore-timing reality and partial-publish recovery). - File a tracking issue for setup-script + workflow-guard extraction, link from this PR.
- Minor items are optional; the Phase-1 grep silent-pass and audit-log truncated-write cases are the two most worth picking up if you are already touching the file.
| else "OK" end | ||
| ' 2>/dev/null || echo "missing") | ||
|
|
||
| if [ "$PROTECTION_STATE" != "OK" ]; then |
There was a problem hiding this comment.
Important: Protection-state check misses dismiss_stale_reviews and bypass_pull_request_allowances.
Verified attack chain without dismiss_stale_reviews: true:
- Author opens PR, gets approval.
- Author force-pushes the PR branch (allowed by default for PR branches).
- Stale approval is retained, satisfying
required_approving_review_count. - PR merges. Release job publishes attacker code via OIDC with valid sigstore attestation.
bypass_pull_request_allowances is not checked at all. adobe-bot is currently the only entry (required for semantic-release); silent additions to this list would degrade the OIDC trust boundary.
Suggested additions to the jq chain:
elif (.required_pull_request_reviews.dismiss_stale_reviews // false) != true then "weak: dismiss_stale_reviews disabled"
elif ((.required_pull_request_reviews.bypass_pull_request_allowances.users // []) +
(.required_pull_request_reviews.bypass_pull_request_allowances.teams // []) +
(.required_pull_request_reviews.bypass_pull_request_allowances.apps // []) | length) > 1 then "weak: unexpected bypass actors"
| # Renaming the 'npm-publish' GitHub Environment referenced below has the same silent-break | ||
| # failure mode — re-run the setup script after. | ||
| # | ||
| # Trust binding security relies on the 'npm-publish' GitHub Environment, which must be |
There was a problem hiding this comment.
Important: Rollback procedures should live in the repo, not the PR description.
The PR description currently holds the only documented procedures for first-release OIDC failure, dual-write rotation window, trust-binding break-glass (file/env rename), and sigstore break-glass (flip NPM_CONFIG_PROVENANCE - realistic timing is 30-60 minutes given enforce_admins.enabled = true).
For a 3am page, an in-repo docs/RELEASE-RUNBOOK.md is greppable from the checkout the on-call already has open. PR descriptions get lost in archive search and require GitHub auth.
Work is small (lift from the PR description, add the sigstore-timing reality, add a partial-publish recovery note for the case where the 15-minute timeout kills a release mid-way through 22 packages).
Summary
Migrates npm publishing for
adobe/spacecat-sharedfrom a long-livedADOBE_BOT_NPM_TOKENsecret to npm OIDC Trusted Publishers.10.9.4→11.13.0(OIDC requires >= 11.5.1) in bothmain.yamland thesetup-node-npmcomposite actionNPM_TOKEN: ${{ secrets.ADOBE_BOT_NPM_TOKEN }}from the Semantic Release and dry-run steps — publishing now uses OIDC token exchangeid-token: writeto per-job permissions (test job needs it for AWS ECR OIDC + npm OIDC dry-run; release job for publishing) — workflow-level permissions are now{}NPM_CONFIG_PROVENANCE: 'true'to the release job — sigstore provenance attestations are emitted with every publishscripts/setup-npm-trusted-publishers.sh— one-time script to configure 22 published packages on npmjs.com (spacecat-shared-exampletemplate excluded)main.yamlResolves SITES-42702
Prerequisites before merging
The
scripts/setup-npm-trusted-publishers.shscript must be run before this PR is merged by someone authenticated to npm as theadobe-botaccount:npm install -g npm@11.13.0 npm login # authenticate as adobe-bot ./scripts/setup-npm-trusted-publishers.shThe script enforces preflight checks (npm version,
whoami == adobe-bot, registry, workflow file existence) and tracks per-package failures so partial registrations don't silently abort.Rollback plan
ADOBE_BOT_NPM_TOKENshould remain in GitHub repo secrets for at least 2 successful release cycles after merge. If OIDC publishing fails on the first release:git revert <merge-sha>) — re-addsNPM_TOKENto the workflow envnpm trustbindings on npmjs.com are non-blocking for token-based publish, so no npm-side cleanup neededAfter 2 successful releases, rotate the npm token (revoke its publish scope) and remove the GitHub secret.
Review process
This PR was reviewed by 4 specialized personas (SRE, Security, Senior Engineer, QA). All round-1 and round-2 MUST/SHOULD findings within scope have been implemented:
id-token: writescopingspacecat-shared-examplefrom trust listDeferred to follow-up PRs (acknowledged out of scope):
actions/checkout@v6etc.)ADOBE_BOT_GITHUB_TOKENPAT with GitHub App installation tokenon: [push]trigger to specific branchesnpm publish --dry-run --provenancesmoke-test step on PRsTest plan
scripts/setup-npm-trusted-publishers.shlocally to configure all 22 packages as Trusted PublishersNPM_TOKEN)npm view @adobe/<pkg> --json | jq '.dist.attestations')ADOBE_BOT_NPM_TOKENfrom GitHub secrets🤖 Generated with Claude Code