Guidance for coding agents working in `llama-web-bridge`.
- This repository owns the WebGPU bridge source and runtime build for llama.cpp on the web.
- It publishes versioned assets to `llama-web-bridge-assets` via a workflow; `llamadart` consumes those published assets.
Common maintainer sibling layout:

```text
../llamadart
../llamadart-native
../llama-web-bridge
../llama-web-bridge-assets
```
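Before running cross-repo steps, a small shell check can confirm that layout. This is a hypothetical helper, not a script shipped in this repository; it assumes it is given the parent directory that holds the sibling checkouts (`..` when run from the `llama-web-bridge` root).

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this repo): report which expected sibling
# checkouts are missing under a parent directory. Returns 1 if any are missing.
check_sibling_layout() {
  local parent="${1:-..}" missing=0 repo
  for repo in llamadart llamadart-native llama-web-bridge llama-web-bridge-assets; do
    if [ ! -d "$parent/$repo" ]; then
      echo "missing sibling checkout: $parent/$repo" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

From the `llama-web-bridge` root, `check_sibling_layout ..` prints nothing and succeeds when every sibling is present.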
Build:

```bash
./scripts/build_bridge.sh
```

Useful environment overrides:

- `LLAMA_CPP_DIR`
- `BUILD_DIR`
- `OUT_DIR`
- `CMAKE_BUILD_TYPE`
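When overriding `BUILD_DIR`, `OUT_DIR`, `MEM64_BUILD_DIR`, or the toolchain caches (`CCACHE_DIR`, `EM_CACHE`), a guard like the sketch below can catch a directory that would land inside the checkout before a build starts. The function names are hypothetical, and the check is purely lexical: relative paths are made absolute against `$PWD`, but symlinks and `..` segments are not resolved. Empty values are skipped because this sketch does not assume what defaults `build_bridge.sh` uses when a variable is unset.

```bash
#!/usr/bin/env bash
# Hypothetical guard (not part of scripts/build_bridge.sh).

# Make a path absolute without requiring it to exist (no symlink/.. resolution).
abs_path() {
  case "$1" in
    /*) printf '%s\n' "${1%/}" ;;
    *)  printf '%s\n' "$PWD/${1%/}" ;;
  esac
}

# Succeed when $1 is the directory $2 itself or lies beneath it.
path_is_inside() {
  local path root
  path="$(abs_path "$1")"
  root="$(abs_path "$2")"
  [ "$path" = "$root" ] || [ "${path#"$root"/}" != "$path" ]
}

# check_out_of_tree REPO_ROOT DIR...
# Fails if any non-empty DIR would be created inside REPO_ROOT.
check_out_of_tree() {
  local root="$1" dir status=0
  shift
  for dir in "$@"; do
    # Empty means "not overridden"; the script's default is not checked here.
    [ -n "$dir" ] || continue
    if path_is_inside "$dir" "$root"; then
      echo "refusing in-repo build/cache dir: $dir" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example, from the repo root: `check_out_of_tree "$PWD" "${BUILD_DIR:-}" "${MEM64_BUILD_DIR:-}" "${OUT_DIR:-}" "${CCACHE_DIR:-}" "${EM_CACHE:-}" && ./scripts/build_bridge.sh`.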
When validating bridge runtime changes locally, keep build and cache output outside the repo so generated wasm artifacts and toolchain caches neither dirty the checkout nor write into sandboxed Homebrew/cache paths:

```bash
export CCACHE_DIR=/private/tmp/llama_web_bridge_ccache
export EM_CACHE=/private/tmp/llama_web_bridge_emcache
BUILD_DIR=/private/tmp/llama_web_bridge_build \
MEM64_BUILD_DIR=/private/tmp/llama_web_bridge_build_mem64 \
OUT_DIR=/private/tmp/llama_web_bridge_dist \
WEBGPU_BRIDGE_BUILD_MEM64=1 \
./scripts/build_bridge.sh
```

- CI build gate: `.github/workflows/ci.yml`
- Publish workflow: `.github/workflows/publish_assets.yml`
  - Requires `WEBGPU_BRIDGE_ASSETS_PAT`.
  - Pushes assets + tag to `llama-web-bridge-assets`.
- Keep runtime bridge code in `js/` and `src/`.
- Keep publishing logic in the workflow only.
- Do not edit assets repository files from here outside the publish flow.
After publishing an assets tag:

- Update/fetch the pinned bridge assets in `llamadart`: `WEBGPU_BRIDGE_ASSETS_TAG=<tag> ./scripts/fetch_webgpu_bridge_assets.sh`
- Update docs/changelog in `llamadart` if behavior changed.
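With the sibling layout in place, the `llamadart` fetch step can be wrapped so it runs in the right checkout without changing the caller's working directory. This is a hypothetical convenience function; it relies only on the `scripts/fetch_webgpu_bridge_assets.sh` path and the `WEBGPU_BRIDGE_ASSETS_TAG` variable named above, and assumes `llamadart` lives at `../llamadart` unless told otherwise.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: fetch pinned bridge assets in a sibling llamadart checkout.
# Usage: sync_llamadart_assets <tag> [llamadart-dir]
sync_llamadart_assets() {
  local tag="${1:-}" llamadart_dir="${2:-../llamadart}"
  if [ -z "$tag" ]; then
    echo "usage: sync_llamadart_assets <tag> [llamadart-dir]" >&2
    return 2
  fi
  # Subshell, so the cd does not leak into the caller's shell.
  (
    cd "$llamadart_dir" &&
      WEBGPU_BRIDGE_ASSETS_TAG="$tag" ./scripts/fetch_webgpu_bridge_assets.sh
  )
}
```

Running it is still only the first bullet; the docs/changelog update in `llamadart` remains a manual step.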
- For pthread/runtime changes, test a BERT-class embedding model in Chromium with cross-origin isolation enabled. The regression shape is `loadModelFromUrl`, `tokenize`, `embed`, and `embedBatch` on a host where `navigator.hardwareConcurrency` is greater than the bridge pthread pool size.
- Run the smoke test through both the direct runtime (`disableWorker: true`) and the bridge worker path; both should report `n_threads` capped to the pool size.
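Two parts of that smoke test can be checked mechanically from a shell. Chromium only enables `SharedArrayBuffer` (and therefore wasm pthreads) on cross-origin isolated pages, which requires `Cross-Origin-Opener-Policy: same-origin` plus a `Cross-Origin-Embedder-Policy` of `require-corp` or `credentialless`. The reported `n_threads` can then be compared with the pool size. Both functions are hypothetical helpers: the pool size, host concurrency, and reported `n_threads` are passed in by hand because this sketch does not assume how the bridge exposes them, and `self.crossOriginIsolated` in the page remains the authoritative isolation check.

```bash
#!/usr/bin/env bash
# Hypothetical preflight helpers for the pthread smoke test.

# Reads HTTP response headers on stdin (e.g. from `curl -sI URL`) and succeeds
# only if both headers needed for cross-origin isolation are present.
has_isolation_headers() {
  local headers
  # Drop CRs from HTTP line endings and lowercase everything for matching.
  headers="$(tr -d '\r' | tr '[:upper:]' '[:lower:]')"
  printf '%s\n' "$headers" |
    grep -Eq '^cross-origin-opener-policy: *same-origin *(;.*)?$' &&
    printf '%s\n' "$headers" |
      grep -Eq '^cross-origin-embedder-policy: *(require-corp|credentialless) *(;.*)?$'
}

# check_n_threads REPORTED HARDWARE_CONCURRENCY POOL_SIZE
# Fails if n_threads exceeds the pool, or if a host with more hardware threads
# than the pool did not report exactly the pool size.
check_n_threads() {
  local reported="$1" hw="$2" pool="$3"
  if [ "$reported" -gt "$pool" ]; then
    echo "n_threads=$reported exceeds pool size $pool" >&2
    return 1
  fi
  if [ "$hw" -gt "$pool" ] && [ "$reported" -ne "$pool" ]; then
    echo "expected n_threads=$pool on a host with $hw threads, got $reported" >&2
    return 1
  fi
  return 0
}
```

Run `check_n_threads` once for the direct runtime and once for the worker path, using the `n_threads` each one reports.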