Build llama.cpp Binaries Workflow - Usage Guide

Overview

The build-llamacpp-binaries.yml workflow automates the building of llama.cpp binaries for use in GitHub Actions workflows. It builds statically-linked binaries optimized for GitHub Actions runners on both macOS and Ubuntu platforms.

Built Binaries

The workflow builds the following 5 binaries:

llama-cli - Command-line interface for inference
llama-quantize - Model quantization tool
llama-server - HTTP server for model serving
llama-mtmd-cli - Multimodal (MTMD) interface
llama-gguf - GGUF file inspection and manipulation tool (useful for debugging)

Note: llama-run was deprecated after llama.cpp release b6808 and is only built for versions b6808 and older.

Trigger Methods

1. Manual Dispatch (Recommended)

Navigate to the Actions tab and manually trigger the workflow:

Go to Actions → Build llama.cpp Binaries
Click Run workflow
Configure inputs:
- llama_cpp_version: Git commit hash or tag (e.g., b8100, b6808, master)
- release_tag: (Optional) Release tag to upload binaries as release assets
- debug: Enable verbose debug output
- minimize_acceleration: Disable all hardware acceleration for maximum compatibility (default: false)
- openssl_static: Force static linking of OpenSSL to avoid runtime dependencies on libssl/libcrypto (default: false)

2. Automatic on Release

The workflow automatically runs when a new GitHub release is created or published. Binaries are automatically attached to the release as assets.

3. Scheduled Builds

The workflow runs automatically every Sunday at 2 AM UTC to keep binaries up-to-date with the latest llama.cpp version.

Downloading Built Binaries

From Workflow Artifacts

Navigate to the workflow run in the Actions tab
Scroll to the Artifacts section at the bottom
Download the zip file:
- llama-cpp-binaries-{version}.zip
Artifacts are retained for 90 days

From Release Assets

If the workflow was triggered with a release tag or on a release event:

Go to the Releases page
Find the appropriate release
Download the zip files from the Assets section
Release assets are permanent

Extracting and Using Binaries

Extract to bin/ Directory

# Download and extract the zip file
unzip llama-cpp-b6808.zip -d bin/

# Make binaries executable
chmod +x bin/llama-*

Archive Old Binaries (Optional)

# Create archive directory with date
mkdir -p bin/archive/$(date +%Y-%m-%d)

# Move old binaries to archive
mv bin/llama-* bin/archive/$(date +%Y-%m-%d)/

# Extract new binaries
unzip llama-cpp-macOS-b6808.zip -d bin/

Build Configuration

CMake Configuration Flags

The workflow uses the following CMake flags for maximum compatibility across GitHub Actions runners:

cmake -B build \
  -DBUILD_SHARED_LIBS=OFF \          # Static linking for portability
  -DOPENSSL_USE_STATIC_LIBS=ON \     # Force static OpenSSL (optional, controlled by openssl_static input)
  -DGGML_METAL=OFF \                 # Disable Metal GPU acceleration
  -DGGML_NATIVE_DEFAULT=OFF \        # Disable native CPU optimizations
  -DCMAKE_CROSSCOMPILING=TRUE \      # Cross-compilation compatibility mode
  -DGGML_NO_ACCELERATE=ON \          # Disable Apple Accelerate framework
  -DGGML_SVE=OFF \                   # Disable ARM SVE instructions
  -DCMAKE_BUILD_TYPE=Release \       # Optimized release build
  -DCMAKE_OSX_ARCHITECTURES=arm64 \  # Target ARM64 architecture
  -DCMAKE_OSX_DEPLOYMENT_TARGET=15.0 # Minimum macOS version

Complete CMake Configuration Flags Reference

This section documents all CMake configuration flags used in the workflow, organized by category.

Core Build Configuration

Flag	Default	Description	When to Use
`-DCMAKE_BUILD_TYPE=Release`	Release	Build type (Release/Debug/RelWithDebInfo)	Always use Release for production binaries
`-DBUILD_SHARED_LIBS=OFF`	OFF	Static vs dynamic linking	Keep OFF for portable binaries
`-DCMAKE_CROSSCOMPILING=TRUE`	FALSE	Enable cross-compilation mode	Required when building for different architecture
`-DLLAMA_BUILD_EXAMPLES=ON`	ON	Build example binaries (llama-cli, etc.)	Keep ON to build the binaries we need

Platform-Specific Configuration (macOS)

Flag	Value	Description	When to Use
`-DCMAKE_OSX_ARCHITECTURES=arm64`	arm64	Target architecture	Building for Apple Silicon (M1/M2/M3/M4)
`-DCMAKE_OSX_DEPLOYMENT_TARGET=15.0`	15.0	Minimum macOS version	Set to lowest macOS version you need to support
`-DCMAKE_SYSTEM_NAME=Darwin`	Darwin	Target operating system	Required for macOS cross-compilation
`-DCMAKE_SYSTEM_PROCESSOR=arm64`	arm64	Target processor architecture	Must match CMAKE_OSX_ARCHITECTURES

Compiler Flags

Flag	Value	Description	When to Use
`-DCMAKE_C_FLAGS="-march=armv8-a -mtune=generic"`	Custom	C compiler flags	For maximum ARM64 compatibility
`-DCMAKE_CXX_FLAGS="-march=armv8-a -mtune=generic"`	Custom	C++ compiler flags	Applied when minimize_acceleration is enabled

Compiler Flag Explanation:

-march=armv8-a: Target baseline ARMv8-A instruction set (compatible with ALL ARM64 CPUs)
-mtune=generic: Optimize for generic ARM processors (not chip-specific like M1/M2/M3/M4)
Result: Binaries work on any ARMv8-A system (Apple Silicon, AWS Graviton, etc.) with reasonable performance

Hardware Acceleration Flags

Flag	Default	Description	When to Disable
`-DGGML_METAL=OFF`	ON	Apple Metal GPU acceleration	Always OFF for CI/portability
`-DGGML_METAL_EMBED_LIBRARY=OFF`	ON	Embed Metal library in binary	When minimize_acceleration is enabled
`-DGGML_ACCELERATE=OFF`	ON	Apple Accelerate framework (BLAS)	When minimize_acceleration is enabled
`-DGGML_BLAS=OFF`	OFF	Generic BLAS support	When minimize_acceleration is enabled
`-DGGML_NATIVE_DEFAULT=OFF`	ON	Native CPU optimizations	Always OFF for portability
`-DGGML_NO_ACCELERATE=ON`	OFF	Explicitly disable Accelerate	Always ON for CI builds
`-DGGML_SVE=OFF`	OFF	ARM SVE (Scalable Vector Extension)	Always OFF for baseline ARM64

GPU/Accelerator Backend Flags

These flags control various GPU and accelerator backends. All are disabled in the workflow for maximum compatibility:

Flag	Default	Description	Workflow Setting
`-DGGML_CUDA=OFF`	OFF	NVIDIA CUDA support	Always OFF (no CUDA on macOS)
`-DGGML_VULKAN=OFF`	OFF	Vulkan GPU support	Always OFF for portability
`-DGGML_KOMPUTE=OFF`	OFF	Kompute GPU compute	Always OFF for portability
`-DGGML_SYCL=OFF`	OFF	Intel SYCL support	Always OFF for portability

OpenSSL Configuration

Flag	Default	Description	When to Enable
`-DOPENSSL_USE_STATIC_LIBS=ON`	OFF	Force static OpenSSL linking	Enable via `openssl_static` input when you get OpenSSL dynamic library errors

Important: This flag is controlled by the openssl_static workflow input parameter. Enable it if you encounter errors like:

dyld: Library not loaded: /opt/homebrew/opt/openssl@3/lib/libssl.3.dylib
dyld: Library not loaded: /opt/homebrew/opt/openssl@3/lib/libcrypto.3.dylib

Workflow Input Parameters

The workflow provides several input parameters that control which CMake flags are applied:

Input Parameter	Type	Default	Effect
`llama_cpp_version`	string	b8100	llama.cpp version to build (commit hash or tag)
`release_tag`	string	(empty)	Optional release tag for uploading binaries
`debug`	boolean	false	Enable verbose debug output in workflow
`minimize_acceleration`	boolean	false	Disable ALL hardware acceleration (adds GGML_METAL_EMBED_LIBRARY, GGML_ACCELERATE, GGML_BLAS, GGML_CUDA, GGML_VULKAN, GGML_KOMPUTE, GGML_SYCL flags)
`openssl_static`	boolean	false	Force static linking of OpenSSL (adds OPENSSL_USE_STATIC_LIBS flag)

Flag Combinations and Use Cases

Standard Build (Default)

# Input parameters: (all defaults)
# Result: Portable binaries with basic optimizations
cmake -B build \
  -DCMAKE_BUILD_TYPE=Release \
  -DBUILD_SHARED_LIBS=OFF \
  -DGGML_METAL=OFF \
  -DGGML_NATIVE_DEFAULT=OFF \
  -DGGML_NO_ACCELERATE=ON \
  -DGGML_SVE=OFF

Maximum Compatibility Build

# Input parameters: minimize_acceleration=true
# Result: Works on any ARM64 system, no hardware dependencies
cmake -B build \
  [standard flags...] \
  -DGGML_METAL_EMBED_LIBRARY=OFF \
  -DGGML_ACCELERATE=OFF \
  -DGGML_BLAS=OFF \
  -DGGML_CUDA=OFF \
  -DGGML_VULKAN=OFF \
  -DGGML_KOMPUTE=OFF \
  -DGGML_SYCL=OFF

Fully Portable Build (No External Dependencies)

# Input parameters: openssl_static=true
# Result: No runtime dependencies on system libraries
cmake -B build \
  [standard flags...] \
  -DOPENSSL_USE_STATIC_LIBS=ON

Why These Flags Matter

These flags are specifically chosen to prevent common issues when running binaries across different systems:

Prevents "Illegal instruction" errors: Older systems may not support advanced SIMD instructions (AVX512, SVE, etc.)
Ensures cross-platform compatibility: Binaries work on both local development machines and CI environments
Avoids GPU-specific code: GitHub runners typically don't have GPU access or Metal support
Eliminates runtime dependencies: Static linking ensures binaries work without requiring specific libraries
Maximizes portability: Binaries can run on a wide range of ARM64 systems without modification

Understanding Static vs Dynamic Linking

Dynamic Linking (Default for OpenSSL):

Binary depends on system-installed libraries (e.g., libssl.dylib, libcrypto.dylib)
Smaller binary size
Requires matching library versions on target system
Can cause "Library not loaded" errors if libraries are missing or incompatible

Static Linking (With -DOPENSSL_USE_STATIC_LIBS=ON):

All library code embedded in binary
Larger binary size
No external dependencies
Works on any system regardless of installed libraries
Recommended for maximum portability

Note: -DBUILD_SHARED_LIBS=OFF only controls your project's libraries, not external dependencies like OpenSSL. That's why -DOPENSSL_USE_STATIC_LIBS=ON is a separate flag.

Integration with Existing Workflows

Existing workflows in this repository use binaries from the bin/ directory:

- name: quantize-gguf
  run: |
    sudo ./bin/llama-quantize ${{ env.LOCAL_FNAME_CONVERTED_GGUF }} ...

After extracting new binaries to bin/, existing workflows will automatically use them without modification.

Troubleshooting

Binary Not Found Error

If you get "binary not found" errors:

Verify the binary was extracted to the correct location
Check file permissions: ls -l bin/llama-*
Make executable if needed: chmod +x bin/llama-*

Build Failures

If the workflow fails:

Check the workflow logs for specific error messages
Verify the llama.cpp version exists (commit hash or tag)
Enable debug mode for more verbose output
Check if llama.cpp has breaking changes in the specified version

Smoke Test Failures

If smoke tests fail but binaries built successfully:

The binary may not support --version flag
Check if the binary requires specific arguments
Review the smoke test output in workflow logs

Version Management

Recommended Versions

Production: Use stable release tags (e.g., b8100, b6808)
Testing: Use specific commit hashes for reproducibility
Latest: Use master branch (not recommended for production)

Version-Specific Notes

b6808 and older: Includes llama-run binary (deprecated in later versions)
b6809 and newer: llama-run is no longer built; use llama-cli instead
All versions: Include llama-gguf tool for GGUF file inspection and debugging

Updating to New Versions

Run the workflow with the new version
Download and test the binaries locally
Extract to bin/ directory
Run existing workflows to verify compatibility
Archive old binaries for rollback if needed

Best Practices

Test before deploying: Always test new binaries with your workflows before committing to bin/
Archive old versions: Keep previous binaries in bin/archive/ for quick rollback
Document versions: Note which llama.cpp version is in use in commit messages
Regular updates: Run scheduled builds to stay current with llama.cpp improvements
Release tagging: Use release tags for important binary updates

Example: Complete Update Process

# 1. Trigger workflow manually with version b8100
# (Use GitHub Actions UI)
# Optional: Enable openssl_static if you need fully portable binaries

# 2. Download artifact after workflow completes
cd ~/Downloads
unzip llama-cpp-b8100-darwin-arm64.zip

# 3. Navigate to repository
cd /path/to/gguf

# 4. Archive old binaries
mkdir -p bin/archive/$(date +%Y-%m-%d)
mv bin/llama-* bin/archive/$(date +%Y-%m-%d)/

# 5. Copy new binaries
cp ~/Downloads/llama-* bin/

# 6. Verify binaries
ls -lh bin/llama-*
./bin/llama-cli --version

# 7. Test with existing workflow
# (Run a quantization workflow to verify)

# 8. Test llama-gguf for debugging (optional)
./bin/llama-gguf --help

# 9. Commit changes
git add bin/
git commit -m "Update llama.cpp binaries to version b8100"
git push

Support

For issues or questions:

Check workflow logs in the Actions tab
Review this documentation
Consult the llama.cpp repository
Open an issue in this repository

FilesExpand file tree

build-llamacpp-workflow-usage.md

Latest commit

History