ARIA Protocol


Autonomous Responsible Intelligence Architecture

One protocol. Every CPU. Every NPU. Every model.

ARIA is a universal distributed inference protocol. A single peer-to-peer network routes queries to the right model on the right hardware, whether that's a 1.58-bit ternary model running on a low-power laptop, a standard 4B GGUF model on a desktop, or a code/reasoning/vision specialist on a machine with a real NPU. Nodes advertise the tiers they serve in a v2 handshake and the smart router matches every query to a peer that can answer it.
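On the wire, a node's tier capabilities travel in the v2 HELLO. The sketch below is illustrative only; the field names are hypothetical and the actual wire format is defined in docs/protocol-spec.md:

import json

# Hypothetical v2 HELLO payload: a node advertises which tiers it serves and a
# hardware snapshot, so the smart router can prefer peers that fit the query.
hello = {
    "type": "HELLO",
    "protocol_version": 2,
    "node_id": "ed25519:3f9a...",               # public-key identity (Ed25519 auth)
    "tiers": ["efficiency", "quality"],          # tiers this node serves
    "hardware": {"cpu": "zen4", "avx512": True, "npu": None},
}
print(json.dumps(hello, indent=2))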


Three tiers, one protocol

| Tier | Models | Backend | Memory floor | Use case |
|---|---|---|---|---|
| 🌱 Efficiency | BitNet b1.58, Falcon-E, Falcon3 1.58-bit (8 models) | bitnet.cpp | 0.4 GB | Always-on chat on any CPU; low-power laptops; background nodes |
| Quality | Gemma 4, Qwen 3.5, SmolLM3, Phi-4 mini (5 models) | mainline llama.cpp | 1.1 GB | Multilingual chat, longer context, multimodal (vision+audio) |
| 🛠️ Specialist | Qwen2.5-Coder, DeepSeek-R1-Distill, MiniCPM-V (3 models) | mainline llama.cpp | 1.9 GB | Code generation, chain-of-thought reasoning, vision |

A node operator picks one of five profile presets: minimal, efficient (default), balanced, full, or specialist_only. The profile decides which tiers light up: the default keeps nodes lean (efficiency only); balanced adds Quality on machines with 8 GB+ RAM; full enables all three tiers on 16 GB+ workstations.
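As a rough mental model only (the authoritative mapping lives in the node configuration; which tiers minimal and specialist_only enable is inferred here from their names, not from the code):

# Illustrative profile-to-tier mapping; RAM hints mirror the prose above.
PROFILES = {
    "minimal":         ["efficiency"],                            # leanest preset
    "efficient":       ["efficiency"],                            # default
    "balanced":        ["efficiency", "quality"],                 # 8 GB+ RAM
    "full":            ["efficiency", "quality", "specialist"],   # 16 GB+ RAM
    "specialist_only": ["specialist"],
}

print(PROFILES["balanced"])  # ['efficiency', 'quality']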

See docs/MODELS.md for the full catalog, license matrix, and HuggingFace URLs.


Quick start

Install

pip install aria-protocol

Start a node

# Default profile is "efficient" — 1.58-bit only, ~4 GB RAM, any CPU
aria node start --port 8765

# OpenAI-compatible API
aria api start --port 3000

Load a model

# Tier Efficiency (default)
aria model download BitNet-b1.58-2B-4T

# Tier Quality (after switching profile)
aria node profile set balanced
aria model download Gemma-4-E2B

# Tier Specialist (after switching profile)
aria node profile set full
aria model download Qwen2.5-Coder-7B-Instruct

Use with the OpenAI client

from openai import OpenAI

client = OpenAI(base_url="http://localhost:3000/v1", api_key="aria")

response = client.chat.completions.create(
    model="BitNet-b1.58-2B-4T",
    messages=[{"role": "user", "content": "What is quantum computing?"}],
)
print(response.choices[0].message.content)

The smart router picks a model automatically when none is specified — pass the catalog ID (BitNet-b1.58-2B-4T, Gemma-4-E2B, Qwen2.5-Coder-7B-Instruct, …) when you want a specific one.
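For example, to send a coding query straight to the specialist tier (assuming the full profile is active and the model has been downloaded as shown above):

from openai import OpenAI

client = OpenAI(base_url="http://localhost:3000/v1", api_key="aria")

# Explicit catalog ID: bypasses auto-routing and targets the code specialist.
response = client.chat.completions.create(
    model="Qwen2.5-Coder-7B-Instruct",
    messages=[{"role": "user", "content": "Write a Python function that merges two sorted lists."}],
)
print(response.choices[0].message.content)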

For the full walkthrough see docs/getting-started.md.


Supported models

ARIA v0.9.0 ships with 16 models across three tiers. Every entry passes a strict license gate at import — the catalog only contains models under MIT, Apache 2.0, or TII Falcon 2.0 licenses to keep P2P redistribution friction-free. Models considered and rejected on licensing grounds (Llama 3.x, Gemma 3, Mistral research, Yi, Command-R) are listed in docs/MODELS.md with the rejection reasoning.

| Tier | # models | License surface |
|---|---|---|
| Efficiency | 8 | MIT, TII Falcon 2.0 |
| Quality | 5 | Apache 2.0, MIT |
| Specialist | 3 | Apache 2.0, MIT |

Adding a model is a pull request against aria/model_catalog.py. The gate refuses non-permissive licenses at import time so the roster cannot drift.
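A minimal sketch of how such an import-time gate can work; the names below are illustrative, not the actual contents of aria/model_catalog.py:

from dataclasses import dataclass

ALLOWED_LICENSES = {"MIT", "Apache-2.0", "Falcon-LLM-2.0"}  # P2P-redistributable only

@dataclass(frozen=True)
class ModelEntry:
    model_id: str
    tier: str      # "efficiency" | "quality" | "specialist"
    license: str

    def __post_init__(self):
        # Evaluated when the catalog module is imported, so a non-permissive
        # entry fails immediately instead of drifting into the roster.
        if self.license not in ALLOWED_LICENSES:
            raise ValueError(f"{self.model_id}: license {self.license!r} is not P2P-redistributable")

CATALOG = [
    ModelEntry("BitNet-b1.58-2B-4T", "efficiency", "MIT"),
    ModelEntry("Qwen2.5-Coder-7B-Instruct", "specialist", "Apache-2.0"),
]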

Full table: docs/MODELS.md.


Hardware

ARIA detects the local CPU/NPU at startup and ships the snapshot in the v2 peer hello so remote routers can prefer hardware-friendly peers:

aria hardware info

v0.9.0 ships NPU detection only — the protocol learns about AMD XDNA/XDNA2, Intel NPU, Qualcomm Hexagon, and Apple ANE devices, but inference itself still runs on the CPU. Real NPU acceleration is planned for v1.0, when the vendor-specific backend stubs (OpenVINO for Intel, QNN for Qualcomm, Core ML for Apple) get wired to real inference.

CPU detection covers Intel/AMD/Apple Silicon/Qualcomm Snapdragon, including AVX-512 capability used by bitnet.cpp for native 512-bit ternary kernels.
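For a quick manual sanity check of those flags on Linux (this is not ARIA's detection code, just a rough cross-check against what aria hardware info reports):

import platform

# Look for the AVX-512 flags that bitnet.cpp's 512-bit ternary kernels can use.
if platform.system() == "Linux":
    flags = open("/proc/cpuinfo").read()
    for flag in ("avx512f", "avx512_vnni", "avx512_vbmi"):
        print(f"{flag}: {'yes' if flag in flags else 'no'}")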

See docs/NPU_SUPPORT.md for the per-vendor roadmap and how to verify detection on your machine.


Benchmarks

Real numbers, reproducible from the repo. Each benchmark set is measured on a single host, so the numbers within a set are directly comparable; cross-host comparisons should treat absolute throughput as indicative.

v0.5.5 — Ecosystem benchmark (Zen 4)

Hardware: AMD Ryzen 9 7845HX (12C/24T, Zen 4, 64 GB DDR5)
Build: bitnet.cpp + Clang 20.1.8, AVX-512 VNNI+VBMI enabled
Protocol: 8 threads, 256 tokens, 5 runs per model, median selected

| Model | Params | Type | tok/s |
|---|---|---|---|
| BitNet-b1.58-large | 0.7B | post-quantized | 118.25 |
| Falcon-E-1B-Instruct | 1.0B | native 1-bit | 80.19 |
| Falcon3-1B-Instruct | 1.0B | post-quantized | 56.31 |
| Falcon-E-3B-Instruct | 3.0B | native 1-bit | 49.80 |
| BitNet-b1.58-2B-4T | 2.4B | native 1-bit | 37.76 |
| Falcon3-3B-Instruct | 3.0B | post-quantized | 33.21 |
| Falcon3-7B-Instruct | 7.0B | post-quantized | 19.89 |
| Falcon3-10B-Instruct | 10.0B | post-quantized | 15.12 |

Key finding: Models natively trained in 1-bit (Falcon-E) outperform post-training quantized models by +42% at 1B and +50% at 3B on identical hardware. Native ternary training matters more than absolute parameter count below 7B.
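The percentages follow directly from the table:

# Native-1-bit vs post-quantized throughput at matched parameter counts (tok/s from the table above).
pairs = {
    "1B": ("Falcon-E-1B-Instruct", 80.19, "Falcon3-1B-Instruct", 56.31),
    "3B": ("Falcon-E-3B-Instruct", 49.80, "Falcon3-3B-Instruct", 33.21),
}
for size, (native, n_tps, quant, q_tps) in pairs.items():
    print(f"{size}: {native} is +{(n_tps / q_tps - 1) * 100:.0f}% over {quant}")
# 1B: +42%    3B: +50%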

Zen 5 cross-generation (April 2026)

Benchmarked on AMD Ryzen AI 9 HX 370 (Zen 5, native 512-bit AVX-512). Average improvement: +35% across 7 models.

| Model | Zen 4 (t/s) | Zen 5 (t/s) | Δ |
|---|---|---|---|
| Falcon-E-1B | 80.19 | 103.59 | +29% |
| Falcon3-1B | 56.31 | 78.16 | +39% |
| BitNet-2B-4T | 37.76 | 51.82 | +37% |
| Falcon-E-3B | 49.80 | 65.19 | +31% |
| Falcon3-3B | 33.21 | 46.77 | +41% |
| Falcon3-7B | 19.89 | 28.45 | +43% |
| Falcon3-10B | 15.12 | 19.39 | +28% |

Big.LITTLE CPUs require model-size-aware thread tuning: 1B peaks at 6 threads, 7B peaks at 20 threads.
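A toy heuristic that captures the observation; the two breakpoints are the measured optima above, and everything between them is a guess, not part of the benchmark harness:

def suggested_threads(params_b: float) -> int:
    # Interpolates between the two measured optima (1B -> 6 threads, 7B -> 20 threads)
    # and clamps outside that range. Illustrative only.
    if params_b <= 1.0:
        return 6
    if params_b >= 7.0:
        return 20
    return round(6 + (params_b - 1.0) / 6.0 * 14)

print(suggested_threads(3.0))  # ~11 with this heuristic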

Full results and reproduction harness: benchmarks/.


Architecture

┌──────────────────────────────────────────────────────────────────┐
│                     ARIA PROTOCOL v0.9.0                          │
├──────────────────────────────────────────────────────────────────┤
│  SERVICE     OpenAI-compatible API · Desktop App · CLI · Dashboard│
├──────────────────────────────────────────────────────────────────┤
│  CONSENSUS   Provenance Ledger · Proof of Useful Work · Proof of  │
│              Sobriety · Consent Contracts                         │
├──────────────────────────────────────────────────────────────────┤
│  COMPUTE     SmartRouterV2 → ┬→ BitnetBackend  (port 8081)        │
│                              └→ LlamacppBackend (port 8082)       │
│              P2P Network (WebSocket, Kademlia DHT, NAT traversal, │
│              Ed25519 auth, Protocol v2 with tier capabilities)    │
└──────────────────────────────────────────────────────────────────┘

The router is a pure function: it takes a query, runs it through the classifier, picks (tier, model_id) from a routing table, and returns a RoutingDecision with a fallback chain. The two backends — one wrapping bitnet.cpp's llama-server, one wrapping mainline llama.cpp's llama-server — are independent processes the router dispatches to by model ID.
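In sketch form, with a toy keyword classifier and routing table standing in for the real ones in aria/smart_router.py:

from dataclasses import dataclass, field

@dataclass
class RoutingDecision:
    tier: str
    model_id: str
    fallbacks: list = field(default_factory=list)  # tried in order if the first choice is unavailable

# Toy routing table; the real table and classifier live in aria/smart_router.py.
ROUTING_TABLE = {
    "code":    ("specialist", "Qwen2.5-Coder-7B-Instruct"),
    "general": ("efficiency", "BitNet-b1.58-2B-4T"),
}

def classify(query: str) -> str:
    return "code" if any(k in query.lower() for k in ("def ", "function", "refactor", "bug")) else "general"

def route(query: str) -> RoutingDecision:
    tier, model_id = ROUTING_TABLE[classify(query)]
    return RoutingDecision(tier, model_id, fallbacks=[ROUTING_TABLE["general"]])

print(route("Fix this bug in my parser"))  # RoutingDecision(tier='specialist', ...)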

Detailed view: docs/architecture.md. P2P wire format: docs/protocol-spec.md.


Documentation

| Document | Description |
|---|---|
| Getting Started | Install, first node, per-tier examples |
| Models | Full catalog, license matrix, HuggingFace URLs |
| NPU Support | Per-vendor roadmap (AMD / Intel / Qualcomm / Apple) |
| Architecture | Three-tier compute, dual backend, P2P v2 |
| Protocol Spec | WebSocket protocol v2 with HELLO message |
| Migration v0.8 → v0.9 | Breaking changes and how to upgrade |
| API Reference | OpenAI-compatible HTTP endpoints |
| Threat Model | Security analysis, tier-specific threats |
| Security Architecture | Defense-in-depth model |
| Smart Router | Routing table, classifier, fallback logic |
| Benchmarks | Methodology and full result sets |
| Roadmap | All versions and tasks |

Desktop app

Download latest release — Windows, macOS (Intel + Apple Silicon), Linux.

Built with Electron (primary) and Tauri 2.0 (alternative). It includes a three-tier badge in the header, a profile preset switcher, a hardware panel, a chat-centric interface, and four layout presets. A mode switch separates Chat (AI conversations) from Node (Dashboard, Models, Energy, Network).

See desktop/README.md for build instructions.


Why ARIA?

| Problem | Solution |
|---|---|
| AI requires expensive GPUs | 1.58-bit and 4-bit models run efficiently on any CPU |
| One model can't cover every workload | Three tiers (efficiency / quality / specialist) routed per query |
| Centralized inference burns energy | Distributed across existing consumer devices with sobriety proofs |
| Outputs are untraceable | Every inference recorded on the provenance ledger |
| Models depend on one provider | 8+ organizations contribute to the catalog (Microsoft, TII, Google, Alibaba, DeepSeek, OpenBMB, HuggingFace, Microsoft Research) |
| Licenses leak into redistribution | Hard license gate — only MIT / Apache 2.0 / TII Falcon 2.0 enter the catalog |

Contributing

Pull requests welcome. Areas where help is most useful:

  • NPU stubs — wiring real inference for AMD XDNA, Intel NPU, Qualcomm Hexagon, or Apple ANE under aria/backends/
  • New models — add a ModelEntry to aria/model_catalog.py (the license gate enforces P2P-compatible licenses at import)
  • Routing improvements — better classifiers or routing tables in aria/smart_router.py
  • Mobile — React Native or native iOS/Android app
  • Docs and examples — new examples under examples/ are always welcome

Development setup

git clone https://github.com/spmfrance-cloud/aria-protocol.git
cd aria-protocol
pip install -e ".[dev]"
make test

Guidelines

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/your-change)
  3. Write tests for your changes
  4. Ensure the suite passes (make test)
  5. Open a pull request with a clear description

Code style: PEP 8, type hints on public APIs, focused functions, tests alongside the change.


Running tests

make test               # full suite
make test-verbose       # verbose output
make test-cov           # with coverage report
pytest tests/test_smart_router.py -v

License

MIT. See LICENSE.

Citation

@misc{aria2026,
  author = {Anthony MURGO},
  title  = {ARIA: Autonomous Responsible Intelligence Architecture},
  year   = {2026},
  url    = {https://github.com/spmfrance-cloud/aria-protocol}
}

Acknowledgments


One protocol. Every CPU. Every NPU. Every model.

About

Peer-to-peer distributed AI inference using 1-bit quantized models. CPU-only, 70-82% energy savings, 103+ tokens/sec. Validated on Zen 4 & Zen 5 (+35% cross-gen improvement).
