An Android application that brings local LLM chat to your phone, fully on-device, private, and fast.
It supports ONNX-based Qwen models and LiteRT-based Qwen3 and Gemma 4 models, with streaming responses, persistent local chat history, markdown-rendered replies, on-device model downloads, and in-app model switching.
Unlike older builds, the app now ships as a small base APK. Users choose and download one or more models after installation, then switch between them later from inside the app.
- Switched to a smaller base APK
- Models are no longer bundled inside the APK
- Added first-launch model selection with on-device model downloads
- Added support for downloading multiple models
- Added support for switching between downloaded models inside the app
- Retained thinking text in a collapsible section for supported models
- Added basic UI and usability improvements
Related project: local-document-intelligence, a privacy-first offline document intelligence system with persistent local RAG, hybrid retrieval, and source-grounded answers.
- 📱 Fully on-device LLM inference for private offline use
- 📦 Smaller base APK with on-demand model downloads
- 📥 First-launch model picker after install
- 🔁 Download multiple models and switch between them inside the app
- 🧠 Supports Qwen2.5, Qwen3, Qwen3 LiteRT, Gemma 4 E2B, and Gemma 4 E4B
- ⚡ Supports ONNX and LiteRT backends
- 🚀 Hardware-accelerated LiteRT inference on supported devices
- 🔤 Hugging Face-compatible tokenizer support for ONNX Qwen models
- 💬 Persistent multi-turn chat with local history and Previous Chats
- 📝 Markdown rendering for assistant replies, including table support
- 🤔 Thinking Mode for supported models
- 🗂️ Retained thinking text in a collapsible section
- 🎨 Built-in themes and adjustable chat font size
- 🛑 Stop-generation support
- 🔐 Offline after model download, with no telemetry
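The streaming and stop-generation features above can be sketched as a simple token callback, independent of backend. This is a minimal illustrative JVM sketch, not the app's actual code; all names here are hypothetical.

```kotlin
// Minimal sketch of streaming generation: the backend pushes partial
// text to the UI through a callback, and a volatile stop flag models
// the app's stop-generation button. All names are illustrative.
class StreamingSession(private val tokens: List<String>) {
    @Volatile var stopped = false

    // Emits the growing reply after each token; returns the final text.
    fun generate(onPartial: (String) -> Unit): String {
        val sb = StringBuilder()
        for (t in tokens) {
            if (stopped) break          // stop-generation support
            sb.append(t)
            onPartial(sb.toString())    // stream the partial reply to the UI
        }
        return sb.toString()
    }

    fun stop() { stopped = true }
}
```

In the real app the token source would be the ONNX or LiteRT runtime rather than a fixed list, but the UI contract (partial-text callback plus a stop flag) is the same.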
| Chat inference | Theme switching | Thinking mode |
| --- | --- | --- |
| ![]() | ![]() | ![]() |

Figure: App interface showing local LLM chat and streaming responses on Android.
The app now ships as a single smaller base APK.
➡️ Download APK
Models are no longer bundled inside the APK. After installation, the app prompts the user to select a model and download it directly on device.
Users can download multiple models and switch between them later from inside the app, without reinstalling.
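The on-device download flow described above can be sketched as a streamed copy with progress reporting, so the UI can show a percentage while a model downloads. This is a hedged sketch using plain `java.net`; the actual app may use any HTTP client, and `downloadModel` is a hypothetical name.

```kotlin
import java.io.File
import java.net.URL

// Sketch of the in-app model download: stream the remote file to local
// storage while reporting (bytesCopied, totalBytes) so the UI can show
// progress. totalBytes is -1 when the server omits Content-Length.
fun downloadModel(url: URL, dest: File, onProgress: (Long, Long) -> Unit) {
    val conn = url.openConnection()
    val total = conn.contentLengthLong
    var copied = 0L
    conn.getInputStream().use { input ->
        dest.outputStream().use { output ->
            val buf = ByteArray(64 * 1024)
            while (true) {
                val n = input.read(buf)
                if (n < 0) break
                output.write(buf, 0, n)
                copied += n
                onProgress(copied, total)
            }
        }
    }
}
```

Writing to a temporary file and renaming on completion would make interrupted downloads easier to detect and resume; that detail is omitted here for brevity.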
- Gemma 4 E4B LiteRT - Best for flagship devices
- Gemma 4 E2B LiteRT - Best for mid-range to high-end devices
- Qwen3 0.6B LiteRT - Best for low-end devices
- Qwen3 0.6B Q4F16 ONNX - Good for low- to mid-range devices
- Qwen2.5 0.5B ONNX - Best for mid- to high-end devices, full precision
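The device tiers above suggest a simple in-app model registry. The sketch below shows one way to map device RAM to a recommended model; the identifiers mirror the list above, but the RAM thresholds and all function names are illustrative assumptions, not the app's actual values.

```kotlin
// Hypothetical model registry matching the tiers listed above.
enum class Backend { ONNX, LITERT }

data class ModelInfo(val name: String, val backend: Backend, val minRamGb: Int)

val MODELS = listOf(
    ModelInfo("Gemma 4 E4B LiteRT", Backend.LITERT, minRamGb = 8),
    ModelInfo("Gemma 4 E2B LiteRT", Backend.LITERT, minRamGb = 6),
    ModelInfo("Qwen3 0.6B LiteRT", Backend.LITERT, minRamGb = 3),
    ModelInfo("Qwen3 0.6B Q4F16 ONNX", Backend.ONNX, minRamGb = 4),
    ModelInfo("Qwen2.5 0.5B ONNX", Backend.ONNX, minRamGb = 6),
)

// Suggest the most demanding model that still fits the device's RAM,
// or null if nothing fits.
fun suggestModel(deviceRamGb: Int): ModelInfo? =
    MODELS.filter { it.minRamGb <= deviceRamGb }.maxByOrNull { it.minRamGb }
```

A first-launch picker could use this to preselect a default while still letting the user download any model in the list.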
Note: internet is required only for downloading models. Chat and inference remain fully on-device after the model is installed.
This app supports ONNX-based Qwen models and LiteRT-based Qwen3 and Gemma 4 models.
- ONNX backend: supports Qwen2.5 and Qwen3
- LiteRT backend: supports Qwen3 and Gemma 4
- Qwen3 and Gemma 4 support Thinking Mode
- The toggle is shown only for models that support it
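Separating thinking text into a collapsible section can be sketched as a small parsing step on the raw reply. This assumes the model wraps its reasoning in `<think>...</think>` tags, as Qwen3-style models do; the `Reply` type and function name are hypothetical.

```kotlin
// Sketch of splitting a raw model reply into thinking text (rendered
// in a collapsible section) and the visible answer. Assumes
// <think>...</think> delimiters around the reasoning.
data class Reply(val thinking: String?, val answer: String)

fun splitThinking(raw: String): Reply {
    val m = Regex("(?s)<think>(.*?)</think>").find(raw)
        ?: return Reply(null, raw.trim())          // no thinking section
    val thinking = m.groupValues[1].trim()
    val answer = raw.removeRange(m.range).trim()   // drop the tagged span
    return Reply(thinking.ifEmpty { null }, answer)
}
```

During streaming, the same check can decide whether the collapsible section is shown at all, which matches hiding the toggle for models that never emit thinking tags.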
LiteRT is a strong fit for fast local Android chat because:
- It is designed for high-performance on-device LLM deployment
- It supports hardware acceleration, including GPU and NPU acceleration on supported devices
- It helps reduce startup and generation latency for local chat workloads
- It expands the range of practical Android model builds beyond a single backend path
- It fits well with a privacy-first app design focused on fully offline usage
Note: model capability and performance still depend on the specific model build and the hardware of the target Android device.
- Android Studio
- A physical Android device for deployment and testing
- 4 GB of RAM or more for smaller models
- More RAM is recommended for larger models such as Gemma 4 E2B and Gemma 4 E4B
- A temporary internet connection for downloading models inside the app
- Real hardware is preferred; emulators are mainly useful for UI checks
- Clone this repository.
- Install the latest Android Studio.
- Open the Android project folder in Android Studio: pocket_llm_src/
- Build and install the app on your Android device.
- Launch the app.
- On first launch, choose a model from the built-in model picker.
- Download the selected model directly inside the app.
- Start chatting locally on device.
Gemma 4 is provided by Google under the Apache License 2.0. Google's Gemma documentation also states that Gemma models are provided with open weights and support responsible commercial use.
- Gemma 4 license: https://ai.google.dev/gemma/apache_2
- Gemma 4 overview: https://ai.google.dev/gemma/docs/core
Qwen model files follow the upstream Qwen license terms.
Please review the original model license before redistribution or commercial use.


