
🤖 Pocket LLM for Android (Offline, Private & Fast)

An Android application that brings local LLM chat to your phone, fully on-device, private, and fast.

It supports ONNX-based Qwen models and LiteRT-based Qwen3 and Gemma 4 models, with streaming responses, persistent local chat history, markdown-rendered replies, in-app model downloads, and in-app model switching.

Unlike older builds, the app now ships as a small base APK. Users choose and download one or more models after installation, then switch between them later from inside the app.




🆕 New in v1.4.0

  • Switched to a smaller base APK
  • Models are no longer bundled inside the APK
  • Added first-launch model selection with on-device model downloads
  • Added support for downloading multiple models
  • Added support for switching between downloaded models inside the app
  • Thinking text is now retained in a collapsible section for supported models
  • Added basic UI and usability improvements

🔗 Also Check Out

local-document-intelligence
A privacy-first offline document intelligence system with persistent local RAG, hybrid retrieval, and source-grounded answers.


✨ Features

  • 📱 Fully on-device LLM inference for private offline use
  • 📦 Smaller base APK with on-demand model downloads
  • 📥 First-launch model picker after install
  • 🔁 Download multiple models and switch between them inside the app
  • 🧠 Supports Qwen2.5, Qwen3, Qwen3 LiteRT, Gemma 4 E2B, and Gemma 4 E4B
  • ⚡ Supports ONNX and LiteRT backends
  • 🚀 Hardware-accelerated LiteRT inference on supported devices
  • 🔤 Hugging Face-compatible tokenizer support for ONNX Qwen models
  • 💬 Persistent multi-turn chat with local history and Previous Chats
  • 📝 Markdown rendering for assistant replies, including table support
  • 🤔 Thinking Mode for supported models
  • 🗂️ Retained thinking text in a collapsible section
  • 🎨 Built-in themes and adjustable chat font size
  • 🛑 Stop-generation support
  • 🔐 Offline after model download, with no telemetry

📸 Inference Preview

Screenshots: chat inference, theme switching, and thinking mode.

Figure: App interface showing local LLM chat and streaming responses on Android.


📦 Download APK - v1.4.0

The app now ships as a single smaller base APK.

Models are no longer bundled in the APK. After installation, the app prompts you to select a model and download it directly on the device.

Users can download multiple models and switch between them later from inside the app, without reinstalling.

Available models

  • Gemma 4 E4B LiteRT - Best for flagship devices
  • Gemma 4 E2B LiteRT - Best for mid-range devices
  • Qwen3 0.6B LiteRT - Best for low-end devices
  • Qwen3 0.6B Q4F16 ONNX - Good for low to mid-range devices
  • Qwen2.5 0.5B ONNX - Best for mid to high-end devices, full precision

Note: internet access is required only while downloading models. Chat and inference remain fully on-device once a model is installed.


🧠 Backend Support

This app supports ONNX-based Qwen models and LiteRT-based Qwen3 and Gemma 4 models.

Backend overview

  • ONNX backend: supports Qwen2.5 and Qwen3
  • LiteRT backend: supports Qwen3 and Gemma 4

Thinking Mode

  • Qwen3 and Gemma 4 support Thinking Mode
  • The toggle is shown only for models that support it
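
The backend table and Thinking Mode rules above can be sketched as a small dispatch, shown here in plain Java. The model identifiers, the `Backend` enum, and the `ModelCatalog` helper are illustrative assumptions for this sketch, not the app's actual API.

```java
// Illustrative sketch only: maps a model ID to its inference backend and
// Thinking Mode support, mirroring the backend overview above. The IDs,
// the Backend enum, and ModelCatalog are hypothetical names.
enum Backend { ONNX, LITERT }

final class ModelCatalog {
    static Backend backendFor(String modelId) {
        switch (modelId) {
            // ONNX backend: Qwen2.5 and Qwen3 builds
            case "qwen2.5-0.5b-onnx":
            case "qwen3-0.6b-q4f16-onnx":
                return Backend.ONNX;
            // LiteRT backend: Qwen3 and Gemma 4 builds
            case "qwen3-0.6b-litert":
            case "gemma4-e2b-litert":
            case "gemma4-e4b-litert":
                return Backend.LITERT;
            default:
                throw new IllegalArgumentException("unknown model: " + modelId);
        }
    }

    // Per the note above, only Qwen3 and Gemma 4 support Thinking Mode,
    // so only they should render the toggle.
    static boolean supportsThinkingMode(String modelId) {
        return modelId.startsWith("qwen3") || modelId.startsWith("gemma4");
    }
}
```

In the app itself, a mapping like this could decide which runtime to initialize for a downloaded model and whether to show the Thinking Mode toggle.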

🚀 Why LiteRT

LiteRT is a strong fit for fast local Android chat because:

  • It is designed for high-performance on-device LLM deployment
  • It supports hardware acceleration, including GPU and NPU acceleration on supported devices
  • It helps reduce startup and generation latency for local chat workloads
  • It expands the range of model builds that run well on Android beyond what a single backend can cover
  • It fits well with a privacy-first app design focused on fully offline usage

Note: model capability and performance still depend on the specific model build and the hardware of the target Android device.


⚙️ Requirements

  • Android Studio
  • A physical Android device for deployment and testing
  • A device with 4 GB of RAM or more for smaller models
  • More RAM is recommended for larger models such as Gemma 4 E2B and Gemma 4 E4B
  • A temporary internet connection for downloading models inside the app
  • Real hardware is preferred; emulators are mainly useful for UI checks

🚀 How to Build & Run

  1. Clone this repository.

  2. Install the latest Android Studio.

  3. Open the Android project folder in Android Studio:

    pocket_llm_src/
    
  4. Build and install the app on your Android device.

  5. Launch the app.

  6. On first launch, choose a model from the built-in model picker.

  7. Download the selected model directly inside the app.

  8. Start chatting locally on your device.
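
From the command line, steps 1–4 can be sketched roughly as follows. This assumes the project uses the standard Gradle wrapper that Android Studio projects include and that a device is connected with USB debugging enabled; adjust paths if the repository layout differs.

```shell
# Clone the repository and enter the Android project folder from step 3.
git clone https://github.com/dineshsoudagar/local-llms-on-android.git
cd local-llms-on-android/pocket_llm_src

# Build and install a debug APK on a connected device
# (assumes the standard Gradle wrapper is present).
./gradlew installDebug
```

Opening `pocket_llm_src/` in Android Studio and pressing Run achieves the same result with less typing.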


📄 License Notice

Gemma 4

Gemma 4 is provided by Google under the Apache License 2.0. Google's Gemma documentation also states that Gemma models are provided with open weights and support responsible commercial use.

Qwen models

Qwen model files follow the upstream Qwen license terms.
Please review the original model license before redistribution or commercial use.