A RAG-Powered Conversational AI for Ayachi Nene
"メンカタカラメヤサイダブルニンニクアブラマシマシ!"
A RAG conversational AI combining FAISS semantic retrieval with pluggable LLM backends (Claude, DeepSeek, or local Ollama). Features multi-turn session memory, SSE token streaming, a modern Vue 3 immersive Galgame UI, and similarity-threshold filtering that curbs hallucinations in character reproduction.
中文文档 (Chinese Version) | Vision | Features | Quick Start | Architecture | FAQ
Traditional AI role-playing bots often suffer from two fatal flaws: "Hallucinations" (making up fake lore) and "OOC" (Out of Character responses). While conventional Fine-tuning can help, it is hardware-intensive and rarely eradicates these issues completely.
NeneBot is an attempt to bring RAG (Retrieval-Augmented Generation) architecture to Galgame character simulation:
- External Memory Engine: By slicing and vectorizing the original script of Sanoba Witch, we give the AI "true" memories.
- Authentic Reproduction: The LLM is forced to reference retrieved original dialogue, perfectly capturing Nene's gentle and shy personality.
- Pluggable LLM Backend: Swap between local Ollama and cloud APIs (Claude, DeepSeek) with a single environment variable — no code changes required.
- Ultimate Front-end Aesthetics: Ditching clunky terminal interfaces for an immersive, modern visual novel (Galgame) UI.
- Flexible LLM Backend: Use local Ollama (Qwen 2.5, Llama, etc.) for full privacy, or plug in a cloud API (Claude, DeepSeek) via a single `LLM_PROVIDER` env var for higher quality.
- Multi-Turn Memory: Per-session conversation history (sliding window) keeps Nene contextually aware across turns.
- Real-Time Token Streaming: SSE-based streaming delivers a native typewriter effect — responses appear word by word.
- Millisecond Semantic Retrieval: Utilizes Meta's FAISS vector database alongside the `bge-small-zh` embedding model to pinpoint relevant historical scripts.
- Threshold Fallback Mechanism: Features a custom `match_threshold` filter (cosine similarity, default `0.55`). If the topic is unfamiliar, Nene seamlessly transitions to zero-shot character play rather than forcing irrelevant memories (see the retrieval sketch after this list).
- Immersive Visual Experience: A stunning Vue 3 + Vite front-end featuring a dark glassmorphism UI, typewriter effects, and dynamic breathing layouts.
- Out-of-the-Box Automation: Includes 1-click installation and startup scripts for both Windows and Linux. No terminal anxiety required.
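To make the threshold fallback concrete, here is a minimal sketch of similarity-gated retrieval, assuming an `IndexFlatIP` built from L2-normalized `bge-small-zh` embeddings. The actual logic lives in `src/services/rag_pipeline.py`; the names here are illustrative:

```python
# Minimal sketch of similarity-gated retrieval (illustrative only; the real
# pipeline lives in src/services/rag_pipeline.py). Assumes the index was
# built from L2-normalized vectors, so FAISS inner product equals cosine sim.
import faiss
import numpy as np

MATCH_THRESHOLD = 0.55  # project default; tunable via MATCH_THRESHOLD in .env

def retrieve(index: faiss.Index, query_vec: np.ndarray,
             docs: list[str], k: int = 3) -> list[str]:
    """Return up to k script lines whose cosine similarity clears the gate."""
    query = np.asarray(query_vec, dtype="float32").reshape(1, -1)
    faiss.normalize_L2(query)  # normalized query -> inner product == cosine
    scores, ids = index.search(query, k)
    return [docs[i] for score, i in zip(scores[0], ids[0])
            if i != -1 and score >= MATCH_THRESHOLD]

# An empty result means "unfamiliar topic": the prompt builder then skips the
# retrieved-script section and falls back to zero-shot persona prompting.
```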
Deploy to Railway in under 5 minutes. Users only need a browser URL.
- Fork this repo and connect it to Railway.
- In Railway's Variables panel, set:
  ```
  LLM_PROVIDER=deepseek
  OPENAI_COMPAT_API_KEY=sk-...
  ```
- Railway builds the frontend, installs deps, and starts the server automatically.
- Share the generated `*.railway.app` URL — done.
Before starting locally, decide which LLM backend you want to use:
- Local Ollama: zero API cost, fully local, recommended for offline/private use.
- Cloud API (Claude / DeepSeek / OpenAI-compatible): better quality and easier setup on lower-end machines.
Create a local .env file before your first run:
```bash
cp .env.example .env
```

Then edit only the fields relevant to your provider:

```
# Option 1: Local Ollama
LLM_PROVIDER=ollama
OLLAMA_BASE_URL=http://127.0.0.1:11434
LLM_MODEL_NAME=qwen2.5
# Option 2: DeepSeek
# LLM_PROVIDER=deepseek
# OPENAI_COMPAT_API_KEY=sk-...
# OPENAI_COMPAT_BASE_URL=https://api.deepseek.com
# OPENAI_COMPAT_MODEL=deepseek-chat
# Option 3: Claude
# LLM_PROVIDER=claude
# ANTHROPIC_API_KEY=sk-ant-...
# CLAUDE_MODEL_NAME=claude-haiku-4-5-20251001
```

Good to know: This repo already includes `data/raw/train.jsonl` and a prebuilt `vector_store/` directory. On a fresh machine, `scripts/setup.sh` will also rebuild the FAISS index if needed.
Step 1: Install Prerequisites (Skip if already installed)
- Download and install Python 3.10+. [CRITICAL]: Ensure you check Add Python to PATH at the bottom of the installer!
- Download and install Node.js (LTS version).
- (Only if using local Ollama) Download and install Ollama for Windows.
Step 2: Download NeneBot
Click the green Code button on this GitHub page and select Download ZIP. Extract it to a folder on your PC (e.g., D:\NeneBot).
Step 2.5: Configure your provider
Copy .env.example to .env, then fill in the keys only if you are using a cloud provider.
Step 3: One-Click Ignition!
Open the extracted folder and double-click start_windows.bat.
- Grab a coffee. The script will automatically download dependencies, wake up the AI engine, and launch your browser.
- Once the UI pops up, Nene is ready to chat!
Open your terminal and execute the following elegant commands:
```bash
# 1. Clone the repository
git clone https://github.com/your-username/NeneBot.git
cd NeneBot
# 2. Create your local environment file
cp .env.example .env
# 3. Grant execution permissions to scripts
chmod +x scripts/setup.sh scripts/run.sh
# 4. Run the automated setup (Only required once)
./scripts/setup.sh
# 5. Ignite the engines!
./scripts/run.sh
```

Tip: Once started, visit `http://localhost:5173` in your browser for the UI. The backend API Swagger docs are located at `http://localhost:8000/docs`.

Important: Do not open `frontend/index.html` directly in the browser. The application requires a running FastAPI backend (`/v1/*`) and should be accessed through the Vite dev server (port 5173) or the compiled production build served by FastAPI.
For day-to-day usage, prefer the unified launcher instead of remembering multiple raw commands:
```bash
# Full local development mode: backend + frontend
python scripts/launch.py dev
# Backend only
python scripts/launch.py local
# Telegram polling bot
python scripts/launch.py telegram
```

If you already created a local virtual environment, use:

```bash
./venv/bin/python scripts/launch.py dev
```

If you want to access the full app from one URL only instead of running Vite separately:
```bash
# 1. Backend
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
# 2. Frontend build
cd frontend
npm install
npm run build
cd ..
# 3. Serve both API and frontend from FastAPI
python -m uvicorn src.main:app --host 0.0.0.0 --port 8000
```

Then open:

- `http://localhost:8000` → Full application
- `http://localhost:8000/admin` → Read-only admin console
- `http://localhost:8000/docs` → API docs
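Once served this way, you can smoke-test the streaming endpoint without the UI. A rough Python client is sketched below; the request body fields (`message`, `session_id`) are assumptions, so check the live schema at `http://localhost:8000/docs` first:

```python
# Rough smoke test for the SSE chat endpoint. The payload field names are
# assumptions; verify them against http://localhost:8000/docs.
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/stream",
    json={"message": "你好", "session_id": "smoke-test"},
    stream=True,
)
resp.raise_for_status()
for raw in resp.iter_lines(decode_unicode=True):
    if raw and raw.startswith("data: "):  # SSE frames look like "data: <chunk>"
        print(raw[len("data: "):], flush=True)
```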
For a more production-like local stack with persistent session storage:
```bash
cp .env.example .env
docker compose -f deploy/docker_compose.yml up --build
```

This stack starts:

- `app` on `http://localhost:8000`
- `redis` on `localhost:6379`
Recommended `.env` settings for this mode:

```
SESSION_BACKEND=redis
REDIS_URL=redis://redis:6379/0
```
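For intuition, a sliding-window chat memory on Redis can be as small as a capped list per session. This sketch is illustrative only; the key layout and window size are assumptions, not the project's actual schema:

```python
# Illustrative sketch of sliding-window session memory on Redis. The key
# naming and window size are assumptions, not NeneBot's actual schema.
import json
import redis

WINDOW_TURNS = 10  # keep the last 10 user/assistant turns

r = redis.Redis.from_url("redis://localhost:6379/0", decode_responses=True)

def append_message(session_id: str, role: str, content: str) -> None:
    key = f"session:{session_id}:history"
    r.rpush(key, json.dumps({"role": role, "content": content}))
    r.ltrim(key, -WINDOW_TURNS * 2, -1)  # 2 entries (user + assistant) per turn

def load_history(session_id: str) -> list[dict]:
    key = f"session:{session_id}:history"
    return [json.loads(item) for item in r.lrange(key, 0, -1)]
```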
Operational endpoints:

- `GET /health` → service health, vector index status, frontend mode, session backend status
- `GET /health/live` / `GET /health/ready` → split liveness/readiness probes
- `GET /metrics` → Prometheus-style metrics text
- Response header `X-Request-ID` → request correlation id for logs and API debugging
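A quick way to exercise these endpoints from Python (plain `requests`, nothing project-specific):

```python
# Quick operational check: hit /health and echo the correlation id header.
import requests

resp = requests.get("http://localhost:8000/health", timeout=5)
print("status:", resp.status_code)
print("X-Request-ID:", resp.headers.get("X-Request-ID"))
print(resp.json())  # health payload: vector index status, session backend, ...
```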
Optional API protection:
```
API_AUTH_ENABLED=true
API_AUTH_TOKENS=frontend|chat:dev-chat-token,ops|ops:dev-ops-token
API_AUTH_REGISTRY_PATH=./config/api_tokens.json
```

When enabled, requests must include either:

- `Authorization: Bearer <token>`
- `X-API-Key: <token>`
Scope rules:
- `/v1/*` requires `chat`
- `/health*` and `/metrics` require `ops`
- Legacy tokens without explicit scopes still get `chat` + `ops` for backward compatibility
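With auth enabled, a scoped request carries one of those headers. Using the sample dev tokens from the block above (the `/v1` payload fields are assumptions; see `/docs`):

```python
# Scoped requests using the sample dev tokens from the .env block above.
# The /v1 payload field names are assumptions; check /docs for the real schema.
import requests

# chat scope covers /v1/*
r1 = requests.post(
    "http://localhost:8000/v1/chat/stream",
    headers={"Authorization": "Bearer dev-chat-token"},
    json={"message": "你好", "session_id": "auth-demo"},
)
print("/v1 with chat token:", r1.status_code)  # 200 when scoped correctly

# ops scope covers /health* and /metrics
r2 = requests.get("http://localhost:8000/metrics",
                  headers={"X-API-Key": "dev-ops-token"})
print("/metrics with ops token:", r2.status_code)
```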
You can also move tokens into a local registry file instead of `.env`:

```json
[
{ "name": "frontend", "token": "replace-with-chat-token", "scopes": ["chat"] },
{ "name": "ops-dashboard", "token": "replace-with-ops-token", "scopes": ["ops"] }
]
```

See `config/api_tokens.json.example` for the full format.
Server logs also emit JSON audit fields such as `action`, `endpoint`, `session_id`, `auth_subject`, `auth_scopes`, and `request_id`.
Optional tracing:
```
TRACING_ENABLED=true
TRACING_SERVICE_NAME=nenebot
TRACING_EXPORTER=console
```

Current spans:

- `http.request`
- `rag.retrieve`
- `llm.request`
How to verify tracing works locally:
- Install updated dependencies: `pip install -r requirements.txt`
- Enable tracing in `.env`: `TRACING_ENABLED=true` and `TRACING_EXPORTER=console`
- Start the API server and send one chat request.
- Check the server stdout. You should see span output containing names like `http.request`, `rag.retrieve`, and `llm.request`.
- Confirm span attributes include fields such as `request_id`, `http_path`, `provider_name`, `rag.retrieved_count`, and `auth.subject` when a token is used.
If you see normal API responses and span dumps in the terminal, tracing is wired correctly.
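For reference, the console-exporter output shape can be reproduced standalone with the OpenTelemetry SDK. This sketch only mimics the span names listed above; it is not NeneBot's actual instrumentation:

```python
# Standalone demo of console-exported spans with names matching the ones
# above. Mimics the output shape only; this is not NeneBot's wiring.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("nenebot-demo")
with tracer.start_as_current_span("http.request") as span:
    span.set_attribute("http_path", "/v1/chat/stream")
    with tracer.start_as_current_span("rag.retrieve"):
        pass  # each nested span is dumped to stdout as JSON on exit
    with tracer.start_as_current_span("llm.request"):
        pass
```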
Read-only admin MVP:
- `GET /admin/api/overview` → runtime, health, LLM, retrieval, auth, integrations summary
- `GET /admin/api/metrics/summary` → preview of rendered Prometheus metrics
- `GET /admin/api/knowledge/overview` → dataset summary and vector store summary
- `POST /admin/api/knowledge/import` → import JSONL dataset content
- `POST /admin/api/knowledge/rebuild` → rebuild vector index from current dataset
- `http://localhost:8000/admin` → browser admin console
How to verify the admin console works:
- Configure an `ops` token in `.env` or `config/api_tokens.json`.
- Build the frontend and start the app.
- Open `http://localhost:8000/admin`.
- When prompted, paste the `ops` token.
- Confirm the page shows:
  - service/environment/version
  - health status
  - LLM provider/model
  - configured auth identities
  - metrics preview lines
If the page loads and these cards populate, the admin MVP is working.
Knowledge base operations:
- The admin console now includes a knowledge panel for:
  - viewing dataset preview
  - validating JSONL before writing
  - importing JSONL content
  - rebuilding the vector index
- Imported content must be JSONL, one JSON object per line, with a `messages` list (see the sketch below).
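The admin console's VALIDATE button performs the real dry run; as a standalone illustration, a format check for that contract could look like this:

```python
# Standalone sanity check for the import format: one JSON object per line,
# each carrying a "messages" list. The admin VALIDATE button is the real check.
import json

def validate_jsonl(text: str) -> list[str]:
    errors = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        try:
            obj = json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append(f"line {lineno}: invalid JSON ({exc})")
            continue
        if not isinstance(obj.get("messages"), list):
            errors.append(f"line {lineno}: missing 'messages' list")
    return errors  # empty list -> safe to import
```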
How to verify knowledge import and rebuild:
- Open `http://localhost:8000/admin` with an `ops` token.
- Paste one valid JSONL line into the knowledge textarea, for example:
  `{"messages":[{"role":"system","content":"You are Nene."},{"role":"user","content":"你好"},{"role":"assistant","content":"你好呀,保科君。"}]}`
- Click `VALIDATE` first and confirm the dry-run succeeds.
- Click `IMPORT + REBUILD`.
- Confirm the page updates:
  - dataset line count changes
  - preview shows the imported user/assistant pair
  - vector store summary refreshes
- Send a normal chat request and confirm the service still answers normally.
If the dataset summary updates and rebuild completes without error, the knowledge workflow is connected correctly.
Release gate:
- See RELEASE_CHECKLIST.md before tagging a preview release.
The simplest IM integration path is Telegram via the official Bot API.
- Create a bot with `@BotFather`
- Copy the token into `.env`
- Start the polling adapter:

```bash
cp .env.example .env
# Fill in:
# TELEGRAM_BOT_TOKEN=123456:ABC...
# LLM_PROVIDER=deepseek  # or ollama / claude / openai
python scripts/launch.py telegram
```

Notes:

- Telegram messages are mapped to internal session ids like `telegram:<chat_id>`
- Built-in commands: `/start`, `/help`, `/reset`, `/model`
- `/reset` clears that chat's memory window
- Default mode is long polling, so you do not need a public webhook URL yet
To switch to webhook mode later:
```
TELEGRAM_MODE=webhook
TELEGRAM_PUBLIC_BASE_URL=https://your-domain.com
TELEGRAM_WEBHOOK_PATH=/integrations/telegram/webhook
TELEGRAM_WEBHOOK_SECRET=your-secret
```

Then start the normal API server. Telegram will POST updates to:

```
POST /integrations/telegram/webhook
```

If `TELEGRAM_WEBHOOK_SECRET` is set, the server verifies the `X-Telegram-Bot-Api-Secret-Token` header.
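Conceptually, that verification is just a constant-time comparison of the incoming header against the configured secret. A minimal FastAPI sketch, illustrative rather than the project's actual handler:

```python
# Minimal illustration of the webhook secret check; not NeneBot's actual
# handler. Telegram sends X-Telegram-Bot-Api-Secret-Token on every POST.
import os
import secrets

from fastapi import FastAPI, Header, HTTPException, Request

app = FastAPI()
WEBHOOK_SECRET = os.environ.get("TELEGRAM_WEBHOOK_SECRET", "")

@app.post("/integrations/telegram/webhook")
async def telegram_webhook(
    request: Request,
    secret: str = Header(default="", alias="X-Telegram-Bot-Api-Secret-Token"),
):
    if WEBHOOK_SECRET and not secrets.compare_digest(secret, WEBHOOK_SECRET):
        raise HTTPException(status_code=403, detail="bad webhook secret")
    update = await request.json()  # a Telegram Update object
    return {"ok": True}
```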
This project strictly adheres to microservice and front/back-end decoupling standards:
```
NeneBot/
├── 📂 data/ # Raw script corpus (for vectorization)
├── 📂 vector_store/ # FAISS persistent vector index
├── 📂 frontend/ # Vue 3 + Vite immersive UI
├── 📂 src/ # FastAPI core backend service
│ ├── api/ # Routing and Pydantic data validation
│ ├── core/ # pydantic-settings config and global exceptions
│ ├── infrastructure/ # External adapters (FAISS, Ollama, Claude, DeepSeek)
│ └── services/ # Core business logic (RAG pipeline, Embeddings, Sessions)
├── 📂 scripts/ # DevOps toolbox (Setup, Run, Linters)
├── 📄 railway.toml # One-click Railway deployment config
├── 📄 .env.example # Environment variable template
├── 📄 pyproject.toml # Industrial linter configs (Ruff & Mypy)
└── 📄 requirements.txt # Python dependency list
```
For developers who want to tweak the bot, you can easily customize Nene:
- Switch LLM Provider: Set `LLM_PROVIDER` in `.env` to `ollama`, `claude`, `deepseek`, or `openai`. See `.env.example` for the full list of required keys.
- Adjust Strictness: Modify `MATCH_THRESHOLD` (default `0.55`) in `.env` or directly in `src/services/rag_pipeline.py`. Lower values make her stick strictly to the script; higher values allow more creative freedom.
- Change Sprites & Backgrounds: Replace `nene_sprite.png` and `bg_room.png` in the `frontend/public/` directory. Changes apply instantly in dev thanks to Vite HMR.
- Modify Character Persona: Edit the `_CHARACTER_CARD` constant in `src/services/rag_pipeline.py` to add new personality traits or instructions.
- Rebuild the Memory Index: If you replace `data/raw/train.jsonl`, run `python scripts/init_vector_db.py` to regenerate `vector_store/`.
- Session Persistence: Set `SESSION_BACKEND=redis` and configure `REDIS_URL` to persist chat memory across restarts.
- Operations: Use `/health` for diagnostics and `X-Request-ID` to correlate client failures with server logs.
- Unified Launcher: Use `python scripts/launch.py dev`, `local`, or `telegram` depending on the runtime mode you want.
- Telegram Adapter: Set `TELEGRAM_BOT_TOKEN` and run `python scripts/launch.py telegram` to attach the bot to Telegram via long polling.
1. "Python / Node is not recognized as an internal or external command" on Windows?
You either haven't installed Python/Node.js, or forgot to add them to your environment variables. Reinstall them and ensure you check the "Add to PATH" option.
2. The chat shows a connection error or "Nene's thoughts disconnected"?
First verify that the backend is actually running:
- Frontend dev mode: open `http://localhost:5173`
- Backend health check: open `http://localhost:8000/health`
- API docs: open `http://localhost:8000/docs`
If using Ollama: The Ollama service may not be running, or your machine ran out of VRAM/RAM. Try running `ollama run qwen2.5` manually. On Linux/WSL, also ensure no system proxy is intercepting localhost traffic (`unset http_proxy`).

If using a cloud API: Verify that your `ANTHROPIC_API_KEY` or `OPENAI_COMPAT_API_KEY` is set correctly in `.env` and that `LLM_PROVIDER` matches.
3. Can I open the page directly without starting anything?
No. This is not a static HTML demo.
The Vue page depends on the FastAPI backend for `/v1/chat/stream`, session memory, and RAG retrieval. Use one of these two modes instead:
- Development mode: `./scripts/run.sh`, then open `http://localhost:5173`
- Single-port mode: build `frontend/dist`, start FastAPI, then open `http://localhost:8000`
4. Can I swap the character to someone else (e.g., Ayase Mitsukasa)?
Absolutely! This is a universal architecture. Simply:
- Replace `data/raw/train.jsonl` with Ayase's dialogue data.
- Run `python scripts/init_vector_db.py` to rebuild the memory database.
- Replace the sprite assets in `frontend/public/`.
- Update the character name and persona in the system prompt.
We welcome contributions from the community! Whether it's fixing bugs, improving the CSS aesthetics, or providing better script datasets, please follow these steps:
- Fork the Project.
- Create your Feature Branch: `git checkout -b feature/AmazingFeature`
- Commit your Changes: `git commit -m 'feat: Add some AmazingFeature'`
- Push to the Branch: `git push origin feature/AmazingFeature`
- Open a Pull Request.
(Note: Before submitting PRs, please run `./scripts/run_linter.sh` to ensure your code passes our strict Ruff and Mypy checks.)
Distributed under the GNU General Public License v3.0. This project is for technical exploration and learning purposes only. The copyright of the character sprites, background art, and game scripts belongs to the original creator (Yuzusoft). Please do not use them for commercial purposes.