---
title: FinRAG API
emoji: ⚡
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 7860
---
An intelligent document question-answering system that combines vision-language models, hybrid retrieval (BM25 + semantic search), and dynamic table extraction to answer questions about uploaded PDF documents with high accuracy.
Upload an annual report, financial filing, or any other PDF and ask natural language questions — the system extracts precise values from tables, elaborates answers using LLMs, handles typos, supports follow-up conversations, and detects unanswerable queries.
- Dynamic Table Extraction — Automatically discovers table structure (row labels, fiscal years, column headers) from any document at upload time. No hardcoded mappings or company-specific logic.
- LLM-Elaborated Answers — Simple table lookups return instant values; complex answers (trends, comparisons, qualitative) are elaborated by the VLM into rich, document-quality responses with bold formatting and structured bullet points.
- Hybrid Retrieval Pipeline — BM25 keyword search (SQLite FTS5) + dense semantic search (FAISS) merged via Reciprocal Rank Fusion, then reranked with a cross-encoder for precision.
- Adjacent Chunk Expansion — After retrieving top-k chunks, automatically includes all chunks from the same pages for richer context.
- Multi-Model Architecture — Extractive QA (RoBERTa-SQuAD2) for factual questions, generative VLM (Qwen2-VL-2B) for complex/conversational queries, with automatic routing.
- Conversational Memory — Follow-up questions with context-aware history rewriting. Ask "What was the revenue?" then "How does it compare to last year?" and get coherent answers.
- Smart Follow-Up Suggestions — Rule-based generation of 3 contextual follow-up questions after each answer (zero latency, no extra LLM call).
- Typo Tolerance — Fuzzy synonym matching with context-aware overlap scoring handles misspelled queries (e.g., "reveneu" → "revenue", "net incme" → "net income").
- Unanswerable Detection — Identifies questions about future predictions, stock prices, and out-of-scope data rather than hallucinating answers.
- Document Overview — Built at load time with key financial metrics, passed to all LLM calls for grounded answers.
- Markdown Answer Rendering — Frontend renders bold, bullet lists, and line breaks from VLM output for readable responses.
- Document Page Viewer — View rendered PDF pages alongside answers with source page navigation.
- React Frontend — Modern chat-style interface with drag-and-drop PDF upload, animated loading stages, and follow-up chips.
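The adjacent-chunk expansion described above can be sketched in a few lines of Python. This is a simplified illustration — the chunk dictionary shape and the function name are assumptions, not the repo's actual code:

```python
def expand_to_full_pages(top_chunks, all_chunks):
    """Given the top-k retrieved chunks, also include every other
    chunk on the same pages, preserving document order."""
    pages = {c["page"] for c in top_chunks}
    return [c for c in all_chunks if c["page"] in pages]

all_chunks = [
    {"id": 0, "page": 1, "text": "Revenue was $391,035."},
    {"id": 1, "page": 1, "text": "Cost of sales rose 3%."},
    {"id": 2, "page": 2, "text": "Risk factors include ..."},
]
top = [all_chunks[0]]                 # suppose the retriever returned one chunk
expanded = expand_to_full_pages(top, all_chunks)
print([c["id"] for c in expanded])    # → [0, 1]: both chunks from page 1
```

Pulling in page-mates costs little and gives the reader models surrounding sentences and table rows that the retriever alone might have dropped.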
```text
PDF Upload → Text/Table Extraction (PyMuPDF + pdfplumber)
           → Chunking (text + table-aware)
           → Dual Indexing: SQLite FTS5 (BM25) + FAISS (semantic)
           → DocumentIndex (discovers labels, years, structure)
           → Document Overview (key metrics for LLM context)
```
```text
Question → Typo Correction (fuzzy synonym expansion)
         → Unanswerable Check
         → Conversational Routing (summaries, overviews, risks)
         → Table Extraction (dynamic label matching, full-document scan)
             ├─ Simple value ($X,XXX) → instant return (no LLM)
             └─ Complex answer → VLM elaboration
         → Extractive QA (RoBERTa-SQuAD2) → VLM elaboration
         → Generative VLM (Qwen2-VL-2B with page images)
         → Confidence-ranked answer selection
         → Follow-up suggestion generation
```
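The branch between "instant return" and "VLM elaboration" hinges on recognizing a simple numeric answer. A minimal sketch of that routing decision — the regex and helper name here are illustrative assumptions, not the engine's actual logic:

```python
import re

# A "simple" answer is a bare (possibly negative) dollar amount,
# e.g. "$391,035" or "$1,234.56" — anything else goes to the VLM.
SIMPLE_VALUE = re.compile(r"^\$?-?\d{1,3}(,\d{3})*(\.\d+)?$")

def needs_elaboration(answer: str) -> bool:
    return not SIMPLE_VALUE.match(answer.strip())

print(needs_elaboration("$391,035"))                        # False → instant return
print(needs_elaboration("Revenue grew 8% year over year"))  # True  → VLM elaborates
```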
| Component | Model | Purpose |
|---|---|---|
| Vision-Language | Qwen2-VL-2B-Instruct | Multimodal understanding, answer elaboration, conversational QA |
| Extractive QA | deepset/roberta-base-squad2 | Precise span extraction from text |
| Embeddings | all-MiniLM-L6-v2 | Semantic search embeddings |
| Reranker | ms-marco-MiniLM-L-6-v2 | Cross-encoder reranking for retrieval precision |
Backend: Python, FastAPI, PyTorch, Transformers, FAISS, SQLite FTS5, PyMuPDF, pdfplumber
Frontend: React, Vite, React Icons
Retrieval: Hybrid BM25 + dense retrieval, Reciprocal Rank Fusion, cross-encoder reranking
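Reciprocal Rank Fusion, used to merge the BM25 and FAISS result lists, is simple enough to sketch. This is the generic formulation with the conventional constant k = 60, not the repo's exact implementation:

```python
def rrf_merge(rankings, k=60):
    """Fuse ranked result lists: score(d) = sum over lists of 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25 = ["c3", "c1", "c7"]   # keyword (FTS5) ranking
dense = ["c1", "c9", "c3"]  # semantic (FAISS) ranking
print(rrf_merge([bm25, dense]))  # → ['c1', 'c3', 'c9', 'c7']
```

Chunks appearing in both lists ("c1", "c3") accumulate score from each and rise to the top, which is exactly why RRF works well for merging heterogeneous retrievers without score calibration.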
- Python 3.10+
- Node.js 18+
- ~4 GB disk space for model weights (downloaded automatically on first run)
```bash
# Clone the repository
git clone https://github.com/codeC02003/Financial-report-RAG.git
cd Financial-report-RAG

# Install Python dependencies
pip install -r requirements.txt

# Install frontend dependencies
cd frontend
npm install
cd ..
```

Start the backend:

```bash
python run_backend.py
```

The API server starts at http://localhost:8000. Models are downloaded automatically on first launch.
Start the frontend (in a separate terminal):

```bash
cd frontend
npm run dev
```

The UI opens at http://localhost:5173.
- Open the frontend in your browser
- Upload a PDF document (annual reports, 10-K filings, financial documents, etc.)
- Ask questions in natural language:
  - "What was the total revenue?" → Instant table lookup
  - "How did net income change from 2023 to 2024?" → Comparison with elaboration
  - "How did revenue change over the years?" → Multi-year trend analysis
  - "What percentage of revenue came from services?" → Calculated percentage
  - "Summarize the main risks." → VLM-powered qualitative answer
  - "What was total reveneu?" → Typo handled correctly
  - "What will revenue be in 2030?" → Correctly identified as unanswerable
- Click follow-up suggestion chips to continue exploring
Tested across multiple documents covering financial data extraction, comparisons, trends, conversational QA, follow-up questions, typo handling, and unanswerable detection.
| Document | Questions | Accuracy |
|---|---|---|
| Tandy Leather 10-K (2024) | 27 | 100% (27/27) |
| Apple 10-K (2024) | 50 | 94% (47/50) |
| Aaron's Holdings 10-K (2023) | 14 | 100% (14/14) |
| Materion Corp 10-K (2024) | 14 | 100% (14/14) |
Question categories tested:
- Direct value lookup (revenue, net income, total assets)
- Year-over-year comparisons and differences
- Percentage calculations
- Trend analysis across multiple years
- Conversational questions (summaries, overviews, risk analysis)
- Follow-up questions with conversational context
- Typo handling (misspelled financial terms)
- Unanswerable detection (future predictions, stock prices)
- Alternate phrasings and informal queries
```text
Financial-report-RAG/
├── src/
│   ├── api/server.py            # FastAPI REST endpoints
│   ├── qa_engine/engine.py      # Core QA pipeline, table extraction, follow-ups
│   ├── models/
│   │   ├── vision_model.py      # Qwen2-VL wrapper (answer + elaborate)
│   │   └── extractive_qa.py     # RoBERTa-SQuAD2 wrapper
│   ├── retrieval/embedder.py    # Hybrid BM25 + FAISS + reranker
│   └── pdf_processing/
│       ├── extractor.py         # PDF text & image extraction
│       └── chunker.py           # Document chunking
├── frontend/src/                # React chat interface
├── configs/config.yaml          # Model & pipeline configuration
├── run_backend.py               # Backend entry point
├── run_tests.py                 # Multi-document benchmark
├── test_generic.py              # Generic benchmark for any document
├── test_comprehensive.py        # Comprehensive QA test suite (27 questions)
├── Dockerfile                   # Docker config for HF Spaces deployment
└── requirements.txt
```
Upload a PDF document for processing.

```bash
curl -X POST http://localhost:8000/api/upload \
  -F "file=@document.pdf"
```

Ask a question about the uploaded document.

```bash
curl -X POST http://localhost:8000/api/ask \
  -H "Content-Type: application/json" \
  -d '{"question": "What was the total revenue?", "top_k": 5}'
```

Response includes:

- `answer` — The answer text (with markdown formatting)
- `confidence` — Confidence score (0-1)
- `method` — Which pipeline stage answered (table, extractive, vision, conversational, unanswerable)
- `source_pages` — Page numbers where evidence was found
- `follow_ups` — 3 suggested follow-up questions
- `is_unanswerable` — Whether the question is out of scope
With conversation history (follow-ups):

```bash
curl -X POST http://localhost:8000/api/ask \
  -H "Content-Type: application/json" \
  -d '{
        "question": "How does it compare to last year?",
        "top_k": 5,
        "history": [
          {"role": "user", "content": "What was the total revenue?"},
          {"role": "assistant", "content": "$391,035"}
        ]
      }'
```

Get a rendered page image for document viewing.
Get document summary and metadata.
Health check endpoint.
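The same `/api/ask` call can be made programmatically. A minimal sketch using only the Python standard library — the field names follow the response description above, and a backend running at localhost:8000 with an uploaded document is assumed:

```python
import json
from urllib import request

def build_ask_payload(question, history=None, top_k=5):
    """Assemble the JSON body expected by POST /api/ask."""
    payload = {"question": question, "top_k": top_k}
    if history:
        payload["history"] = history
    return payload

def ask(question, history=None, top_k=5, base="http://localhost:8000"):
    req = request.Request(
        f"{base}/api/ask",
        data=json.dumps(build_ask_payload(question, history, top_k)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)

# Requires a running backend with an uploaded document:
# result = ask("How does it compare to last year?",
#              history=[{"role": "user", "content": "What was the total revenue?"},
#                       {"role": "assistant", "content": "$391,035"}])
# print(result["answer"], result["confidence"], result["source_pages"])
```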
Edit configs/config.yaml to customize:

```yaml
pdf_processing:
  dpi: 120
  max_pages: 150
  chunk_size: 200

model:
  vision_model: "Qwen/Qwen2-VL-2B-Instruct"
  embedding_model: "sentence-transformers/all-MiniLM-L6-v2"
  device: "cpu"

retrieval:
  top_k: 8
  similarity_threshold: 0.1
```
- PDF Processing — Extracts text and tables using PyMuPDF and pdfplumber. Tables are detected and chunked separately with headers preserved for context.
- Dynamic Indexing — At upload time, all chunks are scanned to discover table row labels and fiscal year columns. A `DocumentIndex` enables precise table lookups, and a document overview with key metrics is generated for LLM context.
- Hybrid Retrieval — Questions trigger parallel BM25 (keyword) and FAISS (semantic) searches. Results are merged via Reciprocal Rank Fusion and reranked with a cross-encoder. Adjacent chunks from the same pages are automatically included for richer context.
- Multi-Stage QA — The engine tries strategies in order of precision:
  - Table extraction scans all document chunks (not just the retrieved ones) for precise numeric lookups. Simple values ($X,XXX) are returned instantly; complex answers are elaborated by the VLM.
  - Extractive QA (RoBERTa-SQuAD2) extracts factual spans, then the VLM elaborates them.
  - Generative VLM (Qwen2-VL-2B) handles conversational and complex queries with page images and document overview context.
- Smart Follow-ups — Follow-up questions like "What about services?" or "Compare it with 2023" are enriched with context from conversation history. After each answer, 3 contextual follow-up suggestions are generated using rule-based logic (zero latency).
- Typo Handling — Fuzzy synonym matching expands misspelled terms to known table labels using sequence matching with context-aware overlap scoring, so "net incme" correctly maps to "Net income" rather than "Operating income".
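The typo-handling behavior can be approximated with `difflib` from the standard library. This is a sketch of the idea, not the engine's actual scoring (which adds context-aware overlap weighting):

```python
from difflib import SequenceMatcher

# Hypothetical label set; in the real system these are discovered at upload time.
LABELS = ["Net income", "Operating income", "Total revenue", "Total assets"]

def best_label(query, labels=LABELS):
    """Pick the table label most similar to a possibly misspelled query."""
    return max(labels,
               key=lambda l: SequenceMatcher(None, query.lower(), l.lower()).ratio())

print(best_label("net incme"))      # → "Net income"
print(best_label("total reveneu"))  # → "Total revenue"
```

Plain similarity already resolves "net incme" to "Net income" over "Operating income" because more of the query's characters align in order; the context-aware overlap scoring mentioned above further disambiguates cases where ratios are close.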
Docker (Hugging Face Spaces):
```bash
docker build -t finrag .
docker run -p 7860:7860 finrag
```

Frontend (Vercel):
The React frontend can be deployed as a static site on Vercel. Update the API base URL in the frontend config to point to your backend.
Chinmay Mhatre
MIT