A lightweight PDF question-answering app built with the DeepSeek LLM and LangChain, with a Streamlit frontend.
This project allows users to upload a PDF document and ask natural-language questions about its content. The app parses the PDF, splits the content into chunks, embeds the chunks and stores them in a FAISS vector database, then retrieves the most relevant chunks and passes them to the DeepSeek LLM to generate answers.
- Upload any PDF
- Chunk and embed content using LangChain
- Semantic search using FAISS
- LLM-based question answering using DeepSeek
- Streamlit-based intuitive UI
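The chunking step above can be illustrated with a minimal, library-free sketch. This is not the project's implementation (which uses LangChain's `RecursiveCharacterTextSplitter`); it is a fixed-size character splitter with overlap, and the `chunk_size`/`overlap` values are illustrative assumptions, not the ones this repo uses.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows.

    Overlap keeps context that falls on a chunk boundary visible in
    both neighboring chunks, which helps retrieval quality.
    """
    chunks = []
    step = chunk_size - overlap  # advance less than chunk_size to create overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks

# A 1200-character document with chunk_size=500 and overlap=50
# yields windows starting at 0, 450, and 900.
print(len(chunk_text("A" * 1200)))  # → 3
```

LangChain's real splitter additionally tries to break on paragraph, sentence, and word boundaries before falling back to raw character positions.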
RAG_WITH_DEEPSEEK/
│
├── deepseek/
│ ├── vector_databases.py # Loads, chunks, embeds PDF content
│ ├── Rag_pipeline.py # Retrieval + QA logic
│ ├── frontend.py # Streamlit app
│ └── requirements.txt
│
├── pdfs/ # PDF upload folder
├── vectorstore/ # FAISS vector database folder
├── data.pdf # Sample PDF
├── .env # Environment variables
└── README.md
- Upload a PDF using the Streamlit UI.
- PDF is parsed using `PDFPlumberLoader`.
- Content is chunked via `RecursiveCharacterTextSplitter`.
- Chunks are embedded using DeepSeek embeddings via `OllamaEmbeddings`.
- Chunks are stored in a FAISS vector store.
- On a user query, relevant chunks are retrieved and passed to DeepSeek LLM for answering.
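The retrieval step can be sketched without any libraries. In the real app, FAISS searches dense embedding vectors; here, as a stand-in for illustration only, chunks are represented as bag-of-words counts and ranked by cosine similarity, which shows the same retrieve-top-k idea.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'embedding': a sparse bag-of-words vector (stand-in for real embeddings)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 1) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

docs = [
    "FAISS stores dense vectors",
    "Streamlit renders the UI",
    "The LLM generates the final answer",
]
print(retrieve("stores dense vectors", docs))  # → ['FAISS stores dense vectors']
```

In the app itself, the retrieved chunks are then inserted into the prompt sent to the DeepSeek LLM, which grounds its answer in the document rather than its training data.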
git clone https://github.com/your-username/pdf-rag-deepseek.git
cd pdf-rag-deepseek
python -m venv deepseek
source deepseek/bin/activate # or .\deepseek\Scripts\activate on Windows
pip install -r requirements.txt
streamlit run frontend.py

Check requirements.txt for the full dependency list.
RAG, LangChain, DeepSeek, Streamlit, Vector Database, LLM, PDF, FAISS
MIT License