Image-Based Data Extraction System

A Raspberry Pi 4-based system that automates inventory management: it captures product label images, extracts their text with PaddleOCR, and parses product attributes from that text with RegEx. The extracted data is stored in MongoDB for structured storage and efficient querying.

Overview

Ideal for industries requiring streamlined inventory management, this system uses Raspberry Pi and a camera module to capture product label images, perform OCR-based data extraction, and store the structured information in MongoDB.

Features

Automated Image Capture: Uses a camera module to capture product label images.
OCR Text Extraction: Extracts text from images via PaddleOCR.
Entity Parsing: Identifies key product attributes (weight, volume, voltage, etc.) using RegEx.
Data Storage: Stores structured data in MongoDB for easy access and analysis.
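
As an illustration of the entity-parsing step, here is a minimal sketch of RegEx-based attribute extraction. The patterns, the `PATTERNS` table, and the `extract_entities` function are hypothetical; the expressions the project actually uses are not shown in this README and may differ.

```python
import re

# Hypothetical patterns for illustration only: a number followed by a unit.
PATTERNS = {
    "weight": re.compile(r"(\d+(?:\.\d+)?)\s*(kg|mg|g)\b", re.IGNORECASE),
    "volume": re.compile(r"(\d+(?:\.\d+)?)\s*(ml|l)\b", re.IGNORECASE),
    "voltage": re.compile(r"(\d+(?:\.\d+)?)\s*(volts?|v)\b", re.IGNORECASE),
}


def extract_entities(text):
    """Return {attribute: "value unit"} using the first match of each pattern."""
    entities = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            value, unit = match.groups()
            entities[name] = f"{value} {unit.lower()}"
    return entities


print(extract_entities("Net Wt 500 g  Volume 1.5 L  Input 230V"))
```

Keeping one compiled pattern per attribute means a new attribute can be supported by adding a single entry to the table.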

Practical Use Cases

Manufacturing Plants: Tracks materials and products in real time.
Warehouses: Automates inventory logging, minimizing manual errors.
Retail: Manages product details and availability, streamlining restocking.

System Requirements

Raspberry Pi 4 (or compatible)
Raspberry Pi Camera Module
32 GB MicroSD Card (minimum recommended)
Python 3 (pre-installed on Raspberry Pi OS)
MongoDB Atlas or local MongoDB installation for data storage
Internet Connection

Required Libraries

Install these Python libraries in your Raspberry Pi environment:

```
pip3 install paddleocr pymongo pillow flask
```

paddleocr: For text extraction from images.
pymongo: For MongoDB interaction.
pillow: For image processing.
flask: To create a REST API for image uploads.
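
To show how `paddleocr` might be used for the text-extraction step, here is a sketch that assumes the PaddleOCR 2.x Python API (`PaddleOCR(...)` and `ocr.ocr(path, cls=True)`) and its result layout of one list per image containing `[bounding_box, (text, confidence)]` entries. The helper names `recognized_lines` and `read_label` and the 0.5 confidence threshold are hypothetical, not taken from the project's `ocr_module.py`.

```python
def recognized_lines(ocr_result, min_confidence=0.5):
    """Flatten a PaddleOCR 2.x-style result into a list of text strings.

    Assumes each page is a list of [bounding_box, (text, confidence)] entries
    and that a page with no detections may be None.
    """
    lines = []
    for page in ocr_result or []:
        for _box, (text, confidence) in page or []:
            if confidence >= min_confidence:
                lines.append(text)
    return lines


def read_label(image_path):
    """Run OCR on one image file and return the recognized text lines."""
    # Imported lazily so recognized_lines() works without PaddleOCR installed.
    from paddleocr import PaddleOCR

    ocr = PaddleOCR(use_angle_cls=True, lang="en")
    return recognized_lines(ocr.ocr(image_path, cls=True))
```

Calling `read_label("label.jpg")` on the Raspberry Pi would return the label text line by line, ready to be passed to the RegEx parsing step. Newer PaddleOCR releases may expose a different interface, so check the installed version.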

Setup and Installation

Raspberry Pi Setup

Set up Raspberry Pi: Complete initial setup (configure language, time zone, etc.).
Enable Camera:
- Open Raspberry Pi Configuration from the main menu.
- In the Interfaces tab, enable Camera and SSH (for remote access).
Update System:
```
sudo apt update && sudo apt upgrade -y
```
Install Python and pip:
```
sudo apt install python3 python3-pip -y
```
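
Once the camera is enabled, a quick capture test can confirm that it works. The sketch below is an assumption about how capture could be scripted, not the project's own code: it shells out to whichever still-capture command is installed (`rpicam-still`, `libcamera-still`, or the legacy `raspistill`, depending on the Raspberry Pi OS release). The function names and the `label.jpg` default are hypothetical.

```python
import shutil
import subprocess

# Still-capture commands shipped with different Raspberry Pi OS releases.
CAPTURE_TOOLS = ("rpicam-still", "libcamera-still", "raspistill")


def build_capture_command(output_path, tool):
    """Return the argument list that saves one still image to output_path."""
    return [tool, "-o", output_path]


def capture_image(output_path="label.jpg"):
    """Capture one image with the first available tool and return its path."""
    for tool in CAPTURE_TOOLS:
        if shutil.which(tool):
            subprocess.run(build_capture_command(output_path, tool), check=True)
            return output_path
    raise RuntimeError("No camera capture tool found on PATH")
```

Running `capture_image()` on the Pi should leave a `label.jpg` in the current directory if the camera is connected and enabled.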

MongoDB Setup

MongoDB Atlas: Create an account and set up a new cluster.
- Obtain the MongoDB connection URI.
- Add the Raspberry Pi's public IP address to the Atlas IP access list to allow database access.
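
With the connection URI in hand, storing an extracted record might look like the sketch below. It uses the standard `pymongo` calls `MongoClient` and `insert_one`; the `MONGODB_URI` environment variable, the `inventory` database, the `labels` collection, and the document fields are all assumptions for illustration and may not match the project's `mongo_module.py`.

```python
import os
from datetime import datetime, timezone


def build_document(image_name, entities):
    """Assemble the record to store for one processed label image."""
    return {
        "image": image_name,
        "entities": dict(entities),
        "captured_at": datetime.now(timezone.utc),
    }


def save_document(document, uri=None):
    """Insert the document into MongoDB and return its generated _id."""
    # Imported lazily so build_document() works without pymongo installed.
    from pymongo import MongoClient

    # Hypothetical convention: read the URI from the environment if not given.
    client = MongoClient(uri or os.environ["MONGODB_URI"])
    try:
        result = client["inventory"]["labels"].insert_one(document)
        return result.inserted_id
    finally:
        client.close()
```

Reading the URI from an environment variable instead of hard-coding it keeps database credentials out of the repository.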

Running the Application

Clone the Project Repository:

```
git clone https://github.com/Dhyey122403/Image-Based-Data-Extraction-System
cd Image-Based-Data-Extraction-System
```

Download the PaddleOCR model (for OCR):
```
paddleocr --lang en
```
Edit MongoDB Connection:
- Open app.py and replace mongodb_uri with your MongoDB URI.
Run the Application:
```
python3 app.py
```

File Upload and Testing

To test image upload and data extraction:

Use a tool like Postman or curl to send an image to your Flask endpoint (/upload).

Example curl command:

```
curl -X POST -F "file=@/path/to/image.jpg" http://<raspberry-pi-ip>:5000/upload
```
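
For reference, a Flask endpoint that accepts this request could be sketched as follows. This is a hypothetical outline, not the project's `app.py`: the `create_app` factory, the allowed extensions, and the JSON responses are assumptions. Only the `/upload` route, the `file` form field, and the `uploads/` directory come from this README.

```python
import os

ALLOWED_EXTENSIONS = {"jpg", "jpeg", "png"}  # assumed set of accepted image types
UPLOAD_DIR = "uploads"


def allowed_file(filename):
    """Return True if the filename has one of the accepted image extensions."""
    return "." in filename and filename.rsplit(".", 1)[1].lower() in ALLOWED_EXTENSIONS


def create_app():
    """Build a Flask app exposing POST /upload for label images."""
    # Imported lazily so allowed_file() works without Flask installed.
    from flask import Flask, jsonify, request
    from werkzeug.utils import secure_filename

    app = Flask(__name__)
    os.makedirs(UPLOAD_DIR, exist_ok=True)

    @app.route("/upload", methods=["POST"])
    def upload():
        # "file" matches the form field name used in the curl example.
        file = request.files.get("file")
        if file is None or not allowed_file(file.filename or ""):
            return jsonify(error="expected an image in the 'file' field"), 400
        path = os.path.join(UPLOAD_DIR, secure_filename(file.filename))
        file.save(path)
        return jsonify(saved=path), 201

    return app

# To serve on the port used above: create_app().run(host="0.0.0.0", port=5000)
```

The saved path could then be handed to the OCR and parsing steps before the result is written to MongoDB.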

Project Structure

Here’s the general structure for this project:

Image-Based-Data-Extraction-System/
├── app.py                # Main application script
├── ocr_module.py         # Module for OCR extraction using PaddleOCR
├── mongo_module.py       # MongoDB interactions (CRUD operations)
├── requirements.txt      # List of required Python libraries
├── README.md             # Project documentation
├── uploads/              # Directory to store uploaded images temporarily
└── templates/            # Flask templates (if needed)

The Flask API is still in progress. In the meantime, you can exercise the other modules by running main.py.

Future Enhancements

Extend Entity Extraction: Add patterns to capture additional product attributes.
Enhance OCR Accuracy: Experiment with additional image pre-processing.
Dashboard Interface: Build a dashboard for real-time inventory tracking and analytics.

Contact

For questions or support, please reach out at dhyeysavaliya.dks@gmail.com.

