📄 Invoice Parsing App

🌟 Overview

The Invoice Parsing App is a mobile application for scanning invoices and extracting their data. It runs lightweight machine learning models on the device, so it works offline, keeps invoice data private, and returns results quickly without relying on server infrastructure.

✨ Features

  • 🔍 OCR-Based Scanning: Extracts key information such as supplier details, itemized purchases, and taxes.
  • 🤖 Hybrid Model Integration:
    • 🛠️ ML Kit's Entity Extractor: Identifies general fields like dates, addresses, and emails.
    • 📊 Fine-Tuned BiLSTM NER Model: Extracts domain-specific entities such as invoice numbers and supplier names.
    • 📍 Proximity-Based Detection: Recognizes entities like GST numbers and invoice IDs using spatial relationships.
  • ✏️ Editable User Interface: Users can verify and manually correct extracted data.
  • 💾 Local Storage: Parsed data is stored offline to enhance privacy and accessibility.
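
To make the proximity-based detection idea concrete, here is a minimal Python sketch rather than the app's actual implementation. It assumes the OCR step yields text blocks with pixel coordinates, and it picks the pattern match closest to a label such as "GSTIN". The names `TextBlock`, `find_near_label`, and `max_distance`, as well as the GSTIN pattern, are illustrative assumptions and do not come from this repository.

```python
import math
import re
from dataclasses import dataclass


@dataclass
class TextBlock:
    """One OCR text block; x and y are the top-left corner in pixels (assumed layout)."""
    text: str
    x: float
    y: float


# Assumed 15-character GSTIN layout: 2 digits, 5 letters, 4 digits,
# 1 letter, 1 alphanumeric, the letter Z, 1 alphanumeric.
GSTIN = re.compile(r"\b\d{2}[A-Z]{5}\d{4}[A-Z][0-9A-Z]Z[0-9A-Z]\b")


def find_near_label(blocks, label, value_pattern, max_distance=200.0):
    """Return the pattern match closest to any block containing `label`, or None."""
    anchors = [b for b in blocks if label.lower() in b.text.lower()]
    best, best_dist = None, max_distance
    for anchor in anchors:
        for block in blocks:
            match = value_pattern.search(block.text)
            if match is None:
                continue
            # Euclidean distance between the two top-left corners.
            dist = math.hypot(block.x - anchor.x, block.y - anchor.y)
            if dist <= best_dist:
                best, best_dist = match.group(0), dist
    return best


blocks = [
    TextBlock("GSTIN", 10, 10),
    TextBlock("27ABCDE1234F1Z5", 80, 10),   # right next to the label
    TextBlock("29XYZAB5678K1Z2", 80, 400),  # far away, e.g. a buyer's number
]
print(find_near_label(blocks, "gstin", GSTIN))
```

A distance cutoff keeps a value elsewhere on the page from being attached to an unrelated label; the right threshold would depend on scan resolution.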

⚙️ Technical Highlights

🗂️ Data Preparation

  • A dataset of 1,000 Tally invoices was collected and annotated.
  • Regex patterns were used to label entities such as dates, amounts, and invoice numbers.
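
As an illustration of regex-based labeling, the sketch below tags whitespace-separated tokens with an entity label. The three patterns and the label names are hypothetical; the expressions actually used to annotate the dataset are not shown in this README.

```python
import re

# Hypothetical patterns for illustration only.
PATTERNS = {
    "DATE": re.compile(r"^\d{1,2}[-/]\d{1,2}[-/]\d{2,4}$"),
    "AMOUNT": re.compile(r"^\d{1,3}(?:,\d{2,3})*\.\d{2}$"),
    "INVOICE_NO": re.compile(r"^[A-Z]{2,5}[-/]\d{2,}(?:[-/]\d+)*$"),
}


def label_tokens(tokens):
    """Pair each token with the first matching label, or "O" when nothing matches."""
    labeled = []
    for token in tokens:
        tag = "O"
        for name, pattern in PATTERNS.items():
            if pattern.match(token):
                tag = name
                break
        labeled.append((token, tag))
    return labeled


line = "Invoice INV-2024/045 dated 08/07/2024 total 1,18,000.00"
for token, tag in label_tokens(line.split()):
    print(f"{token}\t{tag}")
```

Labels produced this way are only as good as the patterns, so a manual review pass over a sample of the annotations is a sensible companion step.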

🏋️‍♂️ Model Training

  • A pre-trained BiLSTM NER model was fine-tuned to recognize invoice-specific entities.
  • The model was optimized for on-device use with TensorFlow Lite (TFLite).
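
An NER model of this kind typically emits one tag per token, which then has to be grouped into entities. The sketch below assumes a BIO tagging scheme (`B-` starts an entity, `I-` continues it, `O` is outside); the README does not state which scheme the project uses, and `decode_bio` and the label names are hypothetical.

```python
def decode_bio(tokens, tags):
    """Group B-/I- tagged tokens into (label, text) entity spans."""
    entities = []
    current_label, current_tokens = None, []

    def flush():
        # Emit the entity collected so far, if any.
        if current_tokens:
            entities.append((current_label, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != current_label
        )
        if starts_new:
            flush()
            current_label, current_tokens = tag[2:], [token]
        elif tag.startswith("I-"):
            current_tokens.append(token)
        else:  # "O" or any unknown tag ends the current entity
            flush()
            current_label, current_tokens = None, []
    flush()
    return entities


tokens = ["Supplier", "Acme", "Traders", "Invoice", "INV-45"]
tags = ["O", "B-SUPPLIER", "I-SUPPLIER", "O", "B-INVOICE_NO"]
print(decode_bio(tokens, tags))
```

An `I-` tag that does not continue the current entity is treated as the start of a new one, which tolerates occasional inconsistent predictions.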

🔗 Integration

  • Results from ML Kit and the BiLSTM model were combined into a single extraction result.
  • Parsed data is persisted in a local Datastore DB for structured offline data management.
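
One simple way to combine per-source results is sketched below. It assumes each extractor returns a field-to-value dictionary and that earlier sources take priority; both the priority order and the names (`merge_extractions`, `ml_kit`, `ner`, `proximity`) are assumptions for illustration, not the app's documented behavior. Fields on which sources disagree are returned separately so they can be shown to the user for review.

```python
def merge_extractions(sources):
    """Merge {source: {field: value}} dicts; earlier sources win, disagreements are reported."""
    candidates = {}
    for source_name, fields in sources.items():
        for field, value in fields.items():
            candidates.setdefault(field, {})[source_name] = value.strip()

    merged, conflicts = {}, {}
    for field, by_source in candidates.items():
        values = list(by_source.values())
        merged[field] = values[0]  # value from the highest-priority source
        if len(set(values)) > 1:
            conflicts[field] = by_source  # keep every candidate for manual review
    return merged, conflicts


merged, conflicts = merge_extractions({
    "ml_kit": {"date": "08/07/2024", "email": "billing@example.com"},
    "ner": {"invoice_no": "INV-45", "supplier": "Acme Traders"},
    "proximity": {"invoice_no": "INV-46"},
})
print(merged)
print(conflicts)
```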

⚠️ Limitations

  • 🌀 Complex Layouts: The app struggles with unconventional invoice layouts.
  • 📄 Preprocessing Dependency: Accurate extraction requires high-quality scans.
  • 🤷‍♂️ Model Conflict Resolution: When the models produce conflicting outputs, the user must resolve them manually.
  • 📊 Dataset Limitations: Performance is tied to the quality of the training dataset.

🚀 Future Scope

  • 🌐 Multi-Language Support: Handle invoices in multiple languages.
  • 📱 Cross-Platform Compatibility: Extend support to iOS for broader accessibility.
  • 📚 Batch Processing: Enable simultaneous processing of multiple documents.
  • 📑 Additional Document Types: Incorporate support for PDFs, purchase orders, and contracts.

🛠️ Getting Started

🔧 Prerequisites

  • 🖥️ Android Studio (for development and testing)
  • 🐍 Python (for dataset preparation and model training)
  • 🤖 TensorFlow Lite (for on-device model optimization)

📥 Installation

Download the APK for the Invoice Parsing App using the link below:

📥 Download APK

Test the app on this sample invoice: PerfectVisionInvoice_2024-07-08_18-33-16_45.pdf

Watch a demo of the app in action:

▢️ Demo Video

To get started with the Invoice Parsing App, clone the repository using the following command:

git clone https://github.com/CulturalProfessor/invoice-parsing-app.git
