Overview

TINA is a lightweight computer vision module that classifies scanned PDF documents, detects Energy Performance Certificates (EPCs), and identifies their issuing country (Germany, France, or Austria). Unlike traditional OCR-based approaches, it relies entirely on visual structure recognition, which makes it robust to noisy scans and multilingual content.

Approach

The module renders each PDF page as an image and extracts layout-based features with a pretrained CNN. Each document is then matched against known EPC templates by visual similarity, and candidate matches are confirmed with country-specific validation checks. The result is a structured JSON summary that backend systems use for automated routing.
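The matching step can be illustrated with a minimal sketch. It assumes each page has already been reduced to a feature vector by the CNN and uses cosine similarity with a nearest-template rule as one plausible choice of similarity measure. The template vectors, the 0.85 threshold, and the JSON field names are hypothetical placeholders, not TINA's actual values; real CNN embeddings would have hundreds or thousands of dimensions.

```python
import json
import math
from typing import Dict, List, Sequence

# HYPOTHETICAL reference embeddings, one per country template.
# In the real pipeline these would be CNN features of known EPC layouts.
TEMPLATES: Dict[str, List[float]] = {
    "DE": [0.9, 0.1, 0.0, 0.2],
    "FR": [0.1, 0.8, 0.3, 0.0],
    "AT": [0.0, 0.2, 0.9, 0.1],
}


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(features: Sequence[float], threshold: float = 0.85) -> dict:
    """Match a page embedding to its closest template.

    Pages whose best score falls below the (hypothetical) threshold are
    reported as non-EPC documents.
    """
    scores = {country: cosine_similarity(features, ref)
              for country, ref in TEMPLATES.items()}
    best = max(scores, key=scores.get)
    is_epc = scores[best] >= threshold
    return {
        "is_epc": is_epc,
        "country": best if is_epc else None,
        "score": round(scores[best], 3),
    }


# Usage: serialise the summary for a downstream routing service.
print(json.dumps(classify([0.85, 0.15, 0.05, 0.2])))
```

Rejecting low-scoring pages outright, rather than always returning the nearest template, is what lets a design like this skip non-EPC documents instead of forcing them into a country bucket.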

Key Highlights

  • Vision-based classification that replaces fragile OCR and text-based pipelines.
  • Lightweight CNN feature extraction for scalable, high-throughput inference.
  • Country-specific validation through handcrafted visual cues.
  • Cross-platform deployment with API integration.
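Country-level validation through handcrafted visual cues can be organised as a small registry of per-country checks, so that adding a new format means registering one more function. The sketch below is written under loud assumptions: a page is a plain grid of RGB tuples, and the cues (an A4 portrait aspect ratio and a band holding both green and red pixels, loosely resembling an energy-rating colour bar) along with their band positions are hypothetical stand-ins, not the project's actual OpenCV-based detectors.

```python
from typing import Callable, Dict, List, Tuple

Pixel = Tuple[int, int, int]  # (r, g, b), each 0-255
Image = List[List[Pixel]]     # rows of pixels

# Registry mapping a country code to its validator (assumed design).
VALIDATORS: Dict[str, Callable[[Image], bool]] = {}


def register(country: str):
    """Decorator that adds a validator function to the registry."""
    def wrap(fn: Callable[[Image], bool]) -> Callable[[Image], bool]:
        VALIDATORS[country] = fn
        return fn
    return wrap


def is_a4_portrait(img: Image, tol: float = 0.05) -> bool:
    """HYPOTHETICAL cue: height/width close to the A4 ratio (about 1.414)."""
    if not img or not img[0]:
        return False
    return abs(len(img) / len(img[0]) - 1.414) < tol


def has_colour_scale(img: Image, top: float, bottom: float) -> bool:
    """HYPOTHETICAL cue: the band between the given fractions of page
    height contains both green-ish and red-ish pixels."""
    h = len(img)
    pixels = [p for row in img[int(top * h):int(bottom * h)] for p in row]
    greens = sum(1 for r, g, b in pixels if g > 150 and r < 100)
    reds = sum(1 for r, g, b in pixels if r > 150 and g < 100)
    return greens > 0 and reds > 0


# Band positions below are illustrative placeholders only.
@register("DE")
def validate_de(img: Image) -> bool:
    return is_a4_portrait(img) and has_colour_scale(img, 0.2, 0.4)


@register("FR")
def validate_fr(img: Image) -> bool:
    return is_a4_portrait(img) and has_colour_scale(img, 0.3, 0.6)


@register("AT")
def validate_at(img: Image) -> bool:
    return is_a4_portrait(img) and has_colour_scale(img, 0.1, 0.3)


def validate(country: str, img: Image) -> bool:
    """Run the registered validator; unknown countries fail closed."""
    fn = VALIDATORS.get(country)
    return fn(img) if fn is not None else False
```

Failing closed for unregistered countries keeps an unrecognised format from being routed as a valid EPC.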

My Role

  • Designed and implemented the full visual classification pipeline.
  • Integrated pretrained CNNs for feature extraction and similarity scoring.
  • Developed modular country-level detectors using OpenCV and scikit-learn.
  • Engineered API integration and optimized for production deployment.

Impact

  • Significantly improved classification accuracy and robustness compared with the OCR/text-based pipelines it replaced.
  • Reduced false positives in noisy and multilingual EPC documents.
  • Delivered a modular, production-ready component adaptable to new formats.

Learnings

  • Applied computer vision to document layout analysis.
  • Balanced model performance with deployment efficiency.
  • Strengthened practical skills in Python, TensorFlow, and OpenCV.