TINA — TypeDeterminator Identifier & Nomenclature Assistant

Overview

TINA is a lightweight computer vision module that classifies scanned PDF documents, detects Energy Performance Certificates (EPCs), and identifies their country of origin (Germany, France, or Austria). Unlike traditional OCR-based approaches, it relies entirely on visual structure recognition, which makes it robust to noisy scans and multilingual content.

Approach

The module processes PDFs as images, extracting layout-based features using a pretrained CNN. It then matches each document to known EPC templates through visual similarity and country-specific validation. The output is a structured JSON summary used for automated routing within backend systems.
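
The matching step described above can be sketched roughly as follows. This is a minimal illustration, not the module's actual code: it assumes the CNN has already turned each page into a fixed-length feature vector, and it uses cosine similarity as the similarity measure. The function names, the 0.8 threshold, the toy 3-dimensional vectors, and the JSON field names are all hypothetical.

```python
import json
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def match_template(features, templates, threshold=0.8):
    """Compare a page's features to every known template and summarize the best match.

    `templates` maps a country code to a reference feature vector.
    `threshold` is an assumed cut-off below which the page is not treated as an EPC.
    """
    scores = {name: cosine_similarity(features, vec) for name, vec in templates.items()}
    best = max(scores, key=scores.get)
    is_epc = scores[best] >= threshold
    return {
        "is_epc": is_epc,
        "country": best if is_epc else None,
        "score": round(scores[best], 4),
    }

# Toy reference vectors; real ones would come from the pretrained CNN.
templates = {
    "DE": np.array([1.0, 0.0, 0.0]),
    "FR": np.array([0.0, 1.0, 0.0]),
    "AT": np.array([0.0, 0.0, 1.0]),
}
page_features = np.array([0.9, 0.1, 0.0])

# Serialize the summary as JSON for downstream routing.
print(json.dumps(match_template(page_features, templates)))
```

In this sketch a page that resembles no template closely enough is reported with "country" set to null, so the backend can route it as a non-EPC document.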

Key Highlights

Visual-based classification replacing fragile OCR/text-based pipelines.
Lightweight CNN feature extraction for scalable, high-throughput inference.
Country-specific validation through handcrafted visual cues.
Cross-platform deployment with API integration.
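
As an illustration of what a handcrafted visual cue could look like, the sketch below checks whether a page contains a noticeable area of strongly colored pixels, such as a colored rating band. The specific cue, the function names, and both thresholds are assumptions chosen for the example; the cues TINA actually uses are not described here.

```python
import numpy as np

def saturated_fraction(rgb, min_spread=80):
    """Fraction of pixels whose max-min channel spread exceeds min_spread.

    Gray, black, and white pixels have a spread near zero, so this roughly
    measures how much of the page is strongly colored.
    """
    rgb = np.asarray(rgb, dtype=np.int16)  # avoid uint8 wrap-around on subtraction
    spread = rgb.max(axis=-1) - rgb.min(axis=-1)
    return float((spread > min_spread).mean())

def has_color_band(rgb, min_fraction=0.02):
    """Hypothetical cue: at least min_fraction of the page is strongly colored."""
    return saturated_fraction(rgb) >= min_fraction

# Synthetic 100x100 white page with a green band covering 8% of the area.
page = np.full((100, 100, 3), 255, dtype=np.uint8)
page[40:50, 10:90] = (0, 160, 0)

blank = np.full((100, 100, 3), 255, dtype=np.uint8)

print(has_color_band(page), has_color_band(blank))
```

A check like this is cheap enough to run on every page and can act as a second opinion on the template match before a country label is accepted.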

My Role

Designed and implemented the full visual classification pipeline.
Integrated pretrained CNNs for feature extraction and similarity scoring.
Developed modular country-level detectors using OpenCV and scikit-learn.
Engineered the API integration and optimized the pipeline for production deployment.

Impact

Improved classification accuracy and robustness compared with OCR/text-based pipelines.
Reduced false positives in noisy and multilingual EPC documents.
Delivered a modular, production-ready component adaptable to new formats.

Learnings

Applied computer vision to document layout analysis.
Balanced model performance with deployment efficiency.
Strengthened practical skills in Python, TensorFlow, and OpenCV.