
Problem

Most developers interact with Large Language Models (LLMs) as black boxes — fine-tuning, prompting, or deploying pre-trained systems like GPT-4 or LLaMA without understanding their inner mechanics.
I wanted to demystify these systems by implementing a miniature GPT architecture entirely from scratch, covering every step from tokenization to serving, while keeping it light enough to train on a local GPU (2GB VRAM) or CPU.
The goal was to build an educational yet functional mini-LLM pipeline that mirrors real-world systems — including data processing, model training, inference APIs, and lightweight MLOps tracking.

Solution Overview

The project implements a simplified GPT-style language model that learns next-token prediction on small text corpora (e.g., Tiny Shakespeare).
It provides an end-to-end training and serving workflow:

  1. Upload a raw text corpus
  2. Train a custom BPE tokenizer
  3. Encode the corpus into token IDs
  4. Train a mini GPT model using PyTorch
  5. Serve the model through a FastAPI backend
  6. Interact with the system via a clean HTML UI
  7. Track experiments and checkpoints with DVC & MLflow

This structure reflects a complete production-like ML system, scaled down for accessibility and clarity; the sketch below drives the same workflow end to end from a client script.
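
The endpoint names below match the backend routes described under Architecture & Design; every payload field (the file form name, vocab_size, steps, max_new_tokens) is an illustrative assumption, not the project's documented API.

    # Sketch of driving the pipeline end to end over HTTP. Endpoint names
    # match the backend routes; all payload fields are illustrative assumptions.
    import requests

    BASE = "http://localhost:8000"

    # 1. Upload a raw text corpus
    with open("tiny_shakespeare.txt", "rb") as f:
        requests.post(f"{BASE}/upload", files={"file": f}).raise_for_status()

    # 2-3. Train the BPE tokenizer; corpus encoding is assumed to run server-side
    requests.post(f"{BASE}/build_tokenizer", json={"vocab_size": 2000}).raise_for_status()

    # 4. Train the mini GPT
    requests.post(f"{BASE}/train", json={"steps": 5000}).raise_for_status()

    # 5-6. Generate text from a prompt
    resp = requests.post(f"{BASE}/generate",
                         json={"prompt": "Once upon a time", "max_new_tokens": 100})
    print(resp.json())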

Architecture & Design

┌────────────────────────────────────────────────────────┐
│                        UI Layer                        │
│           (HTML/JS) — Upload | Train | Test            │
└──────────────────────────┬─────────────────────────────┘
                           │
                           ▼
┌────────────────────────────────────────────────────────┐
│                    FastAPI Backend                     │
│ /upload → /build_tokenizer → /train → /generate        │
│ Handles orchestration, caching, and model loading      │
└──────────────────────────┬─────────────────────────────┘
                           │
                           ▼
┌────────────────────────────────────────────────────────┐
│                       Core Logic                       │
│ ├── tokenizer/bpe_tokenizer.py                         │
│ ├── gpt/model.py ← Transformer decoder                 │
│ ├── gpt/train.py ← Training loop (AdamW + scheduler)   │
│ ├── gpt/generate.py ← Sampling (top-k, nucleus, temp)  │
│ └── data/encode.py ← Prepare dataset batches           │
└──────────────────────────┬─────────────────────────────┘
                           │
                           ▼
┌────────────────────────────────────────────────────────┐
│                      MLOps Layer                       │
│ - DVC pipeline (tokenize → encode → train)             │
│ - MLflow experiment logging (loss, perplexity)         │
│ - Dockerfile for serving container                     │
└────────────────────────────────────────────────────────┘
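
The backend is a thin orchestration layer between the UI and the core logic. Below is a minimal sketch of how the /generate route might be wired up; the request schema and the generate_text helper are hypothetical stand-ins for the project's internals, not its actual code.

    # Minimal sketch of the /generate endpoint. GenerateRequest fields and
    # generate_text are hypothetical stand-ins for the project's internals.
    from fastapi import FastAPI
    from pydantic import BaseModel

    app = FastAPI()

    class GenerateRequest(BaseModel):
        prompt: str
        max_new_tokens: int = 100
        temperature: float = 0.8

    def generate_text(prompt: str, max_new_tokens: int, temperature: float) -> str:
        # Placeholder standing in for the sampling loop in gpt/generate.py.
        return prompt + " ..."

    @app.post("/generate")
    def generate(req: GenerateRequest):
        # A production version would load the checkpoint once and cache it.
        return {"generated": generate_text(req.prompt, req.max_new_tokens,
                                           req.temperature)}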

Technical Highlights

Byte-Pair Encoding (BPE) Tokenizer

  • Implemented from scratch with vocabulary learning and merge rules.
  • Trains on any text file and exports vocab.json / merges.txt (see the merge-learning sketch below).
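
The core of BPE training is a loop that repeatedly merges the most frequent adjacent symbol pair. A minimal sketch of that loop, assuming whitespace pre-tokenization and character-level starting symbols; the function names are illustrative, not the module's actual API.

    # Minimal BPE merge learning, in the spirit of tokenizer/bpe_tokenizer.py.
    # Function names and the whitespace pre-tokenization are assumptions.
    from collections import Counter

    def get_pair_counts(word_freqs):
        """Count adjacent symbol pairs across all words, weighted by frequency."""
        pairs = Counter()
        for symbols, freq in word_freqs.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        return pairs

    def merge_pair(pair, word_freqs):
        """Rewrite every word, fusing each occurrence of `pair` into one symbol."""
        merged = {}
        for symbols, freq in word_freqs.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            merged[tuple(out)] = freq
        return merged

    def learn_bpe(corpus, num_merges):
        """Learn an ordered list of merge rules from raw text."""
        word_freqs = {tuple(w): f for w, f in Counter(corpus.split()).items()}
        merges = []
        for _ in range(num_merges):
            pairs = get_pair_counts(word_freqs)
            if not pairs:
                break
            best = max(pairs, key=pairs.get)
            word_freqs = merge_pair(best, word_freqs)
            merges.append(best)
        return merges

    print(learn_bpe("low low lower lowest", 3))
    # [('l', 'o'), ('lo', 'w'), ('low', 'e')]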

Decoder-only Transformer

  • Multi-head self-attention, learned positional embeddings, LayerNorm, GELU activations.
  • Supports variable block sizes for efficient mini-batch training (see the block sketch below).
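
One decoder block captures most of the architecture. This sketch assumes the common pre-LayerNorm layout and uses PyTorch's built-in nn.MultiheadAttention for brevity, where the project implements attention by hand; the hyperparameter names and defaults are illustrative.

    # Sketch of one pre-LN decoder block. The project implements multi-head
    # attention from scratch; nn.MultiheadAttention is used here for brevity.
    import torch
    import torch.nn as nn

    class Block(nn.Module):
        def __init__(self, n_embd=128, n_head=4, block_size=256, dropout=0.1):
            super().__init__()
            self.ln1 = nn.LayerNorm(n_embd)
            self.attn = nn.MultiheadAttention(n_embd, n_head, dropout=dropout,
                                              batch_first=True)
            self.ln2 = nn.LayerNorm(n_embd)
            self.mlp = nn.Sequential(
                nn.Linear(n_embd, 4 * n_embd),
                nn.GELU(),
                nn.Linear(4 * n_embd, n_embd),
                nn.Dropout(dropout),
            )
            # Causal mask: True above the diagonal blocks attention to the future.
            mask = torch.triu(torch.ones(block_size, block_size, dtype=torch.bool), 1)
            self.register_buffer("mask", mask)

        def forward(self, x):
            T = x.size(1)
            h = self.ln1(x)
            a, _ = self.attn(h, h, h, attn_mask=self.mask[:T, :T], need_weights=False)
            x = x + a                          # residual around attention
            return x + self.mlp(self.ln2(x))   # residual around the MLP

    print(Block()(torch.randn(2, 16, 128)).shape)  # torch.Size([2, 16, 128])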

Training Pipeline

  • AdamW optimizer with weight decay.
  • Warmup and cosine learning rate scheduler (see the sketch below).
  • Checkpoint saving every N steps.
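
The warmup-plus-cosine schedule is easy to express as a LambdaLR multiplier on the base learning rate. A sketch, with step counts and the base LR as assumed values rather than the project's actual hyperparameters:

    # Warmup + cosine decay as an LR multiplier; step counts and the base LR
    # are illustrative assumptions.
    import math
    import torch

    def lr_lambda(step, warmup_steps=100, max_steps=5000):
        if step < warmup_steps:                      # linear warmup from 0 to 1
            return step / max(1, warmup_steps)
        progress = (step - warmup_steps) / max(1, max_steps - warmup_steps)
        return 0.5 * (1.0 + math.cos(math.pi * min(progress, 1.0)))  # cosine to 0

    model = torch.nn.Linear(8, 8)  # stand-in for the GPT
    optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.1)
    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
    # Each training step: loss.backward(); optimizer.step(); scheduler.step()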

Text Generation

  • Configurable temperature, top-k, and top-p (nucleus) sampling.
  • Deterministic seeding for reproducible outputs (see the sampling sketch below).
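
Both filters operate on the next-token logits before sampling: top-k keeps the k highest logits, and top-p keeps the smallest set whose cumulative probability reaches p. A self-contained sketch; the actual loop in gpt/generate.py may differ in detail.

    # Temperature, top-k, and top-p (nucleus) filtering over next-token logits.
    import torch

    def sample_next(logits, temperature=1.0, top_k=50, top_p=0.9):
        logits = logits / temperature
        if top_k > 0:                      # keep only the k highest logits
            kth = torch.topk(logits, top_k).values[-1]
            logits[logits < kth] = float("-inf")
        if top_p < 1.0:                    # nucleus filtering
            sorted_logits, idx = torch.sort(logits, descending=True)
            probs = torch.softmax(sorted_logits, dim=-1)
            cum = torch.cumsum(probs, dim=-1)
            # Drop tokens whose preceding cumulative mass already exceeds top_p.
            logits[idx[cum - probs > top_p]] = float("-inf")
        probs = torch.softmax(logits, dim=-1)
        return torch.multinomial(probs, num_samples=1).item()

    torch.manual_seed(0)  # deterministic seeding for reproducible outputs
    print(sample_next(torch.randn(1000)))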

MLOps Integration

  • DVC to version and reproduce the full pipeline (tokenize → encode → train).
  • MLflow to log hyperparameters, losses, and perplexity metrics (see the logging sketch below).
  • Docker container for running the FastAPI inference server.
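
On the MLflow side, the logging calls are a thin layer around the training loop. A sketch with an assumed experiment name, parameter keys, and stand-in loss values:

    # MLflow logging sketch; the experiment name, parameter keys, and the
    # loss values are illustrative, not the project's actual runs.
    import math
    import mlflow

    mlflow.set_experiment("mini-gpt")
    with mlflow.start_run():
        mlflow.log_params({"n_layer": 4, "n_head": 4, "n_embd": 128, "lr": 3e-4})
        for step, loss in enumerate([4.2, 3.1, 2.6]):   # stand-in loss curve
            mlflow.log_metric("train_loss", loss, step=step)
            mlflow.log_metric("perplexity", math.exp(loss), step=step)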

Visual UI

  • Single-page web app (HTML/CSS/JS) served by FastAPI.
  • Upload corpus → Build vocab → Train model → Generate text interactively (see the serving sketch below).
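
Serving the page needs only a static mount on the same app. A sketch, assuming the page lives in a ui/ directory:

    # Serve the single-page UI from the same FastAPI app. The "ui" directory
    # name is an assumption about the project layout.
    from fastapi import FastAPI
    from fastapi.staticfiles import StaticFiles

    app = FastAPI()
    # API routes (/upload, /train, ...) are registered before this mount so
    # the catch-all static mount does not shadow them.
    app.mount("/", StaticFiles(directory="ui", html=True), name="ui")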

My Contributions

This was a solo project, and I personally implemented and integrated every component:

  • Designed the full PyTorch GPT architecture from scratch (no Hugging Face dependency).
  • Built the BPE tokenizer and text encoding pipeline.
  • Implemented training loop, loss tracking, and sampling algorithms.
  • Developed a FastAPI backend with modular endpoints for each phase.
  • Created a custom web UI for visualizing the workflow interactively.
  • Added DVC and MLflow for experiment tracking and reproducibility.
  • Wrote pytest tests and structured the project for scalability and clarity.

Results & Impact

  • Trained a working mini GPT model on Tiny Shakespeare that can generate coherent text.
  • Ran training and inference smoothly on both CPU-only and low-VRAM GPU (2GB) setups.
  • Delivered a self-contained educational LLM system, bridging deep learning and MLOps concepts.
  • Used the project as a teaching and portfolio artifact to demonstrate end-to-end ML system design.

Example output:
Prompt: "Once upon a time"
Model: "Once upon a time there was a noble prince, and the wind whispered soft verses of love..."

What I Learned

  • Deepened my understanding of how tokenization, attention, and sampling interact in GPT architectures.
  • Learned how to structure ML pipelines reproducibly using DVC and MLflow.
  • Improved skills in serving ML models efficiently through FastAPI.
  • Gained appreciation for systems thinking — connecting modeling, engineering, and user interaction.