Note: This repo contains only deployment/demo files.
For full source, notebooks, and complete code, see Capsule Defect Detection and Segmentation with ConvNeXt+U-Net and FastAPI.
Capsule Defect Detection and Segmentation with ConvNeXt+U-Net and FastAPI
This project addresses a real-world computer vision challenge: detecting and localizing defects on medicinal capsules via image classification and segmentation.
The aim is to deliver a complete pipeline (data preprocessing, model training and evaluation, and deployment) that demonstrates practical ML engineering from scratch to API.
Main Repo
This is a minimal clone with only the necessary files from the main repo.
For full source, notebooks, and complete code, see Capsule Defect Detection and Segmentation with ConvNeXt+U-Net and FastAPI.
Project Overview
End-to-end defect detection and localization using the Capsule class from the MVTec AD dataset.
Key steps include:
- Data preprocessing, formatting, and augmentation
- Model design (pre-trained backbone + custom heads)
- Training, evaluation, and hyperparameter tuning
- Dockerized FastAPI deployment for inference
Portfolio project to showcase ML workflow and engineering.
Key Results
- Evaluation dataset: MVTec AD 'capsule' class, 70/15/15 train/val/test split
- Quantitative results on the test set:
  - Classification accuracy: 83 %
  - Classification defect-only accuracy: 75 %
  - Defect presence accuracy: 91 %
  - Segmentation quality (mIoU / Dice): 0.79 / 0.73
  - Segmentation defect-only quality (mIoU / Dice): 0.70 / 0.55
- Model artifacts:
  - Original model size (.keras / SavedModel): 345 MB
  - Raw converted TFLite size (.tflite): 119 MB
  - Optimized converted TFLite size (.tflite): 31 MB (dynamic range quantization applied)
- Container / runtime:
  - Docker image size: 317 MB
  - Runtime used: tflite-runtime + Uvicorn/FastAPI
  - Avg inference latency (inference only, set tensor + invoke): 239 ms
  - Avg inference latency (single POST request, measured): 271 ms
  - Average memory usage during inference: 321 MB
  - Startup time (local): 72 ms
- Observations:
  - The app returns the expected visualizations and class labels for the MVTec-style test images.
  - POST inference latency was measured locally; expect higher latency in real use (network delays).
  - Given the small and highly imbalanced dataset (351 samples: 242 'good' and 109 defective spread across 5 defect types, ~22 per type) and the nature of the samples (the only distinguishing feature is the defect, which is usually small and varied in shape), performance is not as strong as desired, and the results lack the statistical confidence needed for real-world use. Without more data, a meaningful improvement would be difficult to achieve.
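For readers unfamiliar with the segmentation metrics above, here is a minimal sketch of IoU and Dice for a single pair of binary masks, using their standard definitions. The function name `iou_and_dice` and the `eps` smoothing term are assumptions made for illustration. This README does not state how the reported values are averaged (for example, per image or per class, or whether the background class counts towards mIoU), so the sketch does not necessarily reproduce the numbers above.

```python
import numpy as np


def iou_and_dice(pred_mask, true_mask, eps=1e-7):
    """Return (IoU, Dice) for two binary masks of the same shape."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)

    intersection = np.logical_and(pred, true).sum()
    union = np.logical_or(pred, true).sum()
    total = pred.sum() + true.sum()

    # eps keeps both ratios defined when the two masks are empty;
    # in that case the pair is scored as a perfect match (1.0).
    iou = (intersection + eps) / (union + eps)
    dice = (2 * intersection + eps) / (total + eps)
    return float(iou), float(dice)


# Toy example: the prediction covers two pixels, the ground truth one of them.
pred = [[1, 1], [0, 0]]
true = [[1, 0], [0, 0]]
iou, dice = iou_and_dice(pred, true)
print(f"IoU={iou:.3f}  Dice={dice:.3f}")
```

An mIoU would then be the mean of such IoU values over whatever unit (images or classes) the evaluation averages across.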
Dataset
- Capsule class from MVTec AD dataset
- License: Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
- Dataset folder contains license file
- Usage is strictly non-commercial/educational
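With so few samples per defect type, a 70/15/15 split is usually done per class so that every defect type appears in each subset. The sketch below shows one way to do that with the standard library only; it is not necessarily how the main repo performs the split. The helper `stratified_split` is hypothetical, the defect type names follow the MVTec AD capsule folders, and the per-type counts are an assumption based on the "~22 per defect" figure.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.70, 0.15, 0.15), seed=0):
    """Split sample indices into train/val/test, class by class."""
    rng = random.Random(seed)

    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(len(indices) * fractions[0])
        n_val = round(len(indices) * fractions[1])
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # whatever remains goes to test
    return train, val, test


# Illustrative labels: 242 'good' samples plus 109 defective ones.
# The per-type counts below are assumed, not taken from the dataset files.
labels = (
    ["good"] * 242
    + ["crack"] * 22
    + ["faulty_imprint"] * 22
    + ["poke"] * 22
    + ["scratch"] * 22
    + ["squeeze"] * 21
)
train_idx, val_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(val_idx), len(test_idx))
```

Because rounding happens per class, the subset sizes only approximate 70/15/15 on a dataset this small.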
Tech Stack
- Python
- TensorFlow
- Scikit-Learn
- Numpy / Pandas
- OpenCV / Pillow
- Ray Tune (Hyperparameter tuning / experiment tracking)
- OmegaConf (Config management)
- Docker, FastAPI, Uvicorn (Deployment)
Folder Structure
data/ # Dataset and annotations
app/ # Inference and deployment code and files
models/ # Saved trained models and training logs
How to Run
Build image for deployment:
- Requirements:
  - models/final_model/final_model.tflite (included)
  - app/ folder and contents (included)
  - Dockerfile (included)
  - .dockerignore (included)
- From the project root, build and run the Docker image:
docker build -t cv-app .
docker run -p 8000:8000 cv-app
- Open http://localhost:8000 in your browser to access the demo UI
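To get a rough latency figure on your own machine, comparable in spirit to the averages listed under Key Results, a small timing helper can be enough. The sketch below is an assumption about how such an average could be measured, not the script used for the reported numbers. `average_latency_ms` is a hypothetical helper that times any callable, and the workload shown is a stand-in.

```python
import statistics
import time


def average_latency_ms(fn, n_runs=20, n_warmup=3):
    """Call fn repeatedly and return the mean wall-clock time per call in ms."""
    for _ in range(n_warmup):
        fn()  # warm-up calls are executed but not timed

    timings = []
    for _ in range(n_runs):
        start = time.perf_counter()
        fn()
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(timings)


# Stand-in workload; replace it with the call you want to measure.
latency = average_latency_ms(lambda: sum(range(10_000)), n_runs=5, n_warmup=1)
print(f"avg latency: {latency:.3f} ms")
```

With the container running, passing something like `lambda: urllib.request.urlopen("http://localhost:8000").read()` would time a request to the demo UI. The path of the inference POST endpoint is not documented in this README, so timing it is left to the reader.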
Note: For the full source code and the steps to recreate the model, visit the full repo (see the "Main Repo" section near the top).
Contact
For questions, reach out via GitHub (Kev-HL).