Industrial Document Classifier - CLIP ViT-B/16 with LoRA
This model classifies industrial and technical documents into 6 primary categories using a CLIP vision model fine-tuned with LoRA (Low-Rank Adaptation). It is designed for manufacturing, e-commerce, and industrial applications where automated document organization is essential.
Model Description
- Base Model: openai/clip-vit-base-patch16
- Architecture: CLIP Vision Transformer (ViT-B/16)
- Fine-tuning Method: LoRA (Low-Rank Adaptation)
- LoRA Rank (r): 32
- LoRA Alpha: 64
- Task: Multi-class Document Image Classification
- Training Dataset: ~180,000 industrial document images
- Base Parameters: ~63.1M (CLIP ViT-B/16 vision tower)
- Trainable LoRA Parameters: ~1.2M (1.9% of base model)
- Total Parameters: ~64.3M
Document Categories
The model classifies documents into 6 parent categories designed for industrial and e-commerce environments:
1. Product Information (product_information)
Description: Documents that provide detailed specifications, features, and technical data about products. These are essential for product listings, sales, and customer understanding.
Includes:
- Product catalogs with item listings and descriptions
- Specification sheets with technical parameters (dimensions, materials, performance metrics)
- Technical bulletins announcing product updates or technical advisories
Use Cases:
- E-commerce platforms organizing product documentation
- Sales teams accessing product specifications
- Marketing departments creating product literature
- Customer support referencing product details
Examples:
- HVAC equipment specification sheets
- Electronic component catalogs
- Industrial machinery product brochures
- Chemical product technical data sheets
2. Engineering Drawings (engineering_drawings)
Description: Technical drawings and schematics that provide visual representations of product design, dimensions, and assembly details. Critical for manufacturing, installation, and maintenance.
Includes:
- Full engineering drawings with detailed dimensions and tolerances
- Line drawings showing simplified product views
- CAD-generated technical illustrations
- Assembly diagrams and exploded views
Use Cases:
- Manufacturing teams producing components
- Installation contractors planning installations
- Quality assurance verifying product specifications
- Maintenance teams identifying parts and assemblies
Examples:
- Mechanical part blueprints with GD&T annotations
- Electrical circuit schematics
- Plumbing fixture installation diagrams
- Structural component drawings
3. Instructional Guides (instructional_guides)
Description: Step-by-step documentation that guides users through installation, operation, maintenance, or repair procedures. Essential for safe and effective product use.
Includes:
- Installation and instruction manuals for setup procedures
- Owner's and user manuals for operation guidance
- Service manuals for repair and maintenance procedures
- Quick start guides and troubleshooting documentation
Use Cases:
- End users learning to operate equipment
- Installation professionals following setup procedures
- Service technicians performing repairs
- Training departments educating staff
Examples:
- Appliance installation guides
- Software user manuals
- Equipment operation handbooks
- Maintenance procedure documents
4. Compliance Certificates (compliance_certificates)
Description: Official documentation proving products meet regulatory standards, safety requirements, and material specifications. Critical for legal compliance and quality assurance.
Includes:
- Material Test Reports (MTR) certifying material composition and properties
- Safety Data Sheets (SDS) detailing chemical hazards and handling
- RoHS (Restriction of Hazardous Substances) compliance certificates
- Quality certifications and test reports
Use Cases:
- Procurement teams verifying supplier compliance
- Quality control departments validating materials
- Regulatory affairs ensuring legal compliance
- Environmental health and safety managing hazardous materials
Examples:
- Steel MTR certificates with chemical composition
- Chemical SDS for workplace safety
- RoHS compliance declarations for electronics
- ISO certification documents
5. Energy Ratings (energy_ratings)
Description: Documentation related to energy efficiency ratings, consumption data, and environmental performance certifications. Important for sustainability and regulatory compliance.
Includes:
- Energy Star certification guides and labels
- Energy efficiency ratings and performance data
- Environmental impact assessments
- Carbon footprint documentation
Use Cases:
- Purchasing departments selecting energy-efficient equipment
- Sustainability teams tracking environmental impact
- Facility managers optimizing energy consumption
- Regulatory compliance for energy standards
Examples:
- Energy Star qualified product guides
- Appliance EnergyGuide labels
- HVAC system efficiency ratings
- LED lighting energy consumption data
6. Warranty Documents (warranty_documents)
Description: Legal documentation outlining product guarantees, coverage terms, and claim procedures. Essential for customer protection and after-sales support.
Includes:
- Product warranty certificates and terms
- Extended warranty offers and conditions
- Warranty claim forms and procedures
- Service agreement documentation
Use Cases:
- Customer service handling warranty claims
- Sales teams explaining warranty coverage
- Legal departments managing warranty terms
- Customers understanding their rights and coverage
Examples:
- Manufacturer's limited warranty certificates
- Extended warranty contracts
- Warranty registration cards
- Service plan agreements
Performance
Overall Accuracy: 93.97%
This figure is the overall accuracy across all 6 document categories. Per-category results are not reported, so validate the model on your own documents before relying on it in production.
Training Details
Dataset
- Total Images: ~180,000 industrial document images
- Class Distribution: Approximately balanced across all 6 categories
- Image Types: Scanned documents, PDF pages converted to images, digital documents
Hyperparameters
- Epochs: 8
- Learning Rate: 2e-4
- Batch Size: 64
- Optimizer: AdamW
- Scheduler: Cosine annealing with warmup
- Warmup Ratio: 10%
- Early Stopping: Enabled
- Training Time: ~14 hours
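The scheduler settings above (cosine annealing, 10% warmup, peak learning rate 2e-4) can be sketched as a function of the training step. This is a minimal illustration that assumes linear warmup followed by a single half-cosine decay to zero; the exact scheduler used in training is not stated, and lr_at_step and total_steps are hypothetical names.

```python
import math

def lr_at_step(step, total_steps, peak_lr=2e-4, warmup_ratio=0.10):
    """Learning rate under linear warmup followed by cosine decay to zero.

    Assumed shape only; the model card does not specify the exact scheduler.
    """
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to peak_lr over the warmup phase.
        return peak_lr * step / max(1, warmup_steps)
    # Cosine decay from peak_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

With 1,000 total steps, for example, the rate climbs for the first 100 steps, peaks at 2e-4, and then decays smoothly toward zero.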
LoRA Configuration
- Rank (r): 32
- Alpha: 64
- Dropout: 0.1
- Target Modules: q_proj, k_proj, v_proj, out_proj, fc1, fc2
- Bias: None
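To make the rank and alpha settings concrete, the sketch below computes the LoRA scaling factor (alpha / r) and the number of trainable parameters LoRA adds to a single linear layer, r * (d_in + d_out). The layer shapes are assumptions based on the standard ViT-B/16 dimensions (hidden size 768, MLP size 3072), and lora_params is a hypothetical helper. The totals reported in this card are not re-derived here.

```python
def lora_params(d_in, d_out, r):
    """Trainable parameters added by one LoRA adapter: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)

r, alpha = 32, 64
scaling = alpha / r  # the low-rank update B @ A is multiplied by this factor

# Assumed ViT-B/16 shapes: attention projections are 768x768, fc1 maps 768 -> 3072.
attn_adapter = lora_params(768, 768, r)
fc1_adapter = lora_params(768, 3072, r)
attn_full = 768 * 768  # weights in the frozen projection, for comparison

print(f"scaling factor: {scaling}")
print(f"LoRA params per attention projection: {attn_adapter} (frozen weights: {attn_full})")
print(f"LoRA params for fc1: {fc1_adapter}")
```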
Hardware
- GPU-accelerated training (CUDA-enabled)
- Mixed precision training (FP16)
Installation
pip install torch torchvision transformers peft pillow
Requirements:
- Python 3.8+
- PyTorch 2.0+
- transformers
- peft
- Pillow (PIL)
Usage
Download the Model
# Download model file
wget https://huggingface.co/ssheroz/industrial-document-classifier-clip-lora/resolve/main/industrial-document-classifier-clip-lora.pt
# Download inference scripts
wget https://huggingface.co/ssheroz/industrial-document-classifier-clip-lora/resolve/main/pipeline.py
wget https://huggingface.co/ssheroz/industrial-document-classifier-clip-lora/resolve/main/main.py
Basic Usage
from pipeline import DocumentClassifier

# Initialize classifier (automatically loads the model)
classifier = DocumentClassifier()

# Prepare list of document image paths
image_paths = [
    "path/to/specification_sheet.jpg",
    "path/to/warranty_doc.png",
    "path/to/manual.jpeg",
]

# Get predictions
results = classifier.predict(image_paths)

# Process results
for result in results:
    print(f"\nImage: {result['image_path']}")
    if result['error_response']:
        print(f"Error: {result['error_response']}")
    else:
        # Sort predictions by probability
        sorted_predictions = sorted(
            result['predictions'].items(),
            key=lambda x: x[1],
            reverse=True
        )
        print("Predictions:")
        for class_name, probability in sorted_predictions:
            print(f"  {class_name}: {probability:.4f}")
        # Get top prediction
        top_class = sorted_predictions[0][0]
        top_confidence = sorted_predictions[0][1]
        print(f"\nTop Prediction: {top_class} ({top_confidence:.2%} confidence)")

# Release model from memory when done
classifier.unload()
Supported Image Formats
The model accepts all common image formats:
- .jpg, .jpeg (JPEG)
- .png (PNG)
- .bmp (Bitmap)
- .gif (GIF)
- .tiff, .tif (TIFF)
- .webp (WebP)
- .ico (Icon)
- .heic, .heif (HEIC)
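Before calling predict, it can help to filter a set of paths down to the extensions listed above. The following is a minimal sketch using only the standard library; SUPPORTED_EXTENSIONS and filter_supported are hypothetical names and are not part of pipeline.py.

```python
from pathlib import Path

# Extensions taken from the supported-format list; the constant name is illustrative.
SUPPORTED_EXTENSIONS = {
    ".jpg", ".jpeg", ".png", ".bmp", ".gif",
    ".tiff", ".tif", ".webp", ".ico", ".heic", ".heif",
}

def filter_supported(paths):
    """Split paths into (supported, skipped) by case-insensitive file extension."""
    supported, skipped = [], []
    for p in paths:
        target = supported if Path(p).suffix.lower() in SUPPORTED_EXTENSIONS else skipped
        target.append(p)
    return supported, skipped
```

The check looks only at the file name, so a mislabeled or corrupt file can still fail at load time and show up through error_response.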
Output Format
Each prediction returns a dictionary with:
{
    "image_path": "path/to/image.jpg",
    "predictions": {
        "product_information": 0.8543,
        "engineering_drawings": 0.0821,
        "instructional_guides": 0.0342,
        "compliance_certificates": 0.0156,
        "energy_ratings": 0.0089,
        "warranty_documents": 0.0049
    },
    "error_response": ""  # Empty string if successful, error message if failed
}
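Since each result carries its own error_response, callers usually want to separate successes from failures before reading the probabilities. The sketch below assumes result dictionaries with exactly the shape shown above; split_results and top_prediction are hypothetical helpers.

```python
def split_results(results):
    """Separate successful predictions from failed ones using `error_response`."""
    succeeded, failed = [], []
    for result in results:
        # An empty string means success, per the output format above.
        (failed if result["error_response"] else succeeded).append(result)
    return succeeded, failed

def top_prediction(result):
    """Return (class_name, probability) for the highest-probability class."""
    return max(result["predictions"].items(), key=lambda item: item[1])
```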
Batch Processing
The classifier batches inputs automatically and uses multi-threading to process them:
# Process large batches efficiently
large_batch = [f"document_{i}.jpg" for i in range(1000)]
results = classifier.predict(large_batch)
# Results are returned in the same order as input
assert len(results) == len(large_batch)
Performance Features:
- Multi-threaded inference using all available CPU cores
- Automatic batching (16 images per batch)
- GPU acceleration when available
- Maintains input order in results
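The combination of fixed-size batches, worker threads, and order-preserving output can be sketched with the standard library. This illustrates the general pattern and is not the code in pipeline.py; chunked, predict_in_batches, and the predict_batch callable are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def chunked(items, size=16):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def predict_in_batches(image_paths, predict_batch, batch_size=16, max_workers=4):
    """Run `predict_batch` over batches in threads; results keep input order.

    `predict_batch` is a stand-in for whatever scores one batch of paths
    and returns one result per path.
    """
    batches = list(chunked(image_paths, batch_size))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in submission order, not completion order.
        batch_results = list(pool.map(predict_batch, batches))
    return [result for batch in batch_results for result in batch]
```

Using Executor.map instead of collecting futures as they complete is what keeps the output aligned with the input list.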
Model Architecture
Input Image (Any supported format)
↓
CLIP Vision Transformer (ViT-B/16)
↓
[With LoRA adapters on attention and feed-forward layers]
↓
Vision Embeddings (768-dim)
↓
Classification Head (6 classes)
↓
Softmax Probabilities
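The final stage of the diagram, turning the 6 scores from the classification head into probabilities, amounts to the computation below. It is a dependency-free sketch with made-up logits; the class order and the softmax helper are assumptions for illustration.

```python
import math

CLASS_NAMES = [
    "product_information", "engineering_drawings", "instructional_guides",
    "compliance_certificates", "energy_ratings", "warranty_documents",
]

def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits, as a 6-way classification head might emit them.
logits = [4.1, 1.8, 0.9, 0.1, -0.4, -1.0]
probabilities = dict(zip(CLASS_NAMES, softmax(logits)))
```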
LoRA Integration:
- Applied to all attention projection layers (q, k, v, output)
- Applied to feed-forward layers (fc1, fc2)
- Reduces trainable parameters by 98% while maintaining performance
- Enables efficient fine-tuning on domain-specific data
Use Cases
Manufacturing & Industrial
- Organize technical documentation repositories
- Route documents to appropriate departments
- Automate quality control document verification
- Manage compliance certification libraries
E-commerce & Retail
- Categorize product documentation for online listings
- Organize supplier documentation
- Manage warranty and compliance documents
- Automate document ingestion pipelines
Supply Chain & Procurement
- Classify vendor-provided documentation
- Verify compliance certificates
- Organize product specifications
- Manage installation and service documentation
Facility Management
- Organize equipment manuals and specifications
- Track warranty documentation
- Manage energy efficiency certifications
- Maintain compliance records
Limitations
- Training Domain: Optimized for industrial and technical documents; may have reduced accuracy on consumer documents, artistic content, or non-technical materials
- Language: Trained primarily on English-language documents
- Image Quality: Performance may degrade with:
  - Very low resolution images (<224x224 pixels)
  - Severely distorted or rotated documents
  - Handwritten documents
  - Documents with heavy watermarks or overlays
- Document Types: Not designed for:
  - Multi-page document analysis (processes single images)
  - Text-heavy documents requiring OCR
  - Non-document images (photographs, artwork, etc.)
- Ambiguous Documents: Some documents may legitimately belong to multiple categories (e.g., a manual that includes warranty information)
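One way to surface such ambiguous cases is to look at the gap between the two highest probabilities instead of the top score alone. This is a minimal sketch; the 0.20 margin is an arbitrary illustrative value, and is_ambiguous is a hypothetical helper.

```python
def is_ambiguous(predictions, min_margin=0.20):
    """Return True when the top two class probabilities are within `min_margin`.

    `min_margin` is an illustrative default; tune it on held-out data.
    """
    ranked = sorted(predictions.values(), reverse=True)
    if len(ranked) < 2:
        return False
    return (ranked[0] - ranked[1]) < min_margin
```

A manual that also contains warranty terms might score 0.48 and 0.41 on two categories, which this check would flag for a second look.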
Ethical Considerations
- Transparency: This model should be used as part of a larger document management system with human oversight for critical decisions
- Bias: Training data distribution may affect performance across different industries or document styles
- Privacy: Ensure compliance with data privacy regulations when processing proprietary or sensitive documents
- Automation Limits: Human review is recommended for legally binding documents, compliance certifications, and critical applications
- Regular Updates: Document styles and formats evolve; periodic retraining may be necessary to maintain performance
Best Practices
- Pre-processing: Ensure documents are properly oriented and cropped to content area
- Quality Control: Implement confidence thresholds for automated workflows
- Human Review: Set up review processes for predictions below confidence thresholds
- Monitoring: Track prediction confidence distributions to identify potential drift
- Batch Processing: Process documents in batches for optimal performance
- Resource Management: Call unload() to free GPU/CPU memory when done
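The confidence-threshold and human-review recommendations above can be combined into a simple routing step. The sketch assumes the result format described under Output Format; the 0.90 threshold and the route labels are placeholders to calibrate on your own data, and route_result is a hypothetical helper.

```python
def route_result(result, threshold=0.90):
    """Decide what to do with one classifier result.

    Returns ("error", None), ("review", class_name), or ("auto", class_name).
    The threshold is a placeholder, not a recommended value.
    """
    if result["error_response"]:
        return ("error", None)
    class_name, confidence = max(result["predictions"].items(), key=lambda kv: kv[1])
    # Below-threshold predictions go to a human reviewer instead of being auto-filed.
    return ("auto" if confidence >= threshold else "review", class_name)
```

Logging the share of results that land in each route over time also gives a cheap signal for the drift monitoring mentioned above.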
Citation
If you use this model in your research or production systems, please cite:
@misc{shaikh2025industrialdocclassifier,
  author = {Sheroz Shaikh},
  title = {Industrial Document Classifier using CLIP with LoRA},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/ssheroz/industrial-document-classifier-clip-lora}}
}
License
MIT License - See LICENSE file for details.
Model Card Authors
Sheroz Shaikh
- HuggingFace: @ssheroz
Acknowledgments
- Base model: OpenAI CLIP ViT-B/16
- Fine-tuning method: LoRA (Low-Rank Adaptation) via Hugging Face PEFT library
- Framework: PyTorch, Hugging Face Transformers