Industrial Document Classifier - CLIP ViT-B/16 with LoRA

This model classifies industrial and technical documents into 6 parent categories using a CLIP vision model fine-tuned with LoRA (Low-Rank Adaptation). It is designed for manufacturing, e-commerce, and industrial applications where automated document organization is essential.

Model Description

Base Model: openai/clip-vit-base-patch16
Architecture: CLIP Vision Transformer (ViT-B/16)
Fine-tuning Method: LoRA (Low-Rank Adaptation)
LoRA Rank (r): 32
LoRA Alpha: 64
Task: Multi-class Document Image Classification
Training Dataset: ~180,000 industrial document images
Base Parameters: ~63.1M (CLIP ViT-B/16 vision tower)
Trainable LoRA Parameters: ~1.2M (1.9% of base model)
Total Parameters: ~64.3M

Document Categories

The model classifies documents into 6 parent categories designed for industrial and e-commerce environments:

1. Product Information (`product_information`)

Description: Documents that provide detailed specifications, features, and technical data about products. These are essential for product listings, sales, and customer understanding.

Includes:

Product catalogs with item listings and descriptions
Specification sheets with technical parameters (dimensions, materials, performance metrics)
Technical bulletins announcing product updates or technical advisories

Use Cases:

E-commerce platforms organizing product documentation
Sales teams accessing product specifications
Marketing departments creating product literature
Customer support referencing product details

Examples:

HVAC equipment specification sheets
Electronic component catalogs
Industrial machinery product brochures
Chemical product technical data sheets

2. Engineering Drawings (`engineering_drawings`)

Description: Technical drawings and schematics that provide visual representations of product design, dimensions, and assembly details. Critical for manufacturing, installation, and maintenance.

Includes:

Full engineering drawings with detailed dimensions and tolerances
Line drawings showing simplified product views
CAD-generated technical illustrations
Assembly diagrams and exploded views

Use Cases:

Manufacturing teams producing components
Installation contractors planning installations
Quality assurance verifying product specifications
Maintenance teams identifying parts and assemblies

Examples:

Mechanical part blueprints with GD&T annotations
Electrical circuit schematics
Plumbing fixture installation diagrams
Structural component drawings

3. Instructional Guides (`instructional_guides`)

Description: Step-by-step documentation that guides users through installation, operation, maintenance, or repair procedures. Essential for safe and effective product use.

Includes:

Installation and instruction manuals for setup procedures
Owner's and user manuals for operation guidance
Service manuals for repair and maintenance procedures
Quick start guides and troubleshooting documentation

Use Cases:

End users learning to operate equipment
Installation professionals following setup procedures
Service technicians performing repairs
Training departments educating staff

Examples:

Appliance installation guides
Software user manuals
Equipment operation handbooks
Maintenance procedure documents

4. Compliance Certificates (`compliance_certificates`)

Description: Official documentation proving products meet regulatory standards, safety requirements, and material specifications. Critical for legal compliance and quality assurance.

Includes:

Material Test Reports (MTR) certifying material composition and properties
Safety Data Sheets (SDS) detailing chemical hazards and handling
RoHS (Restriction of Hazardous Substances) compliance certificates
Quality certifications and test reports

Use Cases:

Procurement teams verifying supplier compliance
Quality control departments validating materials
Regulatory affairs ensuring legal compliance
Environmental health and safety managing hazardous materials

Examples:

Steel MTR certificates with chemical composition
Chemical SDS for workplace safety
RoHS compliance declarations for electronics
ISO certification documents

5. Energy Ratings (`energy_ratings`)

Description: Documentation related to energy efficiency ratings, consumption data, and environmental performance certifications. Important for sustainability and regulatory compliance.

Includes:

Energy Star certification guides and labels
Energy efficiency ratings and performance data
Environmental impact assessments
Carbon footprint documentation

Use Cases:

Purchasing departments selecting energy-efficient equipment
Sustainability teams tracking environmental impact
Facility managers optimizing energy consumption
Regulatory compliance for energy standards

Examples:

Energy Star qualified product guides
Appliance EnergyGuide labels
HVAC system efficiency ratings
LED lighting energy consumption data

6. Warranty Documents (`warranty_documents`)

Description: Legal documentation outlining product guarantees, coverage terms, and claim procedures. Essential for customer protection and after-sales support.

Includes:

Product warranty certificates and terms
Extended warranty offers and conditions
Warranty claim forms and procedures
Service agreement documentation

Use Cases:

Customer service handling warranty claims
Sales teams explaining warranty coverage
Legal departments managing warranty terms
Customers understanding their rights and coverage

Examples:

Manufacturer's limited warranty certificates
Extended warranty contracts
Warranty registration cards
Service plan agreements

Performance

Overall Accuracy: 93.97%

This is the overall accuracy across all 6 document categories; per-category metrics and the evaluation split are not reported.

Training Details

Dataset

Total Images: ~180,000 industrial document images
Class Distribution: Approximately balanced across all 6 categories
Image Types: Scanned documents, PDF pages converted to images, digital documents

Hyperparameters

Epochs: 8
Learning Rate: 2e-4
Batch Size: 64
Optimizer: AdamW
Scheduler: Cosine annealing with warmup
Warmup Ratio: 10%
Early Stopping: Enabled
Training Time: ~14 hours
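
The schedule above can be sketched in a few lines. The card does not say how the scheduler was implemented, so this is only an illustration: it assumes a linear warmup over the first 10% of steps followed by cosine decay to zero, and `lr_at_step` and `total_steps` are hypothetical names.

```python
import math

def lr_at_step(step, total_steps, peak_lr=2e-4, warmup_ratio=0.10):
    """Learning rate at a 0-indexed optimizer step (assumed schedule shape)."""
    warmup_steps = int(total_steps * warmup_ratio)
    if warmup_steps > 0 and step < warmup_steps:
        # Linear ramp from 0 up to peak_lr during warmup.
        return peak_lr * step / warmup_steps
    # Cosine decay from peak_lr down to 0 over the remaining steps.
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Hypothetical run length, chosen only to make the shape easy to inspect.
total_steps = 1000
for step in (0, 50, 100, 550, 1000):
    print(step, f"{lr_at_step(step, total_steps):.2e}")
```

With these assumptions the rate reaches its 2e-4 peak at the end of warmup and falls to half of that midway through the decay phase.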

LoRA Configuration

Rank (r): 32
Alpha: 64
Dropout: 0.1
Target Modules: q_proj, k_proj, v_proj, out_proj, fc1, fc2
Bias: None
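
For readers unfamiliar with how rank and alpha interact, the standard LoRA update is shown below. It assumes the conventional alpha/r scaling (not a rank-stabilized variant), which the card does not explicitly confirm.

```latex
W' = W_0 + \frac{\alpha}{r}\, B A,
\qquad B \in \mathbb{R}^{d_{\text{out}} \times r},\quad
A \in \mathbb{R}^{r \times d_{\text{in}}},
\qquad \frac{\alpha}{r} = \frac{64}{32} = 2
```

Here W_0 is the frozen base weight of a target module, and only A and B are trained.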

Hardware

GPU-accelerated training (CUDA-enabled)
Mixed precision training (FP16)

Installation

pip install torch torchvision transformers peft pillow

Requirements:

Python 3.8+
PyTorch 2.0+
transformers
peft
Pillow (PIL)

Usage

Download the Model

# Download model file
wget https://huggingface.co/ssheroz/industrial-document-classifier-clip-lora/resolve/main/industrial-document-classifier-clip-lora.pt

# Download inference scripts
wget https://huggingface.co/ssheroz/industrial-document-classifier-clip-lora/resolve/main/pipeline.py
wget https://huggingface.co/ssheroz/industrial-document-classifier-clip-lora/resolve/main/main.py

Basic Usage

from pipeline import DocumentClassifier

# Initialize classifier (automatically loads the model)
classifier = DocumentClassifier()

# Prepare list of document image paths
image_paths = [
    "path/to/specification_sheet.jpg",
    "path/to/warranty_doc.png",
    "path/to/manual.jpeg",
]

# Get predictions
results = classifier.predict(image_paths)

# Process results
for result in results:
    print(f"\nImage: {result['image_path']}")
    
    if result['error_response']:
        print(f"Error: {result['error_response']}")
    else:
        # Sort predictions by probability
        sorted_predictions = sorted(
            result['predictions'].items(), 
            key=lambda x: x[1], 
            reverse=True
        )
        
        print("Predictions:")
        for class_name, probability in sorted_predictions:
            print(f"  {class_name}: {probability:.4f}")
        
        # Get top prediction
        top_class = sorted_predictions[0][0]
        top_confidence = sorted_predictions[0][1]
        print(f"\nTop Prediction: {top_class} ({top_confidence:.2%} confidence)")

# Release model from memory when done
classifier.unload()

Supported Image Formats

The inference pipeline accepts the following image formats:

.jpg, .jpeg (JPEG)
.png (PNG)
.bmp (Bitmap)
.gif (GIF)
.tiff, .tif (TIFF)
.webp (WebP)
.ico (Icon)
.heic, .heif (HEIC)
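
A small pre-filter can keep unsupported files out of a batch before calling the classifier. This is a sketch only: `SUPPORTED_EXTENSIONS` and `split_by_support` are hypothetical names that are not part of `pipeline.py`, and the check looks at file extensions rather than file contents.

```python
from pathlib import Path

# Extensions copied from the list above.
SUPPORTED_EXTENSIONS = {
    ".jpg", ".jpeg", ".png", ".bmp", ".gif",
    ".tiff", ".tif", ".webp", ".ico", ".heic", ".heif",
}

def split_by_support(paths):
    """Split paths into (supported, unsupported), preserving input order."""
    supported, unsupported = [], []
    for path in paths:
        if Path(path).suffix.lower() in SUPPORTED_EXTENSIONS:
            supported.append(path)
        else:
            unsupported.append(path)
    return supported, unsupported

ok, skipped = split_by_support(["spec.JPG", "manual.pdf", "drawing.tif"])
```

PDF files land in the unsupported list here, so they would need to be rendered to page images first.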

Output Format

Each prediction returns a dictionary with:

{
    "image_path": "path/to/image.jpg",
    "predictions": {
        "product_information": 0.8543,
        "engineering_drawings": 0.0821,
        "instructional_guides": 0.0342,
        "compliance_certificates": 0.0156,
        "energy_ratings": 0.0089,
        "warranty_documents": 0.0049
    },
    "error_response": ""  # Empty string if successful, error message if failed
}
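
Given a result in this shape, the winning label can be pulled out as follows. The helper name `top_prediction` is an assumption for illustration; the dictionary is the example shown above.

```python
def top_prediction(result):
    """Return (label, probability) for the best class, or None if the item failed."""
    if result["error_response"]:
        return None
    predictions = result["predictions"]
    label = max(predictions, key=predictions.get)
    return label, predictions[label]

example = {
    "image_path": "path/to/image.jpg",
    "predictions": {
        "product_information": 0.8543,
        "engineering_drawings": 0.0821,
        "instructional_guides": 0.0342,
        "compliance_certificates": 0.0156,
        "energy_ratings": 0.0089,
        "warranty_documents": 0.0049,
    },
    "error_response": "",
}

label, probability = top_prediction(example)
```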

Batch Processing

The pipeline's predict() method batches its inputs automatically and uses multiple threads, so a large list can be passed in a single call:

# Process large batches efficiently
large_batch = [f"document_{i}.jpg" for i in range(1000)]
results = classifier.predict(large_batch)

# Results are returned in the same order as input
assert len(results) == len(large_batch)

Performance Features:

Multi-threaded inference using all available CPU cores
Automatic batching (16 images per batch)
GPU acceleration when available
Maintains input order in results
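
The order-preserving batching described above can be illustrated with a simple chunking helper. This is not the code in `pipeline.py`; `chunked` is a hypothetical name, and only the batch size of 16 comes from the list above.

```python
def chunked(items, batch_size=16):
    """Yield consecutive slices of at most batch_size items, in input order."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

paths = [f"document_{i}.jpg" for i in range(1000)]
batches = list(chunked(paths))

# Concatenating the batches gives back the original list in order.
flattened = [path for batch in batches for path in batch]
```

For 1000 inputs this produces 63 batches: 62 full batches of 16 and a final batch of 8.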

Model Architecture

Input Image (Any supported format)
    ↓
CLIP Vision Transformer (ViT-B/16)
    ↓
[With LoRA adapters on attention and feed-forward layers]
    ↓
Vision Embeddings (768-dim)
    ↓
Classification Head (6 classes)
    ↓
Softmax Probabilities
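
The last step of the diagram, turning the 6 class scores into probabilities, can be sketched in plain Python. The logit values and the class order in `CLASS_NAMES` are assumptions for illustration; the actual head and label order live in `pipeline.py`.

```python
import math

# Label names from this card; the index order is assumed.
CLASS_NAMES = [
    "product_information",
    "engineering_drawings",
    "instructional_guides",
    "compliance_certificates",
    "energy_ratings",
    "warranty_documents",
]

def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]

logits = [4.0, 1.5, 0.5, 0.0, -0.5, -1.0]  # made-up head outputs
probabilities = dict(zip(CLASS_NAMES, softmax(logits)))
```

Subtracting the maximum before exponentiating does not change the result but avoids overflow for large scores.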

LoRA Integration:

Applied to all attention projection layers (q, k, v, output)
Applied to feed-forward layers (fc1, fc2)
Cuts trainable parameters by about 98% relative to full fine-tuning (~1.2M trainable vs. ~63.1M base parameters)
Enables efficient fine-tuning on domain-specific data
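
A rough way to estimate how many weights LoRA adds is rank × (d_in + d_out) for each wrapped linear layer. The sketch below applies that formula under stated assumptions: standard ViT-B dimensions (hidden size 768, MLP size 3072, 12 encoder layers), adapters on the vision tower only, and no classification head counted. `lora_param_count` is a hypothetical helper.

```python
def lora_param_count(rank, shapes, num_layers):
    """Adapter weights added by LoRA: rank * (d_in + d_out) per wrapped linear layer."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in shapes.values())
    return per_layer * num_layers

HIDDEN, MLP, LAYERS = 768, 3072, 12  # assumed ViT-B encoder dimensions

TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, HIDDEN),
    "v_proj": (HIDDEN, HIDDEN),
    "out_proj": (HIDDEN, HIDDEN),
    "fc1": (HIDDEN, MLP),
    "fc2": (MLP, HIDDEN),
}

all_six = lora_param_count(32, TARGET_SHAPES, LAYERS)

# Same formula restricted to a subset, to show how sensitive the total is.
subset = {name: TARGET_SHAPES[name] for name in ("q_proj", "v_proj")}
q_and_v_only = lora_param_count(32, subset, LAYERS)
```

Under these assumptions the six modules add 5,308,416 adapter weights, while wrapping only q_proj and v_proj would add 1,179,648, so any trainable-parameter figure depends on exactly which layers were adapted.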

Use Cases

Manufacturing & Industrial

Organize technical documentation repositories
Route documents to appropriate departments
Automate quality control document verification
Manage compliance certification libraries

E-commerce & Retail

Categorize product documentation for online listings
Organize supplier documentation
Manage warranty and compliance documents
Automate document ingestion pipelines

Supply Chain & Procurement

Classify vendor-provided documentation
Verify compliance certificates
Organize product specifications
Manage installation and service documentation

Facility Management

Organize equipment manuals and specifications
Track warranty documentation
Manage energy efficiency certifications
Maintain compliance records

Limitations

Training Domain: Optimized for industrial and technical documents; may have reduced accuracy on consumer documents, artistic content, or non-technical materials
Language: Trained primarily on English-language documents
Image Quality: Performance may degrade with:
- Very low resolution images (<224x224 pixels)
- Severely distorted or rotated documents
- Handwritten documents
- Documents with heavy watermarks or overlays
Document Types: Not designed for:
- Multi-page document analysis (processes single images)
- Text-heavy documents requiring OCR
- Non-document images (photographs, artwork, etc.)
Ambiguous Documents: Some documents may legitimately belong to multiple categories (e.g., a manual that includes warranty information)
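
One way to surface ambiguous documents is to look at the gap between the two highest probabilities. This is a sketch, not part of the released pipeline: `is_ambiguous` and the 0.20 margin are hypothetical and would need tuning on real data.

```python
def is_ambiguous(predictions, min_margin=0.20):
    """True when the top two class probabilities are closer than min_margin."""
    ranked = sorted(predictions.values(), reverse=True)
    return (ranked[0] - ranked[1]) < min_margin

# Made-up scores for a manual that also contains warranty terms.
mixed_document = {
    "product_information": 0.04,
    "engineering_drawings": 0.02,
    "instructional_guides": 0.48,
    "compliance_certificates": 0.02,
    "energy_ratings": 0.02,
    "warranty_documents": 0.42,
}

needs_second_look = is_ambiguous(mixed_document)
```

Documents flagged this way could be tagged with both candidate categories or sent for manual review.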

Ethical Considerations

Transparency: This model should be used as part of a larger document management system with human oversight for critical decisions
Bias: Training data distribution may affect performance across different industries or document styles
Privacy: Ensure compliance with data privacy regulations when processing proprietary or sensitive documents
Automation Limits: Human review is recommended for legally binding documents, compliance certifications, and critical applications
Regular Updates: Document styles and formats evolve; periodic retraining may be necessary to maintain performance

Best Practices

Pre-processing: Ensure documents are properly oriented and cropped to content area
Quality Control: Implement confidence thresholds for automated workflows
Human Review: Set up review processes for predictions below confidence thresholds
Monitoring: Track prediction confidence distributions to identify potential drift
Batch Processing: Process documents in batches for optimal performance
Resource Management: Call classifier.unload() to free GPU/CPU memory when done
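
The confidence-threshold and human-review practices above can be combined into a simple routing step. The function name `route`, the queue labels, and the 0.80 threshold are assumptions for illustration; an appropriate threshold depends on the cost of errors in a given workflow.

```python
def route(result, threshold=0.80):
    """Decide what to do with one classifier result dictionary."""
    if result["error_response"]:
        return "failed"  # could not be processed; retry or inspect the file
    top_confidence = max(result["predictions"].values())
    # Accept automatically only when the best class clears the threshold.
    return "auto" if top_confidence >= threshold else "review"

# Made-up results in the documented output format.
results = [
    {"image_path": "a.jpg", "predictions": {"warranty_documents": 0.93, "instructional_guides": 0.07}, "error_response": ""},
    {"image_path": "b.jpg", "predictions": {"energy_ratings": 0.55, "product_information": 0.45}, "error_response": ""},
    {"image_path": "c.jpg", "predictions": {}, "error_response": "cannot open image"},
]

decisions = {item["image_path"]: route(item) for item in results}
```

Logging the top confidence of every result alongside its decision also provides the data needed for the drift monitoring mentioned above.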

Citation

If you use this model in your research or production systems, please cite:

@misc{shaikh2025industrialdocclassifier,
  author = {Sheroz Shaikh},
  title = {Industrial Document Classifier using CLIP with LoRA},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/ssheroz/industrial-document-classifier-clip-lora}}
}

License

MIT License - See LICENSE file for details.

Model Card Authors

Sheroz Shaikh

HuggingFace: @ssheroz

Acknowledgments

Base model: OpenAI CLIP ViT-B/16
Fine-tuning method: LoRA (Low-Rank Adaptation) via Hugging Face PEFT library
Framework: PyTorch, Hugging Face Transformers
