Industrial Document Classifier - CLIP ViT-B/16 with LoRA

This model classifies industrial and technical documents into 6 primary categories using a CLIP vision model fine-tuned with LoRA (Low-Rank Adaptation). It is designed for manufacturing, e-commerce, and industrial applications that need automated document organization.

Model Description

  • Base Model: openai/clip-vit-base-patch16
  • Architecture: CLIP Vision Transformer (ViT-B/16)
  • Fine-tuning Method: LoRA (Low-Rank Adaptation)
  • LoRA Rank (r): 32
  • LoRA Alpha: 64
  • Task: Multi-class Document Image Classification
  • Training Dataset: ~180,000 industrial document images
  • Base Parameters: ~63.1M (CLIP ViT-B/16 vision tower)
  • Trainable LoRA Parameters: ~1.2M (1.9% of base model)
  • Total Parameters: ~64.3M

Document Categories

The model classifies documents into 6 parent categories designed for industrial and e-commerce environments:

1. Product Information (product_information)

Description: Documents that provide detailed specifications, features, and technical data about products. These are essential for product listings, sales, and customer understanding.

Includes:

  • Product catalogs with item listings and descriptions
  • Specification sheets with technical parameters (dimensions, materials, performance metrics)
  • Technical bulletins announcing product updates or technical advisories

Use Cases:

  • E-commerce platforms organizing product documentation
  • Sales teams accessing product specifications
  • Marketing departments creating product literature
  • Customer support referencing product details

Examples:

  • HVAC equipment specification sheets
  • Electronic component catalogs
  • Industrial machinery product brochures
  • Chemical product technical data sheets

2. Engineering Drawings (engineering_drawings)

Description: Technical drawings and schematics that provide visual representations of product design, dimensions, and assembly details. Critical for manufacturing, installation, and maintenance.

Includes:

  • Full engineering drawings with detailed dimensions and tolerances
  • Line drawings showing simplified product views
  • CAD-generated technical illustrations
  • Assembly diagrams and exploded views

Use Cases:

  • Manufacturing teams producing components
  • Installation contractors planning installations
  • Quality assurance verifying product specifications
  • Maintenance teams identifying parts and assemblies

Examples:

  • Mechanical part blueprints with GD&T annotations
  • Electrical circuit schematics
  • Plumbing fixture installation diagrams
  • Structural component drawings

3. Instructional Guides (instructional_guides)

Description: Step-by-step documentation that guides users through installation, operation, maintenance, or repair procedures. Essential for safe and effective product use.

Includes:

  • Installation and instruction manuals for setup procedures
  • Owner's and user manuals for operation guidance
  • Service manuals for repair and maintenance procedures
  • Quick start guides and troubleshooting documentation

Use Cases:

  • End users learning to operate equipment
  • Installation professionals following setup procedures
  • Service technicians performing repairs
  • Training departments educating staff

Examples:

  • Appliance installation guides
  • Software user manuals
  • Equipment operation handbooks
  • Maintenance procedure documents

4. Compliance Certificates (compliance_certificates)

Description: Official documentation proving products meet regulatory standards, safety requirements, and material specifications. Critical for legal compliance and quality assurance.

Includes:

  • Material Test Reports (MTR) certifying material composition and properties
  • Safety Data Sheets (SDS) detailing chemical hazards and handling
  • RoHS (Restriction of Hazardous Substances) compliance certificates
  • Quality certifications and test reports

Use Cases:

  • Procurement teams verifying supplier compliance
  • Quality control departments validating materials
  • Regulatory affairs ensuring legal compliance
  • Environmental health and safety managing hazardous materials

Examples:

  • Steel MTR certificates with chemical composition
  • Chemical SDS for workplace safety
  • RoHS compliance declarations for electronics
  • ISO certification documents

5. Energy Ratings (energy_ratings)

Description: Documentation related to energy efficiency ratings, consumption data, and environmental performance certifications. Important for sustainability and regulatory compliance.

Includes:

  • Energy Star certification guides and labels
  • Energy efficiency ratings and performance data
  • Environmental impact assessments
  • Carbon footprint documentation

Use Cases:

  • Purchasing departments selecting energy-efficient equipment
  • Sustainability teams tracking environmental impact
  • Facility managers optimizing energy consumption
  • Regulatory compliance for energy standards

Examples:

  • Energy Star qualified product guides
  • Appliance EnergyGuide labels
  • HVAC system efficiency ratings
  • LED lighting energy consumption data

6. Warranty Documents (warranty_documents)

Description: Legal documentation outlining product guarantees, coverage terms, and claim procedures. Essential for customer protection and after-sales support.

Includes:

  • Product warranty certificates and terms
  • Extended warranty offers and conditions
  • Warranty claim forms and procedures
  • Service agreement documentation

Use Cases:

  • Customer service handling warranty claims
  • Sales teams explaining warranty coverage
  • Legal departments managing warranty terms
  • Customers understanding their rights and coverage

Examples:

  • Manufacturer's limited warranty certificates
  • Extended warranty contracts
  • Warranty registration cards
  • Service plan agreements

Performance

Overall Accuracy: 93.97%

The reported figure is overall accuracy across the 6 document categories. Per-category metrics are not reported, so accuracy on an individual category may differ from this average.
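Because only an overall accuracy is reported, it can help to measure per-category accuracy on a labeled sample of your own documents before relying on the model for a specific category. The following is a minimal sketch that scores the result dictionaries returned by classifier.predict(); the `labels` mapping (image path to true category name) and the `accuracy_report` helper are hypothetical and not part of pipeline.py.

```python
from collections import Counter


def accuracy_report(results, labels):
    """Return (overall_accuracy, per_class_accuracy) for predict() results.

    `labels` is a hypothetical dict: image path -> true category name.
    Results with a non-empty error_response are skipped, not scored.
    """
    correct = Counter()
    total = Counter()
    for result in results:
        if result["error_response"]:
            continue  # image failed to load or classify
        true_label = labels[result["image_path"]]
        # Top-1 prediction: the category with the highest probability
        predicted = max(result["predictions"], key=result["predictions"].get)
        total[true_label] += 1
        if predicted == true_label:
            correct[true_label] += 1
    n = sum(total.values())
    overall = sum(correct.values()) / n if n else 0.0
    per_class = {cls: correct[cls] / total[cls] for cls in total}
    return overall, per_class
```

Categories with few labeled examples give noisy per-class numbers, so a few dozen documents per category is a sensible minimum for this kind of check.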

Training Details

Dataset

  • Total Images: ~180,000 industrial document images
  • Class Distribution: Approximately balanced across all 6 categories
  • Image Types: Scanned documents, PDF pages converted to images, digital documents

Hyperparameters

  • Epochs: 8
  • Learning Rate: 2e-4
  • Batch Size: 64
  • Optimizer: AdamW
  • Scheduler: Cosine annealing with warmup
  • Warmup Ratio: 10%
  • Early Stopping: Enabled
  • Training Time: ~14 hours
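The warmup-plus-cosine schedule listed above can be sketched as below. This is one common form (linear warmup to the peak rate, then cosine decay to zero, the same shape as `get_cosine_schedule_with_warmup` in transformers); the card does not say which implementation the training run used, and the step counts at the bottom assume that all ~180,000 images were in the training split.

```python
import math


def lr_at_step(step, total_steps, peak_lr=2e-4, warmup_ratio=0.10):
    """Learning rate at an optimizer step: linear warmup, then cosine decay to 0."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to peak_lr
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, in [0, 1]
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Assumption: every one of the ~180,000 images is a training image.
steps_per_epoch = math.ceil(180_000 / 64)  # 2,813 batches of 64
total_steps = steps_per_epoch * 8          # 22,504 steps for 8 epochs
warmup_steps = int(total_steps * 0.10)     # 2,250 warmup steps
```

With early stopping enabled, training may end before the schedule reaches zero.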

LoRA Configuration

  • Rank (r): 32
  • Alpha: 64
  • Dropout: 0.1
  • Target Modules: q_proj, k_proj, v_proj, out_proj, fc1, fc2
  • Bias: None
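LoRA adds two small matrices to each targeted linear layer, A (r x d_in) and B (d_out x r), so each adapted layer contributes r * (d_in + d_out) trainable weights. The module names listed above match the attention (`q_proj`, `k_proj`, `v_proj`, `out_proj`) and MLP (`fc1`, `fc2`) layers of the CLIP encoder in transformers. A minimal sketch of the count, assuming the standard clip-vit-base-patch16 vision dimensions (hidden size 768, MLP size 3072, 12 encoder layers); the helper and the shape table are illustrative, not taken from the training code.

```python
def lora_param_count(r, shapes, num_layers):
    """Adapter weights added by LoRA.

    Each adapted Linear(d_in, d_out) gets A (r x d_in) and B (d_out x r),
    i.e. r * (d_in + d_out) weights, repeated in every encoder layer.
    """
    per_layer = sum(r * (d_in + d_out) for d_in, d_out in shapes.values())
    return per_layer * num_layers


# (d_in, d_out) per target module, assuming clip-vit-base-patch16's vision
# config: hidden_size=768, intermediate_size=3072, 12 encoder layers.
CLIP_B16_VISION_SHAPES = {
    "q_proj": (768, 768),
    "k_proj": (768, 768),
    "v_proj": (768, 768),
    "out_proj": (768, 768),
    "fc1": (768, 3072),
    "fc2": (3072, 768),
}
```

For r = 32 on all six modules in all 12 layers, `lora_param_count(32, CLIP_B16_VISION_SHAPES, 12)` gives 5,308,416 adapter weights, excluding the classification head. Adapting fewer modules or layers lowers this count; on an actual PEFT model, `print_trainable_parameters()` reports the number for the modules that were really wrapped.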

Hardware

  • GPU-accelerated training (CUDA-enabled)
  • Mixed precision training (FP16)

Installation

pip install torch torchvision transformers peft pillow

Requirements:

  • Python 3.8+
  • PyTorch 2.0+
  • transformers
  • peft
  • Pillow (PIL)

Usage

Download the Model

# Download model file
wget https://huggingface.co/ssheroz/industrial-document-classifier-clip-lora/resolve/main/industrial-document-classifier-clip-lora.pt

# Download inference scripts
wget https://huggingface.co/ssheroz/industrial-document-classifier-clip-lora/resolve/main/pipeline.py
wget https://huggingface.co/ssheroz/industrial-document-classifier-clip-lora/resolve/main/main.py

Basic Usage

from pipeline import DocumentClassifier

# Initialize classifier (automatically loads the model)
classifier = DocumentClassifier()

# Prepare list of document image paths
image_paths = [
    "path/to/specification_sheet.jpg",
    "path/to/warranty_doc.png",
    "path/to/manual.jpeg",
]

# Get predictions
results = classifier.predict(image_paths)

# Process results
for result in results:
    print(f"\nImage: {result['image_path']}")
    
    if result['error_response']:
        print(f"Error: {result['error_response']}")
    else:
        # Sort predictions by probability
        sorted_predictions = sorted(
            result['predictions'].items(), 
            key=lambda x: x[1], 
            reverse=True
        )
        
        print("Predictions:")
        for class_name, probability in sorted_predictions:
            print(f"  {class_name}: {probability:.4f}")
        
        # Get top prediction
        top_class = sorted_predictions[0][0]
        top_confidence = sorted_predictions[0][1]
        print(f"\nTop Prediction: {top_class} ({top_confidence:.2%} confidence)")

# Release model from memory when done
classifier.unload()
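If predict() raises (for example, on an out-of-memory error), the unload() call at the end of the example above is skipped. One way to guarantee the release is a small context manager. `loaded` is a hypothetical helper, not part of pipeline.py; it works with any object that has an unload() method.

```python
from contextlib import contextmanager


@contextmanager
def loaded(factory):
    """Create an object with factory() and always call its unload() on exit."""
    obj = factory()
    try:
        yield obj
    finally:
        obj.unload()  # runs even if the with-body raises


# Usage with the pipeline (requires pipeline.py and the model file):
# from pipeline import DocumentClassifier
# with loaded(DocumentClassifier) as classifier:
#     results = classifier.predict(image_paths)
```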

Supported Image Formats

The pipeline accepts the following common image formats:

  • .jpg, .jpeg (JPEG)
  • .png (PNG)
  • .bmp (Bitmap)
  • .gif (GIF)
  • .tiff, .tif (TIFF)
  • .webp (WebP)
  • .ico (Icon)
  • .heic, .heif (HEIC/HEIF; Pillow decodes these only when the separate pillow-heif plugin is installed)
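To skip unsupported files before calling predict(), a list of paths can be filtered by extension. This sketch checks only the file suffix, not the file contents, and `supported_images` is a hypothetical helper; unreadable files with a valid extension would still come back with a non-empty error_response.

```python
from pathlib import Path

# Extensions from the list above, compared case-insensitively.
SUPPORTED_EXTENSIONS = {
    ".jpg", ".jpeg", ".png", ".bmp", ".gif", ".tiff", ".tif",
    ".webp", ".ico", ".heic", ".heif",
}


def supported_images(paths):
    """Split paths into (supported, skipped) by file extension, keeping order."""
    supported, skipped = [], []
    for p in paths:
        if Path(p).suffix.lower() in SUPPORTED_EXTENSIONS:
            supported.append(p)
        else:
            skipped.append(p)
    return supported, skipped


# Example: image_paths, skipped = supported_images(str(p) for p in Path("incoming").iterdir())
```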

Output Format

classifier.predict() returns a list with one dictionary per input image, in this form:

{
    "image_path": "path/to/image.jpg",
    "predictions": {
        "product_information": 0.8543,
        "engineering_drawings": 0.0821,
        "instructional_guides": 0.0342,
        "compliance_certificates": 0.0156,
        "energy_ratings": 0.0089,
        "warranty_documents": 0.0049
    },
    "error_response": ""  # Empty string if successful, error message if failed
}
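Because the probabilities come from a softmax over the 6 classes, they sum to 1 (up to rounding), and the top label is the key with the largest value. A minimal helper for pulling the top-k labels out of one result dictionary while skipping failed images; `top_k` is an illustrative name, not a pipeline.py function.

```python
def top_k(result, k=1):
    """Return the k highest-probability (class_name, probability) pairs.

    Returns an empty list when the image failed (non-empty error_response).
    """
    if result["error_response"]:
        return []
    ranked = sorted(result["predictions"].items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]
```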

Batch Processing

The pipeline batches inputs and uses multiple threads automatically:

# Process large batches efficiently
large_batch = [f"document_{i}.jpg" for i in range(1000)]
results = classifier.predict(large_batch)

# Results are returned in the same order as input
assert len(results) == len(large_batch)

Performance Features:

  • Multi-threaded inference using all available CPU cores
  • Automatic batching (16 images per batch)
  • GPU acceleration when available
  • Maintains input order in results
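The order guarantee above is what you get when images are loaded in a thread pool with an order-preserving map and then grouped into fixed-size batches. The sketch below shows that pattern only; it is not the code in pipeline.py, and `load_fn` and `infer_fn` are placeholders for image loading and model inference.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def chunked(items, size=16):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def run_batched(paths, load_fn, infer_fn, batch_size=16, max_workers=None):
    """Load inputs in parallel threads, run infer_fn per batch, keep input order.

    load_fn(path) -> one input; infer_fn(list_of_inputs) -> list_of_outputs.
    """
    max_workers = max_workers or os.cpu_count() or 1
    outputs = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for batch in chunked(list(paths), batch_size):
            # Executor.map yields results in input order, not completion order
            inputs = list(pool.map(load_fn, batch))
            outputs.extend(infer_fn(inputs))
    return outputs
```

Threads suit this step because image decoding and file I/O spend much of their time outside the Python interpreter lock, while the model itself runs one batch at a time.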

Model Architecture

Input Image (Any supported format)
    ↓
CLIP Vision Transformer (ViT-B/16)
    ↓
[With LoRA adapters on attention and MLP layers]
    ↓
Vision Embeddings (768-dim)
    ↓
Classification Head (6 classes)
    ↓
Softmax Probabilities
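The last two stages of the diagram turn the 768-dimensional embedding into 6 probabilities. A minimal pure-Python sketch, assuming the head is a single linear layer (the card does not specify its layers) and assuming the class order shown in the example output; the real index order used by the checkpoint may differ.

```python
import math

# Assumed index order, taken from the example output above.
CLASSES = [
    "product_information", "engineering_drawings", "instructional_guides",
    "compliance_certificates", "energy_ratings", "warranty_documents",
]


def linear_head(embedding, weights, bias):
    """Logits = W @ x + b, with one weight row (and one bias) per class."""
    return [sum(w * x for w, x in zip(row, embedding)) + b for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(embedding, weights, bias):
    """Map an embedding to a {class_name: probability} dict."""
    return dict(zip(CLASSES, softmax(linear_head(embedding, weights, bias))))
```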

LoRA Integration:

  • Applied to all attention projection layers (q, k, v, output)
  • Applied to feed-forward layers (fc1, fc2)
  • Reduces the number of trainable parameters by about 98% compared with full fine-tuning
  • Enables efficient fine-tuning on domain-specific data
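Each adapted layer keeps its pretrained weight W frozen and adds a trainable low-rank update, h = W x + (alpha / r) B A x. With r = 32 and alpha = 64 from the LoRA Configuration above, the update is scaled by 2. In PEFT's default initialization B starts at zero, so the adapted model initially reproduces the base model. A pure-Python sketch of one adapted layer, with LoRA dropout omitted for clarity; the helper names are illustrative.

```python
def matvec(matrix, vector):
    """Matrix-vector product for lists of rows."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def lora_forward(x, W, A, B, alpha, r):
    """LoRA-adapted linear layer: h = W x + (alpha / r) * B (A x).

    W: frozen d_out x d_in weight; A: r x d_in and B: d_out x r are trainable.
    """
    base = matvec(W, x)                 # frozen pretrained path
    update = matvec(B, matvec(A, x))    # low-rank path through r dimensions
    scale = alpha / r                   # 64 / 32 = 2.0 for this model's config
    return [h + scale * u for h, u in zip(base, update)]
```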

Use Cases

Manufacturing & Industrial

  • Organize technical documentation repositories
  • Route documents to appropriate departments
  • Automate quality control document verification
  • Manage compliance certification libraries

E-commerce & Retail

  • Categorize product documentation for online listings
  • Organize supplier documentation
  • Manage warranty and compliance documents
  • Automate document ingestion pipelines

Supply Chain & Procurement

  • Classify vendor-provided documentation
  • Verify compliance certificates
  • Organize product specifications
  • Manage installation and service documentation

Facility Management

  • Organize equipment manuals and specifications
  • Track warranty documentation
  • Manage energy efficiency certifications
  • Maintain compliance records

Limitations

  • Training Domain: Optimized for industrial and technical documents; may have reduced accuracy on consumer documents, artistic content, or non-technical materials
  • Language: Trained primarily on English-language documents
  • Image Quality: Performance may degrade with:
    • Very low resolution images (<224x224 pixels)
    • Severely distorted or rotated documents
    • Handwritten documents
    • Documents with heavy watermarks or overlays
  • Document Types: Not designed for:
    • Multi-page document analysis (processes single images)
    • Text-heavy documents requiring OCR
    • Non-document images (photographs, artwork, etc.)
  • Ambiguous Documents: Some documents may legitimately belong to multiple categories (e.g., a manual that includes warranty information)

Ethical Considerations

  • Transparency: This model should be used as part of a larger document management system with human oversight for critical decisions
  • Bias: Training data distribution may affect performance across different industries or document styles
  • Privacy: Ensure compliance with data privacy regulations when processing proprietary or sensitive documents
  • Automation Limits: Human review is recommended for legally binding documents, compliance certifications, and critical applications
  • Regular Updates: Document styles and formats evolve; periodic retraining may be necessary to maintain performance

Best Practices

  1. Pre-processing: Ensure documents are properly oriented and cropped to content area
  2. Quality Control: Implement confidence thresholds for automated workflows
  3. Human Review: Set up review processes for predictions below confidence thresholds
  4. Monitoring: Track prediction confidence distributions to identify potential drift
  5. Batch Processing: Process documents in batches for optimal performance
  6. Resource Management: Call unload() to free GPU/CPU memory when done
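Practices 2 to 4 can be combined into a small routing step: accept the top label automatically above a confidence threshold, send the rest to human review, and track the confidence distribution over time. A minimal sketch; the 0.80 threshold is an arbitrary placeholder to be tuned on your own validation data, and `route` and `confidence_summary` are hypothetical helpers.

```python
from statistics import mean


def route(result, threshold=0.80):
    """Return ('auto', label) if top probability >= threshold, else ('review', label).

    Failed images (non-empty error_response) return ('error', None).
    """
    if result["error_response"]:
        return ("error", None)
    label, prob = max(result["predictions"].items(), key=lambda item: item[1])
    return ("auto", label) if prob >= threshold else ("review", label)


def confidence_summary(results, threshold=0.80):
    """Mean top-1 confidence and share below threshold, for drift monitoring."""
    tops = [max(r["predictions"].values()) for r in results if not r["error_response"]]
    if not tops:
        return {"count": 0, "mean_top1": None, "below_threshold": None}
    return {
        "count": len(tops),
        "mean_top1": mean(tops),
        "below_threshold": sum(p < threshold for p in tops) / len(tops),
    }
```

Logging `confidence_summary` per day or per batch gives a simple drift signal: a falling mean or a rising share below the threshold suggests incoming documents look less like the training data.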

Citation

If you use this model in your research or production systems, please cite:

@misc{shaikh2025industrialdocclassifier,
  author = {Sheroz Shaikh},
  title = {Industrial Document Classifier using CLIP with LoRA},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/ssheroz/industrial-document-classifier-clip-lora}}
}

License

MIT License - See LICENSE file for details.

Model Card Authors

Sheroz Shaikh

Acknowledgments

  • Base model: OpenAI CLIP ViT-B/16
  • Fine-tuning method: LoRA (Low-Rank Adaptation) via Hugging Face PEFT library
  • Framework: PyTorch, Hugging Face Transformers