PatentMap-V0-Dropout

PatentMap-V0-Dropout is a patent embedding model trained on the abstract section of patents, using dropout as the only data augmentation. It is part of the PatentMap V0 model collection.

Model Details

  • Base Model: anferico/bert-for-patents
  • Training Objective: Contrastive learning (InfoNCE loss)
  • Architecture: BERT-large (340M parameters)
  • Embedding Dimension: 1024
  • Max Sequence Length: 512 tokens
  • Vocabulary Size: 39859
  • Training Data: USPTO patent grants (2010-2018) from the HUPD corpus

Training Configuration

  • Patent Sections Used: abstract
  • Data Augmentation: dropout only
  • Batch Size: 512
  • Learning Rate: 1e-5
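
The card names the objective (InfoNCE) and the augmentation (dropout only) but does not spell out the loss. For orientation, a common form of InfoNCE with in-batch negatives is sketched below. It assumes that the positive pair is the same abstract encoded twice under different dropout masks, that sim is cosine similarity, and that a temperature τ is used; none of these details, nor the value of τ, are stated in this card.

```latex
\mathcal{L}_i = -\log
  \frac{\exp\left(\operatorname{sim}(z_i, z_i^{+}) / \tau\right)}
       {\sum_{j=1}^{N} \exp\left(\operatorname{sim}(z_i, z_j^{+}) / \tau\right)}
```

Here z_i and z_i^{+} are the two embeddings of the i-th abstract, and N is the batch size (512 in the configuration above), so every other abstract in the batch acts as a negative.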

Usage

Input Format

This model expects patent text formatted with special tokens:

  • For abstract: Title [SEP] [abstract] Abstract text
  • For other sections: [section] Section text (no title prefix)

Example:

# Abstract with title
text = "Smart thermostat system [SEP] [abstract] A thermostat system comprising..."

# Claim without title
text = "[claim] A method comprising: step 1, step 2..."
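
The two formats above can be wrapped in a small helper so that inputs are built consistently. The sketch below is only an illustration: `format_patent_text` is a hypothetical name, not part of the model or of any library, and the card shows only the `[abstract]` and `[claim]` tags, so other section names are an assumption.

```python
from typing import Optional


def format_patent_text(section: str, text: str, title: Optional[str] = None) -> str:
    """Build an input string in the format described above.

    Hypothetical helper for illustration; not part of the model's API.
    """
    body = f"[{section}] {text.strip()}"
    # Only the abstract format carries a "Title [SEP]" prefix.
    if section == "abstract":
        if not title:
            raise ValueError("the abstract format expects a title")
        return f"{title.strip()} [SEP] {body}"
    # For other sections any title is ignored (no title prefix).
    return body


abstract_input = format_patent_text(
    "abstract", "A thermostat system comprising...", title="Smart thermostat system"
)
claim_input = format_patent_text("claim", "A method comprising: step 1, step 2...")
print(abstract_input)
print(claim_input)
```

The two printed strings match the hand-written examples above.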

Code Example

from transformers import AutoTokenizer, AutoModel
import torch

# Load model and tokenizer
model_name = "ZoeYou/PatentMap-V0-Dropout"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

# Format patent text
title = "Smart thermostat system"
abstract = "A thermostat system comprising a temperature sensor..."
patent_text = f"{title} [SEP] [abstract] {abstract}"

# Encode and get embeddings
inputs = tokenizer(patent_text, return_tensors="pt", padding=True, truncation=True, max_length=512)

with torch.no_grad():
    outputs = model(**inputs)
    embeddings = outputs.last_hidden_state[:, 0, :]  # CLS token
    
print(embeddings.shape)  # torch.Size([1, 1024])
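
Once embeddings are available, similarity search usually comes down to ranking candidates by cosine similarity. The sketch below shows one way to do that with plain PyTorch. `rank_by_cosine` is a hypothetical helper, and the random tensors are stand-ins with the model's 1024-dimensional shape, used so the snippet runs without downloading the model; cosine similarity as the scoring function is an assumption rather than something the card prescribes.

```python
import torch
import torch.nn.functional as F


def rank_by_cosine(query: torch.Tensor, candidates: torch.Tensor) -> torch.Tensor:
    """Return candidate indices sorted from most to least similar to the query.

    query: shape (dim,) or (1, dim); candidates: shape (n, dim).
    Hypothetical helper for illustration.
    """
    q = F.normalize(query.reshape(1, -1), dim=-1)
    c = F.normalize(candidates, dim=-1)
    scores = (c @ q.T).squeeze(-1)  # cosine similarity of each candidate to the query
    return torch.argsort(scores, descending=True)


# Stand-in tensors; in practice these would be CLS embeddings
# produced as in the code example above.
torch.manual_seed(0)
candidates = torch.randn(4, 1024)
query = candidates[2] + 0.01 * torch.randn(1024)  # near-duplicate of candidate 2

order = rank_by_cosine(query, candidates)
print(order[0].item())  # 2
```

Normalizing both sides first means the matrix product yields cosine similarities directly, which also works unchanged for a larger candidate matrix.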

Evaluation

This model has been evaluated on multiple patent-specific tasks:

  • IPC Classification (linear probe and KNN)
  • Prior Art Search (recall@k, nDCG@k)
  • Embedding Quality Metrics (uniformity, alignment, topology)

For detailed evaluation results, see the PatentMap paper.
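
For readers unfamiliar with the retrieval metrics listed above, here is a minimal sketch of recall@k and nDCG@k for a single query with binary relevance. The function names and the document IDs are made up for illustration, and the exact definitions used in the PatentMap evaluation may differ (for example, graded relevance or a different normalization).

```python
import math
from typing import Sequence, Set


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in ranked[:k] if doc_id in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Binary-relevance nDCG: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, doc_id in enumerate(ranked[:k])
        if doc_id in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


# Made-up IDs: a ranked result list and the set of true prior-art documents.
ranked = ["US-A", "US-B", "US-C", "US-D"]
relevant = {"US-A", "US-C"}

r2 = recall_at_k(ranked, relevant, k=2)
n4 = ndcg_at_k(ranked, relevant, k=4)
print(r2)  # 0.5
print(round(n4, 4))
```

In a full evaluation these per-query values would be averaged over all queries.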

Intended Use

This model is designed for:

  • Patent document retrieval
  • Patent similarity search
  • Prior art discovery
  • IPC classification
  • Patent landscape analysis

Citation

If you use this model, please cite:

@article{zuo2025patent,
  title={Patent Representation Learning via Self-supervision},
  author={Zuo, You and Gerdes, Kim and de La Clergerie, Eric Villemonte and Sagot, Beno{\^i}t},
  journal={arXiv preprint arXiv:2511.10657},
  year={2025}
}

Model Collection

This model is part of the PatentMap V0 collection. For an overview of all models, see PatentMap-V0.

License

This model is released under the CC BY-NC 4.0 license (non-commercial use only).

Contact

For questions or issues, please open an issue on the GitHub repository or contact the authors.
