PatentMap-V0-Dropout

PatentMap-V0-Dropout is a patent embedding model trained on the abstract section of patents, using dropout as the only data augmentation. It is part of the PatentMap V0 model collection.

Model Details

Base Model: anferico/bert-for-patents
Training Objective: Contrastive learning (InfoNCE loss)
Architecture: BERT-large (340M parameters)
Embedding Dimension: 1024
Max Sequence Length: 512 tokens
Vocabulary Size: 39859
Training Data: USPTO patent grants (2010-2018) from the HUPD corpus

Training Configuration

Patent Sections Used: abstract
Data Augmentation: dropout only
Batch Size: 512
Learning Rate: 1e-5
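
The combination of an InfoNCE objective with dropout-only augmentation can be illustrated with a minimal sketch. It assumes a SimCSE-style setup in which the same batch is encoded twice with dropout active, so the two passes yield slightly different views of each input, and the other items in the batch serve as negatives. The toy encoder, the `temperature` value of 0.05, and the function name `info_nce_loss` are illustrative assumptions, not the training code used for this model.

```python
import torch
import torch.nn.functional as F


def info_nce_loss(view_a: torch.Tensor, view_b: torch.Tensor,
                  temperature: float = 0.05) -> torch.Tensor:
    """InfoNCE with in-batch negatives: row i of view_a should match row i of view_b."""
    a = F.normalize(view_a, dim=-1)
    b = F.normalize(view_b, dim=-1)
    logits = a @ b.T / temperature            # (batch, batch) scaled cosine similarities
    targets = torch.arange(a.size(0), device=a.device)  # positives sit on the diagonal
    return F.cross_entropy(logits, targets)


# Toy stand-in for the encoder (assumption): a linear layer followed by dropout.
torch.manual_seed(0)
toy_encoder = torch.nn.Sequential(torch.nn.Linear(16, 32), torch.nn.Dropout(p=0.1))
toy_encoder.train()                           # keep dropout active, as during training

batch = torch.randn(8, 16)
view_a = toy_encoder(batch)                   # first pass, one dropout mask
view_b = toy_encoder(batch)                   # second pass, a different dropout mask
loss = info_nce_loss(view_a, view_b)
```

In a real run the toy encoder would be replaced by the BERT model and the batch by tokenized abstracts; the loss computation itself stays the same.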

Usage

Input Format

This model expects patent text formatted with special tokens:

For abstract: Title [SEP] [abstract] Abstract text
For other sections: [section] Section text (no title prefix)

Example:

# Abstract with title
text = "Smart thermostat system [SEP] [abstract] A thermostat system comprising..."

# Claim without title
text = "[claim] A method comprising: step 1, step 2..."
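
The two formatting rules above can be wrapped in a small helper. This is a sketch rather than part of the model's API: the function name `format_patent_text` is hypothetical, and raising an error when an abstract has no title is an assumption about how missing titles should be handled.

```python
from typing import Optional


def format_patent_text(section: str, text: str, title: Optional[str] = None) -> str:
    """Build an input string in the layout described above.

    Abstracts are prefixed with the title and [SEP]; other sections
    get only their section tag, and any title passed in is ignored.
    """
    tagged = f"[{section}] {text.strip()}"
    if section == "abstract":
        if not title:
            # Assumption: abstracts are always paired with a title.
            raise ValueError("abstract inputs need a title")
        return f"{title.strip()} [SEP] {tagged}"
    return tagged


abstract_input = format_patent_text(
    "abstract", "A thermostat system comprising...", title="Smart thermostat system"
)
claim_input = format_patent_text("claim", "A method comprising: step 1, step 2...")
```

Here `abstract_input` and `claim_input` reproduce the two example strings shown above.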

Code Example

from transformers import AutoTokenizer, AutoModel
import torch

# Load model and tokenizer
model_name = "ZoeYou/PatentMap-V0-Dropout"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

# Format patent text
title = "Smart thermostat system"
abstract = "A thermostat system comprising a temperature sensor..."
patent_text = f"{title} [SEP] [abstract] {abstract}"

# Encode and get embeddings
inputs = tokenizer(patent_text, return_tensors="pt", padding=True, truncation=True, max_length=512)

with torch.no_grad():
    outputs = model(**inputs)
    embeddings = outputs.last_hidden_state[:, 0, :]  # CLS token
    
print(embeddings.shape)  # torch.Size([1, 1024])
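
Once several patents have been embedded this way, they can be compared directly. The sketch below ranks a set of document embeddings against query embeddings; it assumes cosine similarity is a suitable metric, which is common for contrastively trained encoders but is not stated in this card. The helper name `top_k_similar` and the two-dimensional toy vectors are illustrative only; real embeddings from this model have 1024 dimensions.

```python
import torch
import torch.nn.functional as F


def top_k_similar(query_emb: torch.Tensor, corpus_emb: torch.Tensor, k: int = 3):
    """Return (scores, indices) of the k most similar corpus rows for each query row."""
    q = F.normalize(query_emb, dim=-1)
    c = F.normalize(corpus_emb, dim=-1)
    scores = q @ c.T                          # (num_queries, num_docs) cosine similarities
    k = min(k, c.size(0))                     # never ask for more results than documents
    top = torch.topk(scores, k=k, dim=-1)
    return top.values, top.indices


# Toy embeddings standing in for CLS vectors (assumption).
corpus = torch.tensor([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
query = torch.tensor([[2.0, 0.1]])
scores, indices = top_k_similar(query, corpus, k=3)
print(indices.tolist())  # [[0, 2, 1]]
```

Normalizing both sides first means the dot product equals the cosine similarity, so vector length does not affect the ranking.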

Evaluation

This model has been evaluated on multiple patent-specific tasks:

IPC Classification (linear probe and KNN)
Prior Art Search (recall@k, nDCG@k)
Embedding Quality Metrics (uniformity, alignment, topology)
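
For orientation, the sketch below shows how alignment, uniformity, and recall@k are commonly computed. The alignment and uniformity definitions follow the widely used formulation on L2-normalized embeddings (mean distance between positive pairs, and the log of the mean Gaussian potential over all pairs); the exact variants, the exponents `alpha` and `t`, and the evaluation protocol used in the PatentMap paper are not given in this card, so everything here should be read as an assumption.

```python
import torch
import torch.nn.functional as F


def alignment(x: torch.Tensor, y: torch.Tensor, alpha: float = 2.0) -> torch.Tensor:
    """Mean distance between normalized positive pairs (x[i], y[i]); lower is better."""
    x, y = F.normalize(x, dim=-1), F.normalize(y, dim=-1)
    return (x - y).norm(p=2, dim=1).pow(alpha).mean()


def uniformity(x: torch.Tensor, t: float = 2.0) -> torch.Tensor:
    """Log of the mean Gaussian potential over all pairs; lower means more spread out."""
    x = F.normalize(x, dim=-1)
    sq_dists = torch.pdist(x, p=2).pow(2)     # squared distances for all unordered pairs
    return sq_dists.mul(-t).exp().mean().log()


def recall_at_k(ranked_ids: list, relevant_ids: list, k: int) -> float:
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant_ids:
        return 0.0
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)


points = torch.tensor([[1.0, 0.0], [-1.0, 0.0]])   # two antipodal unit vectors
print(alignment(points, points).item())            # 0.0
print(recall_at_k([3, 1, 2], [1, 5], k=2))         # 0.5
```

Alignment and uniformity are usually reported together, since an encoder can trivially improve one at the expense of the other.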

For detailed evaluation results, see the PatentMap paper.

Intended Use

This model is designed for:

Patent document retrieval
Patent similarity search
Prior art discovery
IPC classification
Patent landscape analysis

Citation

If you use this model, please cite:

@article{zuo2025patent,
  title={Patent Representation Learning via Self-supervision},
  author={Zuo, You and Gerdes, Kim and de La Clergerie, Eric Villemonte and Sagot, Beno{\^i}t},
  journal={arXiv preprint arXiv:2511.10657},
  year={2025}
}

Model Collection

This model is part of the PatentMap V0 collection. For an overview of all models, see PatentMap-V0.

License

This model is released under the CC BY-NC 4.0 license (non-commercial use only).

Contact

For questions or issues, please open an issue on the GitHub repository or contact the authors.


Paper: Patent Representation Learning via Self-supervision (arXiv:2511.10657, published Nov 3, 2025)