**roberta-large-entity-linking** is a [RoBERTa large model](https://huggingface.co/FacebookAI/roberta-large) fine-tuned as a [bi-encoder](https://arxiv.org/pdf/1811.08008) for [entity linking](https://en.wikipedia.org/wiki/Entity_linking) tasks. The model separately embeds mentions-in-context and entity descriptions to enable semantic matching between text mentions and knowledge base entities.
## Intended Uses
### Primary Use Cases
- **Entity Linking:** Link Wikipedia concepts mentioned in text to their corresponding Wikipedia pages. [Wikimedia](https://huggingface.co/wikimedia) makes this easy with [this dataset](https://huggingface.co/datasets/wikimedia/structured-wikipedia): embed the entries in the "abstract" column (you may need to do some cleanup to filter out irrelevant entries). A sketch of the workflow follows this list.
- **Zero-shot Entity Linking:** Link entities to knowledge bases without task-specific training
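
The snippet below is a minimal, illustrative sketch of that workflow, not this model's exact API: it loads the encoder with `transformers`, mean-pools token embeddings into one vector per text, and ranks candidate abstracts by cosine similarity. The placeholder repo id, the mean-pooling strategy, and the `[ENT]` mention-marking convention are assumptions here.

```python
# Illustrative bi-encoder linking sketch (assumptions: mean pooling, cosine
# similarity, and [ENT] markers around the mention; substitute the real repo id).
import torch
from transformers import AutoTokenizer, AutoModel

model_id = "roberta-large-entity-linking"  # replace with this model's full repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)
model.eval()

def embed(texts):
    # Mean-pool the last hidden state over non-padding tokens.
    batch = tokenizer(texts, padding=True, truncation=True,
                      max_length=256, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state
    mask = batch["attention_mask"].unsqueeze(-1).float()
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)

# Candidate entity descriptions, e.g. cleaned "abstract" entries.
definitions = [
    "The jaguar (Panthera onca) is a large cat species native to the Americas.",
    "Jaguar Cars is a British brand of luxury cars owned by Jaguar Land Rover.",
]
# Mention-in-context, with the mention wrapped in [ENT] markers.
mention = "She photographed a [ENT] jaguar [ENT] resting on the riverbank."

definition_embeddings = embed(definitions)
mention_embedding = embed([mention])
similarities = torch.nn.functional.cosine_similarity(mention_embedding, definition_embeddings)

for i, definition in enumerate(definitions):
    sim_value = similarities[i].item()
    print(definition)
    print(f"Similarity: {sim_value:.4f}\n")
```

The candidate with the highest similarity is the predicted entity; in practice you would precompute and index the abstract embeddings rather than re-encoding them per query.
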
print(f"Similarity: {sim_value:.4f}\n")
|
| 104 |
```
|
| 105 |
|
| 106 |
-
## Model Details
### Training Data
- **Dataset:** 3 million pairs of Wikipedia anchor text links and Wikipedia page abstracts, derived from [this dataset](https://huggingface.co/datasets/wikimedia/structured-wikipedia)
- **Special Token:** `[ENT]` token added to the vocabulary to mark entity mentions
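
As an illustration of how such a marker token is typically wired in (the exact marking format used for this model is an assumption), a special token can be registered with the tokenizer and the embedding matrix resized to match:

```python
# Hypothetical illustration: register [ENT] as a special token and wrap a
# mention with it. The exact marking format used in training is an assumption.
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("FacebookAI/roberta-large")
model = AutoModel.from_pretrained("FacebookAI/roberta-large")

# Add [ENT] to the vocabulary and grow the embedding matrix accordingly.
tokenizer.add_special_tokens({"additional_special_tokens": ["[ENT]"]})
model.resize_token_embeddings(len(tokenizer))

# Mark the mention span before encoding the mention-in-context.
context = "She photographed a [ENT] jaguar [ENT] resting on the riverbank."
print(tokenizer.tokenize(context))  # [ENT] is kept as a single token
```
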
### Training Details
- **Hardware:** Single 80GB H100 GPU
- **Batch Size:** 80
- **Loss Function:** Batch hard triplet loss (margin=0.4); see the sketch after this list
- **Max Sequence Length:** 256 tokens (both mentions and descriptions)
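
For reference, below is a minimal sketch of a batch-hard triplet loss over mention/entity pairs. The use of Euclidean distance and of the other in-batch entities as negatives are assumptions, not confirmed training details.

```python
# Sketch of batch-hard triplet loss over mention/entity pairs, assuming
# Euclidean distance and in-batch negatives.
import torch

def batch_hard_triplet_loss(mention_emb, entity_emb, margin=0.4):
    """mention_emb[i] and entity_emb[i] are a positive pair; every other
    entity in the batch acts as a negative for mention i."""
    # Pairwise Euclidean distances between every mention and every entity.
    dist = torch.cdist(mention_emb, entity_emb, p=2)   # (B, B)
    pos = dist.diagonal()                              # d(mention_i, entity_i)
    # Mask out the positive, then take the hardest (closest) negative.
    neg = dist + torch.eye(dist.size(0)) * 1e9
    hardest_neg = neg.min(dim=1).values
    return torch.relu(pos - hardest_neg + margin).mean()

# Toy usage with random vectors standing in for encoder outputs.
loss = batch_hard_triplet_loss(torch.randn(8, 1024), torch.randn(8, 1024))
print(loss)
```
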
## Performance
### Benchmark Results
- **Dataset:** Zero-Shot Entity Linking [(Logeswaran et al., 2019)](https://arxiv.org/abs/1906.07348)
- **Metric:** Recall@64
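
For context, Recall@64 counts a mention as correct when its gold entity appears among the top 64 candidates returned by the encoder. A small illustrative computation (with made-up scores) is sketched below.

```python
# Illustrative Recall@k: the fraction of mentions whose gold entity appears
# among the top-k candidates ranked by the bi-encoder's similarity scores.
import torch

def recall_at_k(scores, gold_idx, k=64):
    """scores: (num_mentions, num_entities) similarity matrix;
    gold_idx: (num_mentions,) index of each mention's correct entity."""
    topk = scores.topk(k, dim=1).indices               # (num_mentions, k)
    hits = (topk == gold_idx.unsqueeze(1)).any(dim=1)
    return hits.float().mean().item()

# Toy usage with random scores standing in for mention/entity similarities.
scores = torch.randn(100, 1000)
gold_idx = torch.randint(0, 1000, (100,))
print(f"Recall@64: {recall_at_k(scores, gold_idx):.3f}")
```
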
|