| | --- |
| | language: en |
| | license: cc-by-sa-4.0 |
| | library_name: span-marker |
| | tags: |
| | - span-marker |
| | - token-classification |
| | - ner |
| | - named-entity-recognition |
| | - generated_from_span_marker_trainer |
| | metrics: |
| | - precision |
| | - recall |
| | - f1 |
| | widget: |
| | - text: Inductively Coupled Plasma - Mass Spectrometry ( ICP - MS ) analysis of Longcliffe |
| | SP52 limestone was undertaken to identify other impurities present , and the effect |
| | of sorbent mass and SO2 concentration on elemental partitioning in the carbonator |
| | between solid sorbent and gaseous phase was investigated , using a bubbler sampling |
| | system . |
| | - text: We extensively evaluate our work against benchmark and competitive protocols |
| | across a range of metrics over three real connectivity and GPS traces such as |
| | Sassy [ 44 ] , San Francisco Cabs [ 45 ] and Infocom 2006 [ 33 ] . |
| | - text: In this research , we developed a robust two - layer classifier that can accurately |
| | classify normal hearing ( NH ) from hearing impaired ( HI ) infants with congenital |
| | sensori - neural hearing loss ( SNHL ) based on their Magnetic Resonance ( MR |
| | ) images . |
| | - text: In situ Peak Force Tapping AFM was employed for determining morphology and |
| | nano - mechanical properties of the surface layer . |
| | - text: By means of a criterion of Gilmer for polynomially dense subsets of the ring |
| | of integers of a number field , we show that , if h∈K[X ] maps every element of |
| | OK of degree n to an algebraic integer , then h(X ) is integral - valued over |
| | OK , that is , h(OK)⊂OK . |
| | pipeline_tag: token-classification |
| | base_model: roberta-base |
| | model-index: |
| | - name: SpanMarker with roberta-base on my-data |
| | results: |
| | - task: |
| | type: token-classification |
| | name: Named Entity Recognition |
| | dataset: |
| | name: my-data |
| | type: unknown |
| | split: test |
| | metrics: |
| | - type: f1 |
| | value: 0.6831683168316832 |
| | name: F1 |
| | - type: precision |
| | value: 0.6934673366834171 |
| | name: Precision |
| | - type: recall |
| | value: 0.6731707317073171 |
| | name: Recall |
| | --- |
| | |
| | # SpanMarker with roberta-base on my-data |
| |
|
| | This is a [SpanMarker](https://github.com/tomaarsen/SpanMarkerNER) model for Named Entity Recognition that uses [roberta-base](https://huggingface.co/roberta-base) as the underlying encoder. |
| |
|
| | ## Model Details |
| |
|
| | ### Model Description |
| | - **Model Type:** SpanMarker |
| | - **Encoder:** [roberta-base](https://huggingface.co/roberta-base) |
| | - **Maximum Sequence Length:** 256 tokens |
| | - **Maximum Entity Length:** 8 words |
| | - **Language:** en |
| | - **License:** cc-by-sa-4.0 |
| |
|
| | ### Model Sources |
| |
|
| | - **Repository:** [SpanMarker on GitHub](https://github.com/tomaarsen/SpanMarkerNER) |
| | - **Thesis:** [SpanMarker For Named Entity Recognition](https://raw.githubusercontent.com/tomaarsen/SpanMarkerNER/main/thesis.pdf) |
| |
|
| | ### Model Labels |
| | | Label | Examples | |
| | |:---------|:--------------------------------------------------------------------------------------------------------| |
| | | Data | "Depth time - series", "an overall mitochondrial", "defect" | |
| | | Material | "the subject 's fibroblasts", "COXI , COXII and COXIII subunits", "cross - shore measurement locations" | |
| | | Method | "in vitro", "EFSA", "an approximation" | |
| | | Process | "a significant reduction of synthesis", "translation", "intake" | |
| |
|
| | ## Evaluation |
| |
|
| | ### Metrics |
| | | Label | Precision | Recall | F1 | |
| | |:---------|:----------|:-------|:-------| |
| | | **all** | 0.6935 | 0.6732 | 0.6832 | |
| | | Data | 0.6348 | 0.5979 | 0.6158 | |
| | | Material | 0.7688 | 0.7612 | 0.7650 | |
| | | Method | 0.4286 | 0.4500 | 0.4390 | |
| | | Process | 0.6985 | 0.6780 | 0.6881 | |
| |
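
Each F1 value in the table above is the harmonic mean of that row's precision and recall, F1 = 2PR / (P + R). The following is a minimal sketch that recomputes F1 from the rounded table values as a sanity check; the `f1_score` helper is defined here for illustration only and is not part of SpanMarker.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the metrics table
reported = {
    "all": (0.6935, 0.6732),
    "Data": (0.6348, 0.5979),
    "Material": (0.7688, 0.7612),
    "Method": (0.4286, 0.4500),
    "Process": (0.6985, 0.6780),
}

# Recompute F1 per label, rounded to the table's four decimals
recomputed = {label: round(f1_score(p, r), 4) for label, (p, r) in reported.items()}

for label, f1 in recomputed.items():
    print(f"{label:<8} F1 = {f1:.4f}")
```

Because the inputs are already rounded to four decimals, a recomputed value may differ from the reported one in the last digit.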
|
| | ## Uses |
| |
|
| | ### Direct Use for Inference |
| |
|
| | ```python |
| | from span_marker import SpanMarkerModel |
| | |
| | # Download from the 🤗 Hub |
| | model = SpanMarkerModel.from_pretrained("span_marker_model_id") |
| | # Run inference |
| | entities = model.predict("In situ Peak Force Tapping AFM was employed for determining morphology and nano - mechanical properties of the surface layer .") |
| | ``` |
| |
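
For a single input string, `model.predict` is documented to return a list of dictionaries, one per predicted entity, with keys such as `span`, `label`, and `score`. The sketch below shows one possible way to post-process such a list. The `sample_entities` values and the `filter_entities` helper are hypothetical: they illustrate the shape of the data and are not actual output of this model.

```python
from typing import Any, Dict, List, Optional, Set


def filter_entities(
    entities: List[Dict[str, Any]],
    min_score: float = 0.5,
    labels: Optional[Set[str]] = None,
) -> List[Dict[str, Any]]:
    """Keep entities at or above min_score, optionally restricted to given labels."""
    kept = []
    for entity in entities:
        if entity["score"] < min_score:
            continue
        if labels is not None and entity["label"] not in labels:
            continue
        kept.append(entity)
    return kept


# Hypothetical predictions, hand-written for illustration only
sample_entities = [
    {"span": "In situ Peak Force Tapping AFM", "label": "Method", "score": 0.91},
    {"span": "morphology", "label": "Data", "score": 0.62},
    {"span": "the surface layer", "label": "Material", "score": 0.84},
]

confident = filter_entities(sample_entities, min_score=0.8)
for entity in confident:
    print(f'{entity["label"]:<8} {entity["score"]:.2f}  {entity["span"]}')
```

In practice, `sample_entities` would be replaced by the `entities` list returned in the inference snippet above.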
|
| | ### Downstream Use |
| | You can fine-tune this model on your own dataset. |
| |
|
| | <details><summary>Click to expand</summary> |
| |
|
| | ```python |
| | from datasets import load_dataset |
| | from span_marker import SpanMarkerModel, Trainer |
| | |
| | # Download from the 🤗 Hub |
| | model = SpanMarkerModel.from_pretrained("span_marker_model_id") |
| | |
| | # Specify a Dataset with "tokens" and "ner_tags" columns |
| | dataset = load_dataset("conll2003") # For example CoNLL2003 |
| | |
| | # Initialize a Trainer using the pretrained model & dataset |
| | trainer = Trainer( |
| | model=model, |
| | train_dataset=dataset["train"], |
| | eval_dataset=dataset["validation"], |
| | ) |
| | trainer.train() |
| | trainer.save_model("span_marker_model_id-finetuned") |
| | ``` |
| | </details> |
| |
|
| | ## Training Details |
| |
|
| | ### Training Set Metrics |
| | | Training set | Min | Mean | Max | |
| | |:----------------------|:----|:--------|:----| |
| | | Sentence length | 3 | 25.6049 | 106 | |
| | | Entities per sentence | 0 | 5.2439 | 22 | |
| |
|
| | ### Training Hyperparameters |
| | - learning_rate: 5e-05 |
| | - train_batch_size: 8 |
| | - eval_batch_size: 8 |
| | - seed: 42 |
| | - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08 |
| | - lr_scheduler_type: linear |
| | - lr_scheduler_warmup_ratio: 0.1 |
| | - num_epochs: 10 |
| | |
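
With `lr_scheduler_type: linear` and `lr_scheduler_warmup_ratio: 0.1`, the learning rate is typically ramped up linearly over the first 10% of training steps and then decayed linearly to zero. The sketch below reproduces that shape in plain Python. It assumes 149 optimizer steps per epoch, a figure inferred from the Training Results table (step 300 at epoch 2.0134), which gives 1,490 steps in total. The `linear_lr` helper is illustrative and is not the library's implementation.

```python
BASE_LR = 5e-05       # learning_rate from the hyperparameter list
NUM_EPOCHS = 10       # num_epochs from the hyperparameter list
WARMUP_RATIO = 0.1    # lr_scheduler_warmup_ratio from the hyperparameter list
STEPS_PER_EPOCH = 149  # assumption: inferred from step 300 at epoch 2.0134

TOTAL_STEPS = NUM_EPOCHS * STEPS_PER_EPOCH        # 1490
WARMUP_STEPS = round(TOTAL_STEPS * WARMUP_RATIO)  # 149


def linear_lr(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then linear decay to zero."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    remaining = max(0, TOTAL_STEPS - step)
    return BASE_LR * remaining / (TOTAL_STEPS - WARMUP_STEPS)


# Print the schedule at the start, the end of warmup, the logged steps, and the end
for step in (0, WARMUP_STEPS, 300, 600, 900, 1200, TOTAL_STEPS):
    print(f"step {step:>4}: lr = {linear_lr(step):.2e}")
```

Under these assumptions the peak rate of 5e-05 is reached at step 149, roughly the end of the first epoch.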
| | ### Training Results |
| | | Epoch | Step | Validation Loss | Validation Precision | Validation Recall | Validation F1 | Validation Accuracy | |
| | |:------:|:----:|:---------------:|:--------------------:|:-----------------:|:-------------:|:-------------------:| |
| | | 2.0134 | 300 | 0.0540 | 0.6882 | 0.5687 | 0.6228 | 0.7743 | |
| | | 4.0268 | 600 | 0.0546 | 0.6854 | 0.6737 | 0.6795 | 0.8092 | |
| | | 6.0403 | 900 | 0.0599 | 0.6941 | 0.6927 | 0.6934 | 0.8039 | |
| | | 8.0537 | 1200 | 0.0697 | 0.7096 | 0.6947 | 0.7020 | 0.8190 | |
| | |
| | ### Framework Versions |
| | - Python: 3.10.12 |
| | - SpanMarker: 1.5.0 |
| | - Transformers: 4.36.2 |
| | - PyTorch: 2.0.1+cu118 |
| | - Datasets: 2.16.1 |
| | - Tokenizers: 0.15.0 |
| | |
| | ## Citation |
| | |
| | ### BibTeX |
| | ```bibtex |
| | @software{Aarsen_SpanMarker, |
| | author = {Aarsen, Tom}, |
| | license = {Apache-2.0}, |
| | title = {{SpanMarker for Named Entity Recognition}}, |
| | url = {https://github.com/tomaarsen/SpanMarkerNER} |
| | } |
| | ``` |
| | |