# BERTu (SIB-200 Maltese)
This model is a fine-tuned version of MLRS/BERTu on the Davlan/sib200 mlt_Latn dataset. It achieves the following results on the test set:
- Loss: 0.5018
- F1: 0.8621
## Intended uses & limitations
The model is fine-tuned for a specific task (topic classification on SIB-200), so it should only be used for the same or a similar task. Any limitations present in the base model are inherited.
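As a minimal usage sketch (the exact label names returned depend on the checkpoint's `id2label` mapping), the model can be loaded with the `transformers` text-classification pipeline:

```python
# Minimal inference sketch for the published checkpoint MLRS/BERTu_sib200-mlt.
from transformers import pipeline

classifier = pipeline("text-classification", model="MLRS/BERTu_sib200-mlt")

# An illustrative Maltese sentence; the output is a topic label with a score.
print(classifier("Il-gvern ħabbar baġit ġdid għas-sena d-dieħla."))
```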
## Training procedure
The model was fine-tuned using a customised script; an approximate configuration sketch is given after the hyperparameter list below.
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 16
- eval_batch_size: 32
- seed: 3
- optimizer: AdamW (`adamw_torch`) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: inverse_sqrt
- lr_scheduler_warmup_ratio: 0.005
- num_epochs: 200.0
- early_stopping_patience: 20
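The customised training script itself is not included in this card, so the following is only a sketch of how the hyperparameters above map onto `transformers` `TrainingArguments` plus an `EarlyStoppingCallback`; the output directory, evaluation/saving strategy, and best-model metric are assumptions rather than confirmed settings.

```python
# Approximate mapping of the listed hyperparameters onto TrainingArguments.
# The actual customised fine-tuning script may differ in its details.
from transformers import TrainingArguments, EarlyStoppingCallback

training_args = TrainingArguments(
    output_dir="BERTu_sib200-mlt",       # assumed output location
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=32,
    seed=3,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="inverse_sqrt",
    warmup_ratio=0.005,
    num_train_epochs=200,
    eval_strategy="epoch",               # assumed: evaluate once per epoch
    save_strategy="epoch",               # assumed: needed for best-model loading
    load_best_model_at_end=True,
    metric_for_best_model="f1",          # assumed best-model criterion
)

# Early stopping with a patience of 20 evaluation rounds, as listed above.
early_stopping = EarlyStoppingCallback(early_stopping_patience=20)
```

These objects would then be passed to a `Trainer` together with the tokenised SIB-200 `mlt_Latn` splits and a macro-F1 metric function.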
### Training results
| Training Loss | Epoch | Step | Validation Loss | F1 |
|---|---|---|---|---|
| No log | 1.0 | 44 | 1.5054 | 0.4062 |
| No log | 2.0 | 88 | 0.8147 | 0.8010 |
| No log | 3.0 | 132 | 0.5343 | 0.8243 |
| No log | 4.0 | 176 | 0.4906 | 0.8290 |
| No log | 5.0 | 220 | 0.4502 | 0.8505 |
| No log | 6.0 | 264 | 0.4615 | 0.8450 |
| No log | 7.0 | 308 | 0.5045 | 0.8552 |
| No log | 8.0 | 352 | 0.5117 | 0.8525 |
| No log | 9.0 | 396 | 0.5132 | 0.8684 |
| No log | 10.0 | 440 | 0.5334 | 0.8607 |
| No log | 11.0 | 484 | 0.5530 | 0.8592 |
| 0.3355 | 12.0 | 528 | 0.5476 | 0.8607 |
| 0.3355 | 13.0 | 572 | 0.5605 | 0.8684 |
| 0.3355 | 14.0 | 616 | 0.5683 | 0.8607 |
| 0.3355 | 15.0 | 660 | 0.5689 | 0.8607 |
| 0.3355 | 16.0 | 704 | 0.5729 | 0.8607 |
| 0.3355 | 17.0 | 748 | 0.5831 | 0.8607 |
| 0.3355 | 18.0 | 792 | 0.5860 | 0.8607 |
| 0.3355 | 19.0 | 836 | 0.5919 | 0.8607 |
| 0.3355 | 20.0 | 880 | 0.5971 | 0.8684 |
| 0.3355 | 21.0 | 924 | 0.6006 | 0.8607 |
| 0.3355 | 22.0 | 968 | 0.6053 | 0.8607 |
| 0.0037 | 23.0 | 1012 | 0.6094 | 0.8607 |
| 0.0037 | 24.0 | 1056 | 0.6141 | 0.8607 |
| 0.0037 | 25.0 | 1100 | 0.6177 | 0.8684 |
| 0.0037 | 26.0 | 1144 | 0.6202 | 0.8607 |
| 0.0037 | 27.0 | 1188 | 0.6241 | 0.8684 |
| 0.0037 | 28.0 | 1232 | 0.6291 | 0.8684 |
| 0.0037 | 29.0 | 1276 | 0.6328 | 0.8684 |
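As a rough illustration only (not the authors' evaluation script), the reported test-set macro F1 could be recomputed along the following lines, assuming the checkpoint's `id2label` mapping matches the dataset's `category` strings and that `scikit-learn` is available:

```python
# Sketch of recomputing macro F1 on the SIB-200 mlt_Latn test split.
# Label handling is an assumption; the authors' customised script may differ.
from datasets import load_dataset
from transformers import pipeline
from sklearn.metrics import f1_score

test = load_dataset("Davlan/sib200", "mlt_Latn", split="test")
classifier = pipeline("text-classification", model="MLRS/BERTu_sib200-mlt")

predictions = [out["label"] for out in classifier(test["text"], truncation=True)]
print(f1_score(test["category"], predictions, average="macro"))
```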
### Framework versions
- Transformers 4.51.1
- PyTorch 2.7.0+cu126
- Datasets 3.2.0
- Tokenizers 0.21.1
## License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Permissions beyond the scope of this license may be available at https://mlrs.research.um.edu.mt/.
## Citation
This work was first presented in MELABenchv1: Benchmarking Large Language Models against Smaller Fine-Tuned Models for Low-Resource Maltese NLP. Cite it as follows:

```bibtex
@inproceedings{micallef-borg-2025-melabenchv1,
    title = "{MELAB}enchv1: Benchmarking Large Language Models against Smaller Fine-Tuned Models for Low-Resource {M}altese {NLP}",
    author = "Micallef, Kurt  and
      Borg, Claudia",
    editor = "Che, Wanxiang  and
      Nabende, Joyce  and
      Shutova, Ekaterina  and
      Pilehvar, Mohammad Taher",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2025",
    month = jul,
    year = "2025",
    address = "Vienna, Austria",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.findings-acl.1053/",
    doi = "10.18653/v1/2025.findings-acl.1053",
    pages = "20505--20527",
    ISBN = "979-8-89176-256-5"
}
```
