|
|
--- |
|
|
license: other |
|
|
license_name: qwen |
|
|
license_link: LICENSE |
|
|
datasets: |
|
|
- linxy/LaTeX_OCR |
|
|
- OleehyO/latex-formulas |
|
|
metrics: |
|
|
- cer |
|
|
base_model: |
|
|
- Qwen/Qwen2.5-VL-3B-Instruct |
|
|
--- |
|
|
# Model Card for Qwen2_5-VL-OCR-3B-Instruct
|
|
|
|
|
## Summary
|
|
|
|
|
|
|
This is a fine-tuned version of [Qwen2.5-VL-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct), focused on the image-to-LaTeX (img2latex) task.
|
|
|
|
|
The model was fine-tuned on [OleehyO/latex-formulas](https://huggingface.co/datasets/OleehyO/latex-formulas) for 2 epochs to enhance its LaTeX OCR capability,


and for one epoch on [linxy/LaTeX_OCR](https://huggingface.co/datasets/linxy/LaTeX_OCR) to regulate the model's output format.
|
|
|
|
|
This work is inspired by [prithivMLmods/Qwen2-VL-OCR-2B-Instruct](https://huggingface.co/prithivMLmods/Qwen2-VL-OCR-2B-Instruct). |
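Since this is a drop-in fine-tune of Qwen2.5-VL, inference follows the standard Qwen2.5-VL recipe in `transformers`. A minimal sketch (the repo id `etherealgemini/Qwen2_5-VL-OCR-3B-Instruct` and the image path `formula.png` are assumptions for illustration; `qwen_vl_utils` must be installed separately):

```python
# Sketch of img2latex inference; assumes the repo id below is correct
# and that qwen-vl-utils is installed (pip install qwen-vl-utils).
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

model_id = "etherealgemini/Qwen2_5-VL-OCR-3B-Instruct"  # assumed repo id
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": "formula.png"},  # path to a formula image
        {"type": "text", "text": "Convert this image to LaTeX."},
    ],
}]

# Build the chat prompt and collect the vision inputs.
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text], images=image_inputs, videos=video_inputs,
    padding=True, return_tensors="pt",
).to(model.device)

# Generate and strip the prompt tokens from the output.
generated = model.generate(**inputs, max_new_tokens=256)
trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, generated)]
print(processor.batch_decode(trimmed, skip_special_tokens=True)[0])
```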
|
|
|
|
|
## Evaluation
|
|
|
|
|
|
|
|
| model                                            | ROUGE-L (F1) | CER  |
|--------------------------------------------------|--------------|------|
| prithivMLmods/Qwen2-VL-OCR-2B-Instruct (bf16)    | 0.88         | 0.24 |
| etherealgemini/Qwen2_5-VL-OCR-3B-Instruct (bf16) | 0.91         | 0.21 |
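For reference, CER (character error rate) is the character-level edit distance between prediction and reference, normalized by reference length. A minimal self-contained sketch (the function name is illustrative; libraries like `jiwer` provide a production implementation):

```python
def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: Levenshtein distance / reference length."""
    m, n = len(reference), len(hypothesis)
    # Row-by-row dynamic-programming edit distance.
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            curr[j] = min(prev[j] + 1,        # deletion
                          curr[j - 1] + 1,    # insertion
                          prev[j - 1] + cost) # substitution
        prev = curr
    return prev[n] / max(m, 1)

print(cer(r"\frac{a}{b}", r"\frac{a}{b}"))  # 0.0 (exact match)
print(cer("abcd", "abxd"))                  # 0.25 (1 edit / 4 chars)
```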
|
|
|
|
|
The improvement likely comes from:


1. the upgraded base model (Qwen2.5-VL vs. Qwen2-VL)


2. a larger fine-tuning dataset: 100K -> 550K samples
|
|
|
|
|
An even larger dataset, [OleehyO/latex-formulas-80M](https://huggingface.co/datasets/OleehyO/latex-formulas-80M), is available, but my computing resources are limited.
|
|
|