---
license: other
license_name: qwen
license_link: LICENSE
datasets:
- linxy/LaTeX_OCR
- OleehyO/latex-formulas
metrics:
- cer
base_model:
- Qwen/Qwen2.5-VL-3B-Instruct
---
# Model Card for Qwen2_5-VL-OCR-3B-Instruct
## summary
This is a finetuned version of [Qwen2.5-VL-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct), focused on the image-to-LaTeX (img2latex) task.
The model was finetuned for 2 epochs on [OleehyO/latex-formulas](https://huggingface.co/datasets/OleehyO/latex-formulas) to strengthen its LaTeX OCR capability,
and for 1 epoch on [linxy/LaTeX_OCR](https://huggingface.co/datasets/linxy/LaTeX_OCR) to regularize the model's output.
This work is inspired by [prithivMLmods/Qwen2-VL-OCR-2B-Instruct](https://huggingface.co/prithivMLmods/Qwen2-VL-OCR-2B-Instruct).
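
Below is a minimal inference sketch. It assumes the repo id `etherealgemini/Qwen2_5-VL-OCR-3B-Instruct` (taken from the evaluation table below), that this finetune uses the same `transformers` / `qwen_vl_utils` API as the base Qwen2.5-VL-3B-Instruct, and an illustrative prompt rather than the exact prompt used during finetuning.

```python
# Minimal inference sketch; repo id and prompt are assumptions, not documented values.
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

model_id = "etherealgemini/Qwen2_5-VL-OCR-3B-Instruct"  # assumed repo id
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "path/to/formula.png"},
            {"type": "text", "text": "Convert this image to LaTeX."},  # illustrative prompt
        ],
    }
]

# Build the chat prompt and collect the image inputs.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text], images=image_inputs, videos=video_inputs,
    padding=True, return_tensors="pt"
).to(model.device)

# Generate and decode only the newly produced tokens (the LaTeX transcription).
generated_ids = model.generate(**inputs, max_new_tokens=512)
trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, generated_ids)]
print(processor.batch_decode(trimmed, skip_special_tokens=True)[0])
```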
## evaluation
| model | ROUGE-L (F1) | CER |
|--------------------------------------------------|--------------|------|
| prithivMLmods/Qwen2-VL-OCR-2B-Instruct (bf16)    | 0.88         | 0.24 |
| etherealgemini/Qwen2_5-VL-OCR-3B-Instruct (bf16) | 0.91         | 0.21 |
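
A minimal sketch of how these two metrics could be reproduced with the `evaluate` library; the evaluation split and prompts behind the numbers above are not documented here, so the snippet is illustrative only.

```python
# Illustrative metric computation; the strings below are placeholder examples.
import evaluate

cer = evaluate.load("cer")      # character error rate (requires jiwer)
rouge = evaluate.load("rouge")  # ROUGE-L F1 (requires rouge_score)

predictions = [r"\frac{a}{b} + c"]  # model outputs
references = [r"\frac{a}{b}+c"]     # ground-truth LaTeX

print("CER:", cer.compute(predictions=predictions, references=references))
print("ROUGE-L:", rouge.compute(predictions=predictions, references=references)["rougeL"])
```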
The improvement probably comes from:

1. the upgraded base model (Qwen2.5-VL-3B vs. Qwen2-VL-2B);
2. a larger training dataset (100K -> 550K samples).

There is an even larger dataset, [OleehyO/latex-formulas-80M](https://huggingface.co/datasets/OleehyO/latex-formulas-80M), but my computing resources are too limited to use it.