|
|
--- |
|
|
license: other |
|
|
license_name: qwen |
|
|
license_link: LICENSE |
|
|
datasets: |
|
|
- linxy/LaTeX_OCR |
|
|
- OleehyO/latex-formulas |
|
|
metrics: |
|
|
- cer |
|
|
base_model: |
|
|
- Qwen/Qwen2.5-VL-3B-Instruct |
|
|
--- |
|
|
# Model Card for Qwen2_5-VL-OCR-3B-Instruct
|
|
|
|
|
## Summary
|
|
|
|
|
|
|
This is a fine-tuned version of [Qwen2.5-VL-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct), focused on the image-to-LaTeX (img2latex) task.
|
|
|
|
|
The model was fine-tuned on [OleehyO/latex-formulas](https://huggingface.co/datasets/OleehyO/latex-formulas) for 2 epochs to enhance its LaTeX OCR capability,


and for one epoch on [linxy/LaTeX_OCR](https://huggingface.co/datasets/linxy/LaTeX_OCR) to regulate the model's output format.
|
|
|
|
|
This work is inspired by [prithivMLmods/Qwen2-VL-OCR-2B-Instruct](https://huggingface.co/prithivMLmods/Qwen2-VL-OCR-2B-Instruct). |
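Since this is a drop-in fine-tune of Qwen2.5-VL, inference follows the standard Qwen2.5-VL recipe in `transformers`. A minimal sketch (the repo id `etherealgemini/Qwen2_5-VL-OCR-3B-Instruct` and the image path `formula.png` are assumptions for illustration; `qwen_vl_utils` must be installed separately):

```python
# Sketch of img2latex inference; assumes the repo id below is correct
# and that qwen-vl-utils is installed (pip install qwen-vl-utils).
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

model_id = "etherealgemini/Qwen2_5-VL-OCR-3B-Instruct"  # assumed repo id
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": "formula.png"},  # path to a formula image
        {"type": "text", "text": "Convert this image to LaTeX."},
    ],
}]

# Build the chat prompt and collect the vision inputs.
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text], images=image_inputs, videos=video_inputs,
    padding=True, return_tensors="pt",
).to(model.device)

# Generate and strip the prompt tokens from the output.
generated = model.generate(**inputs, max_new_tokens=256)
trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, generated)]
print(processor.batch_decode(trimmed, skip_special_tokens=True)[0])
```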
|
|
|
|
|
## Evaluation
|
|
|
|
|
|
|
|
| model                                            | ROUGE-L (F1) | CER  |
|--------------------------------------------------|--------------|------|
| prithivMLmods/Qwen2-VL-OCR-2B-Instruct (bf16)    | 0.88         | 0.24 |
| etherealgemini/Qwen2_5-VL-OCR-3B-Instruct (bf16) | 0.91         | 0.21 |
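For reference, CER (character error rate) is the character-level edit distance between prediction and reference, normalized by reference length. A minimal self-contained sketch (the function name is illustrative; libraries like `jiwer` provide a production implementation):

```python
def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: Levenshtein distance / reference length."""
    m, n = len(reference), len(hypothesis)
    # Row-by-row dynamic-programming edit distance.
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            curr[j] = min(prev[j] + 1,        # deletion
                          curr[j - 1] + 1,    # insertion
                          prev[j - 1] + cost) # substitution
        prev = curr
    return prev[n] / max(m, 1)

print(cer(r"\frac{a}{b}", r"\frac{a}{b}"))  # 0.0 (exact match)
print(cer("abcd", "abxd"))                  # 0.25 (1 edit / 4 chars)
```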
|
|
|
|
|
The improvement likely comes from:


1. the upgraded base model (Qwen2.5-VL vs. Qwen2-VL)


2. a larger fine-tuning dataset: 100K -> 550K samples
|
|
|
|
|
An even larger dataset, [OleehyO/latex-formulas-80M](https://huggingface.co/datasets/OleehyO/latex-formulas-80M), is available, but my computing resources are limited.
|
|
|