QTuneVL1.5-2B, developed by the Reconova AI Lab and BDAA-Lab

Introduction

We’re excited to introduce QTuneVL1.5-2B, the latest in Reconova AI Lab’s series of multimodal large language models. Building on QTuneVL1-2B, it incorporates key features from both InternVL and Mini-Monkey to deliver even greater performance.

Like QTuneVL1-2B, QTuneVL1.5-2B is a lightweight MLLM that incorporates cropping and padding strategies from Mini-Monkey/UReader/InternVL, and it has been fine-tuned from InternVL3-2B.
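
The exact preprocessing is not spelled out in this card, so the sketch below is only a rough illustration of the dynamic-tiling idea used by InternVL/Mini-Monkey-style models: pick a grid whose aspect ratio matches the input, resize to that grid, cut fixed-size crops, and append a downsampled global view. The function name and the 448-pixel tile size are assumptions, not the model's actual pipeline.

```python
from PIL import Image

TILE = 448  # assumed tile size; InternVL-style models commonly use 448x448 crops


def dynamic_tile(image: Image.Image, max_tiles: int = 12):
    """Illustrative dynamic cropping: choose a cols x rows grid whose aspect
    ratio is closest to the image's, resize to that grid, and slice into tiles."""
    w, h = image.size
    aspect = w / h

    # Enumerate candidate grids with at most max_tiles tiles and keep the one
    # whose aspect ratio best matches the input image.
    best, best_diff = (1, 1), float("inf")
    for cols in range(1, max_tiles + 1):
        for rows in range(1, max_tiles // cols + 1):
            diff = abs(aspect - cols / rows)
            if diff < best_diff:
                best, best_diff = (cols, rows), diff
    cols, rows = best

    # Resize to the grid resolution, then cut TILE x TILE crops.
    resized = image.resize((cols * TILE, rows * TILE))
    tiles = [
        resized.crop((c * TILE, r * TILE, (c + 1) * TILE, (r + 1) * TILE))
        for r in range(rows)
        for c in range(cols)
    ]

    # Append a global thumbnail so the model also sees the whole image at once.
    tiles.append(image.resize((TILE, TILE)))
    return tiles
```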

Evaluation

By evaluating our model on eight benchmarks from the OpenCompass leaderboard using VLMEvalKit, we found that it outperforms its predecessor (QTuneVL1-2B) in average score, particularly on the MMStar, MMMU_DEV_VAL, and OCRBench benchmarks. The eight benchmarks and the detailed results are as follows:

The eight benchmarks: MMBench_DEV_EN_V11, MMStar, MMMU_DEV_VAL, MathVista_MINI, HallusionBench, AI2D_TEST, OCRBench, MMVet.

| Index | Model | AVG | MMBench_DEV_EN_V11 | MMStar | MMMU_DEV_VAL | MathVista_MINI | HallusionBench | AI2D_TEST | OCRBench | MMVet |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Mini-Monkey | 54.3 | 71.4 | 50.3 | 35.6 | 46.3 | 38.6 | 74.8 | 802 | 37.2 |
| 2 | InternVL2-2B | 54.2 | 71.4 | 50.3 | 34.6 | 47.2 | 38.2 | 74.2 | 783 | 39.8 |
| 3 | InternVL2_5-2B | 59.4 | 74.6 | 53.7 | 40.1 | 49.7 | 42.2 | 74.9 | 802 | 59.5 |
| 4 | InternVL3-2B | 63.5 | 79.6 | 61.1 | 48.6 | 51.1 | 42.0 | 78.4 | 835 | 64.08 |
| 5 | QTuneVL1-2B | 59.7 | 74.9 | 53.9 | 41.5 | 48.8 | 43.0 | 75.2 | 806 | 59.6 |
| 6 | QTuneVL1.5-2B | 64.2 (+4.5) | 79.6 (+4.7) | 61.4 (+7.5) | 51.1 (+9.6) | 51.8 (+3.0) | 43.0 | 78.8 (+3.6) | 858 (+52) | 62.1 (+2.5) |
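
The parenthesized gains are relative to QTuneVL1-2B. The AVG column is consistent with a simple OpenCompass-style average in which OCRBench (reported on a 0-1000 scale) is first divided by 10 and the eight normalized scores are then averaged; the snippet below reproduces the QTuneVL1.5-2B row under that assumption, which is our reading of the leaderboard convention rather than something stated in this card.

```python
# Reproduce the AVG column for QTuneVL1.5-2B from the per-benchmark scores above.
scores = {
    "MMBench_DEV_EN_V11": 79.6,
    "MMStar": 61.4,
    "MMMU_DEV_VAL": 51.1,
    "MathVista_MINI": 51.8,
    "HallusionBench": 43.0,
    "AI2D_TEST": 78.8,
    "OCRBench": 858,   # reported on a 0-1000 scale
    "MMVet": 62.1,
}

# OCRBench is rescaled to 0-100 before averaging (assumed OpenCompass convention).
normalized = [v / 10 if k == "OCRBench" else v for k, v in scores.items()]
avg = sum(normalized) / len(normalized)
print(round(avg, 1))  # 64.2
```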

Note that when evaluating with VLMEvalKit, the GPT judge models that get called differ slightly from the official ones. In the code (vlmeval/dataset/utils/judge_util.py), the mapping is:

  • 'gpt-4o-mini': 'gpt-4o-mini' instead of 'gpt-4o-mini': 'gpt-4o-mini-2024-07-18'
  • 'gpt-4-turbo': 'gpt-4-turbo' instead of 'gpt-4-turbo': 'gpt-4-1106-preview'

Because of this, our evaluation results may differ slightly from the official ones.
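
For reference, a typical VLMEvalKit run over these eight benchmarks looks roughly like the call below, issued from the VLMEvalKit repository root. The model key "QTuneVL1_5-2B" is a placeholder for however the model is registered in vlmeval/config.py, and flag names may vary across VLMEvalKit versions.

```python
import subprocess

BENCHMARKS = [
    "MMBench_DEV_EN_V11", "MMStar", "MMMU_DEV_VAL", "MathVista_MINI",
    "HallusionBench", "AI2D_TEST", "OCRBench", "MMVet",
]

# Run VLMEvalKit's entry script once over all eight benchmarks.
# "QTuneVL1_5-2B" is a placeholder model key; it must match an entry
# registered in vlmeval/config.py.
subprocess.run(
    ["python", "run.py", "--data", *BENCHMARKS, "--model", "QTuneVL1_5-2B", "--verbose"],
    check=True,
)
```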

Copyright

We welcome suggestions to help us improve QTuneVL. For any queries, please contact HanChao Wang: [email protected]. If you find something interesting, please also feel free to share it with us by email or by opening an issue.

