---
base_model: Qwen/Qwen2.5-VL-3B-Instruct
library_name: transformers
model_name: neg_aware_qwen
tags:
- generated_from_trainer
- sft
- trl
- vision_reward
license: mit
datasets:
- zai-org/VisionRewardDB-Image
---
# Model Card for neg_aware_qwen
This model is a fine-tuned version of Qwen/Qwen2.5-VL-3B-Instruct, trained with TRL on zai-org/VisionRewardDB-Image. It is aimed at evaluating image quality in the same way as VisionReward, but with a much smaller and faster model.
## Quick start
To use this model, prompt it with an image and one of the scoring dimensions defined by VisionReward.
```python
from datasets import load_dataset

# Load the dataset; you can also use any images of your own
train_ds = load_dataset("zai-org/VisionRewardDB-Image", split='train[:40000]')
test_ds = load_dataset("zai-org/VisionRewardDB-Image", split='train[40000:]')
```
```python
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="weathon/qwen_2_5_vision_reward")
```
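If you have a GPU available, the standard `transformers` pipeline arguments can place the model on it. This is optional; these are generic pipeline options, not specific to this model:

```python
import torch
from transformers import pipeline

# Optional: load on GPU in half precision using standard pipeline arguments
pipe = pipeline(
    "image-text-to-text",
    model="weathon/qwen_2_5_vision_reward",
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
```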
```python
import re

import pandas as pd
from PIL import Image  # only needed if you load your own images

# Parse the VisionReward scoring rules into a {dimension: {score: description}} guide
df = pd.read_csv("rules.csv")
df.columns = df.columns.str.strip()
df['Dimension'] = df['Dimension'].ffill()
df['dim_key'] = df['Dimension'].apply(
    lambda x: re.search(r'\((.*?)\)', x).group(1) if re.search(r'\((.*?)\)', x) else x
)
guide = {
    dim_key: {
        int(row['Score']): str(row['Description']).strip()
        for _, row in group.iterrows()
    }
    for dim_key, group in df.groupby('dim_key')
}

# The guide dict becomes the system prompt for scoring
question = f"You need to rate the quality of an image, guideline: {guide}."
```
```python
import json

def rate(image):
    # Send the guideline as the system prompt and the resized image as the user turn
    messages = [
        {
            "role": "system",
            "content": [{"type": "text", "text": question}],
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "image": image.resize((512, 512)),
                }
            ],
        },
    ]
    gen = pipe(text=messages, return_full_text=False)
    # The model replies with a Python-style dict of per-dimension scores;
    # normalize the quotes for JSON and sum the scores into a single rating
    return sum(json.loads(gen[0]["generated_text"].replace("'", '"')).values())
```
```python
# Model's predicted total score vs. the ground-truth annotation total
rate(test_ds[3]["image"])
sum(test_ds[3]["annotation"].values())
```
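To get a rough sense of agreement on held-out data, you can compare predicted totals against annotation totals over a few samples. This loop is an illustrative sketch, not an official benchmark:

```python
# Rough agreement check on a handful of held-out samples (illustrative only)
errors = []
for i in range(10):
    try:
        pred = rate(test_ds[i]["image"])
    except (json.JSONDecodeError, KeyError):
        continue  # skip samples where the model output isn't valid JSON
    truth = sum(test_ds[i]["annotation"].values())
    errors.append(abs(pred - truth))
if errors:
    print(f"mean absolute error over {len(errors)} samples:", sum(errors) / len(errors))
```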
## Training procedure
This model was trained with supervised fine-tuning (SFT).
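The exact training script is not published on this card; a minimal TRL SFT setup might look like the following sketch (the hyperparameters and dataset preprocessing here are assumptions, not the actual recipe):

```python
from trl import SFTConfig, SFTTrainer

# Minimal sketch of a TRL SFT run; the real hyperparameters and the
# conversion of VisionRewardDB-Image into chat-formatted examples are not
# documented here and would need to be supplied.
training_args = SFTConfig(output_dir="neg_aware_qwen")
trainer = SFTTrainer(
    model="Qwen/Qwen2.5-VL-3B-Instruct",
    args=training_args,
    train_dataset=train_ds,  # assumes chat-formatted (messages + images) examples
)
trainer.train()
```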
## Framework versions
- TRL: 0.24.0.dev0
- Transformers: 4.56.1
- Pytorch: 2.8.0+cu126
- Datasets: 4.0.0
- Tokenizers: 0.22.0
## Citations

We are still working on the paper; please keep an eye out for updates.