Qwen3-VL-8B GRPO RLVR checkpoints from a token-dropout exploration study. On OMR, ppexplore is the winner (0.714); on video, the results are a dead heat at ~0.485.
Nguyen Quang Trung
ngqtrung
Recent Activity
updated a collection about 20 hours ago
Qwen3-VL-8B RLVR — Models (v1)