Models: 611
Active filters: rlvr
zosmaai/Qwen3.5-0.8B-GRPO-Math
Text Generation • 0.8B • Updated • 12 • 1
Edmon02/mathphd-plus-plus-0.5b
Text Generation • 0.5B • Updated • 263 • 1
nishantup/nanogpt-rlvr-slm-tinystories-124m
Text Generation • Updated • 228 • 1
SultanR/SmolTulu-1.7b-Reinforced-GGUF
Text Generation • 2B • Updated • 17 • 1
thuml/rt1-world-model-multi-step-rlvr
Updated • 2
thuml/rt1-world-model-single-step-rlvr
Updated • 3
thuml/webarena-world-model-rlvr
2B • Updated • 2
thuml/bytesized32-world-model-rlvr-binary-reward
2B • Updated • 4
thuml/bytesized32-world-model-rlvr-task-specific-reward
2B • Updated • 4
DebateLabKIT/Llama-3.1-Argunaut-1-8B-HIRPO
Text Generation • 8B • Updated • 7 • 1
thinkwee/NOVER1-Qwen3-4B
thinkwee/NOVER1-Qwen2.5-7B
mradermacher/NOVER1-Qwen3-4B-GGUF
4B • Updated • 180 • 1
mradermacher/NOVER1-Qwen2.5-7B-GGUF
8B • Updated • 634 • 1
mradermacher/NOVER1-Qwen3-4B-i1-GGUF
4B • Updated • 318 • 1
mradermacher/NOVER1-Qwen2.5-7B-i1-GGUF
8B • Updated • 777 • 1
DebateLabKIT/Phi-4-Argunaut-1-HIRPO
Text Generation • 415k • Updated • 6
mradermacher/Llama-3.1-Argunaut-1-8B-HIRPO-GGUF
8B • Updated • 164 • 1
mradermacher/Llama-3.1-Argunaut-1-8B-HIRPO-i1-GGUF
8B • Updated • 341 • 1
fangwu97/DeepSearch-1.5B
Text Generation • 2B • Updated • 17 • 9
ziadrone/airesupdated-v2
Text Generation • 4B • Updated • 7 • 1
mradermacher/airesupdated-v2-GGUF
Reinforcement Learning • 4B • Updated • 102
ABaroian/Apertus-8B-RLVR-GSM
Text Generation • Updated • 2
Anonymouslolol/qwen3-8B-hanabi-step110
Reinforcement Learning • Updated • 19
beyoru/MaxCoder-4B
Text Generation • 4B • Updated • 1
anonymousatom/IntelliAsk-Qwen3-32B-450-Merged
Text Generation • 33B • Updated • 38
mradermacher/IntelliAsk-Qwen3-32B-450-Merged-GGUF
Reinforcement Learning • 33B • Updated • 126
mradermacher/Phi-4-Argunaut-1-HIRPO-GGUF
15B • Updated • 238
mradermacher/Phi-4-Argunaut-1-HIRPO-i1-GGUF
15B • Updated • 545