-
Can Large Language Models Understand Context?
Paper • 2402.00858 • Published • 23 -
OLMo: Accelerating the Science of Language Models
Paper • 2402.00838 • Published • 85 -
Self-Rewarding Language Models
Paper • 2401.10020 • Published • 151 -
SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity
Paper • 2401.17072 • Published • 25
Collections
Discover the best community collections!
Collections including paper arxiv:2511.04570
-
When Visualizing is the First Step to Reasoning: MIRA, a Benchmark for Visual Chain-of-Thought
Paper • 2511.02779 • Published • 57 -
Too Good to be Bad: On the Failure of LLMs to Role-Play Villains
Paper • 2511.04962 • Published • 52 -
Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm
Paper • 2511.04570 • Published • 208
-
Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm
Paper • 2511.04570 • Published • 208 -
V-Thinker: Interactive Thinking with Images
Paper • 2511.04460 • Published • 96 -
TIR-Bench: A Comprehensive Benchmark for Agentic Thinking-with-Images Reasoning
Paper • 2511.01833 • Published • 15 -
ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning
Paper • 2510.27492 • Published • 81
-
Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm
Paper • 2511.04570 • Published • 208 -
Uni-MoE-2.0-Omni: Scaling Language-Centric Omnimodal Large Model with Advanced MoE, Training and Data
Paper • 2511.12609 • Published • 102 -
When Visualizing is the First Step to Reasoning: MIRA, a Benchmark for Visual Chain-of-Thought
Paper • 2511.02779 • Published • 57
-
One Small Step in Latent, One Giant Leap for Pixels: Fast Latent Upscale Adapter for Your Diffusion Models
Paper • 2511.10629 • Published • 122 -
PAN: A World Model for General, Interactable, and Long-Horizon World Simulation
Paper • 2511.09057 • Published • 75 -
Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm
Paper • 2511.04570 • Published • 208
-
Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning
Paper • 2508.20751 • Published • 89 -
TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling
Paper • 2508.17445 • Published • 80 -
VoxHammer: Training-Free Precise and Coherent 3D Editing in Native 3D Space
Paper • 2508.19247 • Published • 42 -
VibeVoice Technical Report
Paper • 2508.19205 • Published • 127
-
Can Large Language Models Understand Context?
Paper • 2402.00858 • Published • 23 -
OLMo: Accelerating the Science of Language Models
Paper • 2402.00838 • Published • 85 -
Self-Rewarding Language Models
Paper • 2401.10020 • Published • 151 -
SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity
Paper • 2401.17072 • Published • 25
-
Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm
Paper • 2511.04570 • Published • 208 -
Uni-MoE-2.0-Omni: Scaling Language-Centric Omnimodal Large Model with Advanced MoE, Training and Data
Paper • 2511.12609 • Published • 102 -
When Visualizing is the First Step to Reasoning: MIRA, a Benchmark for Visual Chain-of-Thought
Paper • 2511.02779 • Published • 57
-
One Small Step in Latent, One Giant Leap for Pixels: Fast Latent Upscale Adapter for Your Diffusion Models
Paper • 2511.10629 • Published • 122 -
PAN: A World Model for General, Interactable, and Long-Horizon World Simulation
Paper • 2511.09057 • Published • 75 -
Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm
Paper • 2511.04570 • Published • 208
-
When Visualizing is the First Step to Reasoning: MIRA, a Benchmark for Visual Chain-of-Thought
Paper • 2511.02779 • Published • 57 -
Too Good to be Bad: On the Failure of LLMs to Role-Play Villains
Paper • 2511.04962 • Published • 52 -
Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm
Paper • 2511.04570 • Published • 208
-
Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm
Paper • 2511.04570 • Published • 208 -
V-Thinker: Interactive Thinking with Images
Paper • 2511.04460 • Published • 96 -
TIR-Bench: A Comprehensive Benchmark for Agentic Thinking-with-Images Reasoning
Paper • 2511.01833 • Published • 15 -
ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning
Paper • 2510.27492 • Published • 81
-
Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning
Paper • 2508.20751 • Published • 89 -
TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling
Paper • 2508.17445 • Published • 80 -
VoxHammer: Training-Free Precise and Coherent 3D Editing in Native 3D Space
Paper • 2508.19247 • Published • 42 -
VibeVoice Technical Report
Paper • 2508.19205 • Published • 127