-
OpenThoughts: Data Recipes for Reasoning Models
Paper • 2506.04178 • Published • 48 -
Exploring Multi-Grained Concept Annotations for Multimodal Large Language Models
Paper • 2412.05939 • Published • 16 -
TÜLU 3: Pushing Frontiers in Open Language Model Post-Training
Paper • 2411.15124 • Published • 67 -
PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding
Paper • 2504.13180 • Published • 19
Collections
Discover the best community collections!
Collections including paper arxiv:2412.05939
-
CompCap: Improving Multimodal Large Language Models with Composite Captions
Paper • 2412.05243 • Published • 20 -
LiFT: Leveraging Human Feedback for Text-to-Video Model Alignment
Paper • 2412.04814 • Published • 47 -
MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale
Paper • 2412.05237 • Published • 46 -
Exploring Multi-Grained Concept Annotations for Multimodal Large Language Models
Paper • 2412.05939 • Published • 16
-
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
Paper • 2402.04252 • Published • 29 -
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
Paper • 2402.03749 • Published • 14 -
ScreenAI: A Vision-Language Model for UI and Infographics Understanding
Paper • 2402.04615 • Published • 44 -
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
Paper • 2402.05008 • Published • 23
-
MIT-10M: A Large Scale Parallel Corpus of Multilingual Image Translation
Paper • 2412.07147 • Published • 5 -
Grounding Descriptions in Images informs Zero-Shot Visual Recognition
Paper • 2412.04429 • Published -
Exploring Multi-Grained Concept Annotations for Multimodal Large Language Models
Paper • 2412.05939 • Published • 16 -
Euclid: Supercharging Multimodal LLMs with Synthetic High-Fidelity Visual Descriptions
Paper • 2412.08737 • Published • 54
-
PUMA: Empowering Unified MLLM with Multi-granular Visual Generation
Paper • 2410.13861 • Published • 56 -
JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation
Paper • 2411.07975 • Published • 31 -
Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization
Paper • 2411.10442 • Published • 86 -
Multimodal Autoregressive Pre-training of Large Vision Encoders
Paper • 2411.14402 • Published • 47
-
OpenThoughts: Data Recipes for Reasoning Models
Paper • 2506.04178 • Published • 48 -
Exploring Multi-Grained Concept Annotations for Multimodal Large Language Models
Paper • 2412.05939 • Published • 16 -
TÜLU 3: Pushing Frontiers in Open Language Model Post-Training
Paper • 2411.15124 • Published • 67 -
PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding
Paper • 2504.13180 • Published • 19
-
MIT-10M: A Large Scale Parallel Corpus of Multilingual Image Translation
Paper • 2412.07147 • Published • 5 -
Grounding Descriptions in Images informs Zero-Shot Visual Recognition
Paper • 2412.04429 • Published -
Exploring Multi-Grained Concept Annotations for Multimodal Large Language Models
Paper • 2412.05939 • Published • 16 -
Euclid: Supercharging Multimodal LLMs with Synthetic High-Fidelity Visual Descriptions
Paper • 2412.08737 • Published • 54
-
CompCap: Improving Multimodal Large Language Models with Composite Captions
Paper • 2412.05243 • Published • 20 -
LiFT: Leveraging Human Feedback for Text-to-Video Model Alignment
Paper • 2412.04814 • Published • 47 -
MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale
Paper • 2412.05237 • Published • 46 -
Exploring Multi-Grained Concept Annotations for Multimodal Large Language Models
Paper • 2412.05939 • Published • 16
-
PUMA: Empowering Unified MLLM with Multi-granular Visual Generation
Paper • 2410.13861 • Published • 56 -
JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation
Paper • 2411.07975 • Published • 31 -
Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization
Paper • 2411.10442 • Published • 86 -
Multimodal Autoregressive Pre-training of Large Vision Encoders
Paper • 2411.14402 • Published • 47
-
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
Paper • 2402.04252 • Published • 29 -
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
Paper • 2402.03749 • Published • 14 -
ScreenAI: A Vision-Language Model for UI and Infographics Understanding
Paper • 2402.04615 • Published • 44 -
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
Paper • 2402.05008 • Published • 23