-
ROICtrl: Boosting Instance Control for Visual Generation
Paper • 2411.17949 • Published • 87 -
Unpacking SDXL Turbo: Interpreting Text-to-Image Models with Sparse Autoencoders
Paper • 2410.22366 • Published • 84 -
Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens
Paper • 2410.13863 • Published • 38 -
CoRe: Context-Regularized Text Embedding Learning for Text-to-Image Personalization
Paper • 2408.15914 • Published • 24
Collections
Discover the best community collections!
Collections including paper arxiv:2410.13863
-
Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
Paper • 2405.08748 • Published • 24 -
Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection
Paper • 2405.10300 • Published • 30 -
Chameleon: Mixed-Modal Early-Fusion Foundation Models
Paper • 2405.09818 • Published • 132 -
OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework
Paper • 2405.11143 • Published • 41
-
Compose and Conquer: Diffusion-Based 3D Depth Aware Composable Image Synthesis
Paper • 2401.09048 • Published • 10 -
Improving fine-grained understanding in image-text pre-training
Paper • 2401.09865 • Published • 18 -
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
Paper • 2401.10891 • Published • 62 -
Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild
Paper • 2401.13627 • Published • 77
-
Understanding Diffusion Models: A Unified Perspective
Paper • 2208.11970 • Published -
Tutorial on Diffusion Models for Imaging and Vision
Paper • 2403.18103 • Published • 2 -
Denoising Diffusion Probabilistic Models
Paper • 2006.11239 • Published • 6 -
Denoising Diffusion Implicit Models
Paper • 2010.02502 • Published • 4
-
Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens
Paper • 2410.13863 • Published • 38 -
BAAI/Emu3-Chat
Text Generation • 8B • Updated • 613 • 73 -
deepseek-ai/Janus-1.3B
Any-to-Any • 2B • Updated • 9.85k • 592 -
facebook/chameleon-7b
Image-Text-to-Text • 7B • Updated • 59.8k • 194
-
Alleviating Distortion in Image Generation via Multi-Resolution Diffusion Models
Paper • 2406.09416 • Published • 29 -
Wavelets Are All You Need for Autoregressive Image Generation
Paper • 2406.19997 • Published • 31 -
ViPer: Visual Personalization of Generative Models via Individual Preference Learning
Paper • 2407.17365 • Published • 13 -
MegaFusion: Extend Diffusion Models towards Higher-resolution Image Generation without Further Tuning
Paper • 2408.11001 • Published • 13
-
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
Paper • 2402.04252 • Published • 29 -
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
Paper • 2402.03749 • Published • 14 -
ScreenAI: A Vision-Language Model for UI and Infographics Understanding
Paper • 2402.04615 • Published • 44 -
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
Paper • 2402.05008 • Published • 23
-
The Chosen One: Consistent Characters in Text-to-Image Diffusion Models
Paper • 2311.10093 • Published • 59 -
NeuroPrompts: An Adaptive Framework to Optimize Prompts for Text-to-Image Generation
Paper • 2311.12229 • Published • 27 -
Diffusion Model Alignment Using Direct Preference Optimization
Paper • 2311.12908 • Published • 50 -
VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models
Paper • 2312.00845 • Published • 39
-
ROICtrl: Boosting Instance Control for Visual Generation
Paper • 2411.17949 • Published • 87 -
Unpacking SDXL Turbo: Interpreting Text-to-Image Models with Sparse Autoencoders
Paper • 2410.22366 • Published • 84 -
Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens
Paper • 2410.13863 • Published • 38 -
CoRe: Context-Regularized Text Embedding Learning for Text-to-Image Personalization
Paper • 2408.15914 • Published • 24
-
Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens
Paper • 2410.13863 • Published • 38 -
BAAI/Emu3-Chat
Text Generation • 8B • Updated • 613 • 73 -
deepseek-ai/Janus-1.3B
Any-to-Any • 2B • Updated • 9.85k • 592 -
facebook/chameleon-7b
Image-Text-to-Text • 7B • Updated • 59.8k • 194
-
Alleviating Distortion in Image Generation via Multi-Resolution Diffusion Models
Paper • 2406.09416 • Published • 29 -
Wavelets Are All You Need for Autoregressive Image Generation
Paper • 2406.19997 • Published • 31 -
ViPer: Visual Personalization of Generative Models via Individual Preference Learning
Paper • 2407.17365 • Published • 13 -
MegaFusion: Extend Diffusion Models towards Higher-resolution Image Generation without Further Tuning
Paper • 2408.11001 • Published • 13
-
Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
Paper • 2405.08748 • Published • 24 -
Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection
Paper • 2405.10300 • Published • 30 -
Chameleon: Mixed-Modal Early-Fusion Foundation Models
Paper • 2405.09818 • Published • 132 -
OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework
Paper • 2405.11143 • Published • 41
-
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
Paper • 2402.04252 • Published • 29 -
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
Paper • 2402.03749 • Published • 14 -
ScreenAI: A Vision-Language Model for UI and Infographics Understanding
Paper • 2402.04615 • Published • 44 -
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
Paper • 2402.05008 • Published • 23
-
Compose and Conquer: Diffusion-Based 3D Depth Aware Composable Image Synthesis
Paper • 2401.09048 • Published • 10 -
Improving fine-grained understanding in image-text pre-training
Paper • 2401.09865 • Published • 18 -
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
Paper • 2401.10891 • Published • 62 -
Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild
Paper • 2401.13627 • Published • 77
-
The Chosen One: Consistent Characters in Text-to-Image Diffusion Models
Paper • 2311.10093 • Published • 59 -
NeuroPrompts: An Adaptive Framework to Optimize Prompts for Text-to-Image Generation
Paper • 2311.12229 • Published • 27 -
Diffusion Model Alignment Using Direct Preference Optimization
Paper • 2311.12908 • Published • 50 -
VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models
Paper • 2312.00845 • Published • 39
-
Understanding Diffusion Models: A Unified Perspective
Paper • 2208.11970 • Published -
Tutorial on Diffusion Models for Imaging and Vision
Paper • 2403.18103 • Published • 2 -
Denoising Diffusion Probabilistic Models
Paper • 2006.11239 • Published • 6 -
Denoising Diffusion Implicit Models
Paper • 2010.02502 • Published • 4