EditCrafter: Tuning-free High-Resolution Image Editing via Pretrained Diffusion Model Paper • 2604.10268 • Published 14 days ago • 6
Hierarchical Codec Diffusion for Video-to-Speech Generation Paper • 2604.15923 • Published 8 days ago • 2
HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds Paper • 2604.14268 • Published 10 days ago • 110
ReconPhys: Reconstruct Appearance and Physical Attributes from Single Video Paper • 2604.07882 • Published 16 days ago • 9
Habitat-GS: A High-Fidelity Navigation Simulator with Dynamic Gaussian Splatting Paper • 2604.12626 • Published 11 days ago • 14
Audio Flamingo Next: Next-Generation Open Audio-Language Models for Speech, Sound, and Music Paper • 2604.10905 • Published 12 days ago • 28
OmniShow: Unifying Multimodal Conditions for Human-Object Interaction Video Generation Paper • 2604.11804 • Published 12 days ago • 70
Strips as Tokens: Artist Mesh Generation with Native UV Segmentation Paper • 2604.09132 • Published 15 days ago • 52
RefineAnything: Multimodal Region-Specific Refinement for Perfect Local Details Paper • 2604.06870 • Published 17 days ago • 41
FIT: A Large-Scale Dataset for Fit-Aware Virtual Try-On Paper • 2604.08526 • Published 16 days ago • 20
OpenSpatial: A Principled Data Engine for Empowering Spatial Intelligence Paper • 2604.07296 • Published 17 days ago • 39
AvatarPointillist: AutoRegressive 4D Gaussian Avatarization Paper • 2604.04787 • Published 19 days ago • 12
GaussianGPT: Towards Autoregressive 3D Gaussian Scene Generation Paper • 2603.26661 • Published 28 days ago • 26
DynaVid: Learning to Generate Highly Dynamic Videos using Synthetic Motion Data Paper • 2604.01666 • Published 23 days ago • 10
PoseDreamer: Scalable and Photorealistic Human Data Generation Pipeline with Diffusion Models Paper • 2603.28763 • Published 25 days ago • 7
MMFace-DiT: A Dual-Stream Diffusion Transformer for High-Fidelity Multimodal Face Generation Paper • 2603.29029 • Published 25 days ago • 13