LlamaFusion: Adapting Pretrained Language Models for Multimodal Generation Paper • 2412.15188 • Published Dec 19, 2024 • 1
MADFormer: Mixed Autoregressive and Diffusion Transformers for Continuous Image Generation Paper • 2506.07999 • Published Jun 9, 2025 • 2
TV2TV: A Unified Framework for Interleaved Language and Video Generation Paper • 2512.05103 • Published Dec 4, 2025 • 18
Vec2Face: Scaling Face Dataset Generation with Loosely Constrained Vectors Paper • 2409.02979 • Published Sep 4, 2024 • 1
REPA-E: Unlocking VAE for End-to-End Tuning with Latent Diffusion Transformers Paper • 2504.10483 • Published Apr 14, 2025 • 21
videotransfusion/sit_pretrained_model-llama3-freezetext-bz512-lr1en4-detailedcaption-init50k-v2 Updated Apr 25, 2025
Negative Token Merging: Image-based Adversarial Feature Guidance Paper • 2412.01339 • Published Dec 2, 2024 • 22
Can Language Models Solve Graph Problems in Natural Language? Paper • 2305.10037 • Published May 17, 2023 • 1