WorldCache: Content-Aware Caching for Accelerated Video World Models
Abstract
WorldCache improves diffusion transformer inference by adaptively reusing features through motion-adaptive thresholds and saliency-weighted drift estimation, achieving faster processing with minimal quality loss.
Diffusion Transformers (DiTs) power high-fidelity video world models but remain computationally expensive due to sequential denoising and costly spatio-temporal attention. Training-free feature caching accelerates inference by reusing intermediate activations across denoising steps; however, existing methods largely rely on a Zero-Order Hold assumption, i.e., reusing cached features as static snapshots when global drift is small. This often leads to ghosting artifacts, blur, and motion inconsistencies in dynamic scenes. We propose WorldCache, a Perception-Constrained Dynamical Caching framework that improves both when and how to reuse features. WorldCache introduces motion-adaptive thresholds, saliency-weighted drift estimation, optimal approximation via blending and warping, and phase-aware threshold scheduling across diffusion steps. Our cohesive approach enables adaptive, motion-consistent feature reuse without retraining. On Cosmos-Predict2.5-2B evaluated on PAI-Bench, WorldCache achieves a 2.3× inference speedup while preserving 99.4% of baseline quality, substantially outperforming prior training-free caching approaches. Our code is available at https://umair1221.github.io/World-Cache/.
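To make the caching decision concrete, here is a minimal sketch of drift-gated feature reuse with a motion-adaptive threshold. The function names, the L2 relative-drift measure, and the specific form of the threshold adaptation (`base_tau / (1 + motion_score)`) are illustrative assumptions, not the paper's actual formulation:

```python
import math

def relative_drift(prev_feat, curr_feat):
    """Relative L2 change between features at consecutive denoising steps."""
    num = math.sqrt(sum((a - b) ** 2 for a, b in zip(prev_feat, curr_feat)))
    den = math.sqrt(sum(a * a for a in prev_feat)) + 1e-8
    return num / den

def should_reuse(prev_feat, curr_feat, motion_score, base_tau=0.05):
    """Reuse the cached feature only when drift is below a motion-adaptive
    threshold. High-motion content gets a stricter (smaller) threshold,
    so dynamic regions are recomputed more often than static ones.
    The adaptation rule below is a hypothetical placeholder."""
    tau = base_tau / (1.0 + motion_score)
    return relative_drift(prev_feat, curr_feat) < tau
```

Under this sketch, the same small drift that permits reuse in a static scene (`motion_score = 0`) would trigger recomputation in a fast-moving one, which is the behavior a Zero-Order Hold cache with a single global threshold cannot provide.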
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Frequency-Aware Error-Bounded Caching for Accelerating Diffusion Transformers (2026)
- WorldCache: Accelerating World Models for Free via Heterogeneous Token Caching (2026)
- AdaCorrection: Adaptive Offset Cache Correction for Accurate Diffusion Transformers (2026)
- SeaCache: Spectral-Evolution-Aware Cache for Accelerating Diffusion Models (2026)
- AccelAes: Accelerating Diffusion Transformers for Training-Free Aesthetic-Enhanced Image Generation (2026)
- SenCache: Accelerating Diffusion Model Inference via Sensitivity-Aware Caching (2026)
- 6Bit-Diffusion: Inference-Time Mixed-Precision Quantization for Video Diffusion Models (2026)