MIND-V: Hierarchical Video Generation for Long-Horizon Robotic Manipulation with RL-based Physical Alignment
Paper • 2512.06628 • Published • 12
Distribution Matching Variational AutoEncoder
AdaptVision: Efficient Vision-Language Models via Adaptive Visual Acquisition