VTCBench: Can Vision-Language Models Understand Long Context with Vision-Text Compression? Paper • 2512.15649 • Published 9 days ago • 6
On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models Paper • 2512.07783 • Published 18 days ago • 35
Token Reduction Should Go Beyond Efficiency in Generative Models -- From Vision, Language to Multimodality Paper • 2505.18227 • Published May 23 • 15
MCITlib: Multimodal Continual Instruction Tuning Library and Benchmark Paper • 2508.07307 • Published Aug 10 • 1
Parameter Efficient Merging for Multimodal Large Language Models with Complementary Parameter Adaptation Paper • 2502.17159 • Published Feb 24 • 2