FLUX-Reason-6M & PRISM-Bench: A Million-Scale Text-to-Image Reasoning Dataset and Comprehensive Benchmark • arXiv:2509.09680 • Published Sep 11, 2025
LLaVA-OneVision-1.5: Fully Open Framework for Democratized Multimodal Training • arXiv:2509.23661 • Published Sep 28, 2025
EchoX: Towards Mitigating Acoustic-Semantic Gap via Echo Training for Speech-to-Speech LLMs • arXiv:2509.09174 • Published Sep 11, 2025
Language Models Can Learn from Verbal Feedback Without Scalar Rewards • arXiv:2509.22638 • Published Sep 26, 2025
VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use • arXiv:2509.01055 • Published Sep 1, 2025
Towards a Unified View of Large Language Model Post-Training • arXiv:2509.04419 • Published Sep 4, 2025
Multimodal Prompt Optimization: Why Not Leverage Multiple Modalities for MLLMs • arXiv:2510.09201 • Published Oct 10, 2025
Every Attention Matters: An Efficient Hybrid Architecture for Long-Context Reasoning • arXiv:2510.19338 • Published Oct 22, 2025
The End of Manual Decoding: Towards Truly End-to-End Language Models • arXiv:2510.26697 • Published Oct 30, 2025
Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation • arXiv:2510.08673 • Published Oct 9, 2025
D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI • arXiv:2510.05684 • Published Oct 7, 2025
Concerto: Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations • arXiv:2510.23607 • Published Oct 27, 2025
The Consistency Critic: Correcting Inconsistencies in Generated Images via Reference-Guided Attentive Alignment • arXiv:2511.20614 • Published Nov 2025
TrajSelector: Harnessing Latent Representations for Efficient and Effective Best-of-N in Large Reasoning Model • arXiv:2510.16449 • Published Oct 18, 2025
LongCodeZip: Compress Long Context for Code Language Models • arXiv:2510.00446 • Published Oct 1, 2025
The Stochastic Parrot on LLM's Shoulder: A Summative Assessment of Physical Concept Understanding • arXiv:2502.08946 • Published Feb 13, 2025
Attention or Convolution: Transformer Encoders in Audio Language Models for Inference Efficiency • arXiv:2311.02772 • Published Nov 5, 2023