Jiaheng Liu

CheeryLJH

AI & ML interests

None yet

Recent Activity

liked a dataset 2 days ago

NJU-LINK/IF-VidCap

liked a dataset 2 days ago

NJU-LINK/ViDiC-1K

upvoted a paper 4 days ago

EditThinker: Unlocking Iterative Reasoning for Any Image Editor

upvoted a paper 4 days ago

EditThinker: Unlocking Iterative Reasoning for Any Image Editor

Paper • 2512.05965 • Published 6 days ago • 36

upvoted a paper 7 days ago

Nex-N1: Agentic Models Trained via a Unified Ecosystem for Large-Scale Environment Construction

Paper • 2512.04987 • Published 7 days ago • 71

upvoted a paper 8 days ago

ViDiC: Video Difference Captioning

Paper • 2512.03405 • Published 9 days ago • 26

upvoted a paper 9 days ago

From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence

Paper • 2511.18538 • Published 18 days ago • 256

upvoted a paper 10 days ago

How Far Are We from Genuinely Useful Deep Research Agents?

Paper • 2512.01948 • Published 10 days ago • 52

upvoted a paper 11 days ago

Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer

Paper • 2511.22699 • Published 14 days ago • 183

upvoted a paper 14 days ago

Monet: Reasoning in Latent Visual Space Beyond Images and Language

Paper • 2511.21395 • Published 15 days ago • 15

upvoted 3 papers about 1 month ago

MVU-Eval: Towards Multi-Video Understanding Evaluation for Multimodal LLMs

Paper • 2511.07250 • Published Nov 10 • 17

Ming-Flash-Omni: A Sparse, Unified Architecture for Multimodal Perception and Generation

Paper • 2510.24821 • Published Oct 28 • 37

Scaling Latent Reasoning via Looped Language Models

Paper • 2510.25741 • Published Oct 29 • 220

upvoted 10 papers about 2 months ago

MT-Video-Bench: A Holistic Video Understanding Benchmark for Evaluating Multimodal LLMs in Multi-Turn Dialogues

Paper • 2510.17722 • Published Oct 20 • 19

IF-VidCap: Can Video Caption Models Follow Instructions?

Paper • 2510.18726 • Published Oct 21 • 24

Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs

Paper • 2510.18876 • Published Oct 21 • 36

FineVision: Open Data Is All You Need

Paper • 2510.17269 • Published Oct 20 • 69

OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM

Paper • 2510.15870 • Published Oct 17 • 89

COIG-Writer: A High-Quality Dataset for Chinese Creative Writing with Thought Processes

Paper • 2510.14763 • Published Oct 16 • 13

VR-Thinker: Boosting Video Reward Models through Thinking-with-Image Reasoning

Paper • 2510.10518 • Published Oct 12 • 18

A Survey of Vibe Coding with Large Language Models

Paper • 2510.12399 • Published Oct 14 • 48

BrowserAgent: Building Web Agents with Human-Inspired Web Browsing Actions

Paper • 2510.10666 • Published Oct 12 • 27

ACADREASON: Exploring the Limits of Reasoning Models with Academic Research Problems

Paper • 2510.11652 • Published Oct 13 • 28