-
SINQ: Sinkhorn-Normalized Quantization for Calibration-Free Low-Precision LLM Weights
Paper • 2509.22944 • Published • 79 -
Robot Learning: A Tutorial
Paper • 2510.12403 • Published • 114 -
UniMoE-Audio: Unified Speech and Music Generation with Dynamic-Capacity MoE
Paper • 2510.13344 • Published • 62 -
Lumina-DiMOO: An Omni Diffusion Large Language Model for Multi-Modal Generation and Understanding
Paper • 2510.06308 • Published • 53
Collections
Discover the best community collections!
Collections including paper arxiv:2510.14545
-
Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward
Paper • 2510.03222 • Published • 75 -
In-the-Flow Agentic System Optimization for Effective Planning and Tool Use
Paper • 2510.05592 • Published • 105 -
Less is More: Recursive Reasoning with Tiny Networks
Paper • 2510.04871 • Published • 493 -
Multi-Agent Tool-Integrated Policy Optimization
Paper • 2510.04678 • Published • 30
-
Demystifying Reinforcement Learning in Agentic Reasoning
Paper • 2510.11701 • Published • 31 -
Self-Improving LLM Agents at Test-Time
Paper • 2510.07841 • Published • 9 -
Making Mathematical Reasoning Adaptive
Paper • 2510.04617 • Published • 22 -
DocReward: A Document Reward Model for Structuring and Stylizing
Paper • 2510.11391 • Published • 27
-
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models
Paper • 2501.03262 • Published • 103 -
Agentic Entropy-Balanced Policy Optimization
Paper • 2510.14545 • Published • 104 -
BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping
Paper • 2510.18927 • Published • 83
-
GUI-G^2: Gaussian Reward Modeling for GUI Grounding
Paper • 2507.15846 • Published • 133 -
WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent
Paper • 2508.05748 • Published • 140 -
Mobile-Agent-v3: Foundamental Agents for GUI Automation
Paper • 2508.15144 • Published • 64 -
AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs
Paper • 2508.16153 • Published • 158
-
Agentic Entropy-Balanced Policy Optimization
Paper • 2510.14545 • Published • 104 -
dongguanting/Qwen3-8B-AEPO-DeepSearch
Text Generation • 8B • Updated • 10 • 1 -
dongguanting/Qwen3-14B-AEPO-DeepSearch
Robotics • 15B • Updated • 8 • 1 -
dongguanting/Qwen2.5-7B-AEPO
Text Generation • 8B • Updated • 22 • 1
-
FlowRL: Matching Reward Distributions for LLM Reasoning
Paper • 2509.15207 • Published • 114 -
Kwaipilot/KAT-Dev-72B-Exp
Text Generation • 73B • Updated • 671 • 155 -
Agentic Entropy-Balanced Policy Optimization
Paper • 2510.14545 • Published • 104 -
Multi-Agent Deep Research: Training Multi-Agent Systems with M-GRPO
Paper • 2511.13288 • Published • 17
-
Open Data Synthesis For Deep Research
Paper • 2509.00375 • Published • 70 -
Beyond Correctness: Harmonizing Process and Outcome Rewards through RL Training
Paper • 2509.03403 • Published • 22 -
LMEnt: A Suite for Analyzing Knowledge in Language Models from Pretraining Data to Representations
Paper • 2509.03405 • Published • 23 -
SATQuest: A Verifier for Logical Reasoning Evaluation and Reinforcement Fine-Tuning of LLMs
Paper • 2509.00930 • Published • 4
-
SuperWriter: Reflection-Driven Long-Form Generation with Large Language Models
Paper • 2506.04180 • Published • 33 -
AniMaker: Automated Multi-Agent Animated Storytelling with MCTS-Driven Clip Generation
Paper • 2506.10540 • Published • 37 -
AutoMind: Adaptive Knowledgeable Agent for Automated Data Science
Paper • 2506.10974 • Published • 19 -
SPAR: Scholar Paper Retrieval with LLM-based Agents for Enhanced Academic Search
Paper • 2507.15245 • Published • 11
-
SINQ: Sinkhorn-Normalized Quantization for Calibration-Free Low-Precision LLM Weights
Paper • 2509.22944 • Published • 79 -
Robot Learning: A Tutorial
Paper • 2510.12403 • Published • 114 -
UniMoE-Audio: Unified Speech and Music Generation with Dynamic-Capacity MoE
Paper • 2510.13344 • Published • 62 -
Lumina-DiMOO: An Omni Diffusion Large Language Model for Multi-Modal Generation and Understanding
Paper • 2510.06308 • Published • 53
-
Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward
Paper • 2510.03222 • Published • 75 -
In-the-Flow Agentic System Optimization for Effective Planning and Tool Use
Paper • 2510.05592 • Published • 105 -
Less is More: Recursive Reasoning with Tiny Networks
Paper • 2510.04871 • Published • 493 -
Multi-Agent Tool-Integrated Policy Optimization
Paper • 2510.04678 • Published • 30
-
Agentic Entropy-Balanced Policy Optimization
Paper • 2510.14545 • Published • 104 -
dongguanting/Qwen3-8B-AEPO-DeepSearch
Text Generation • 8B • Updated • 10 • 1 -
dongguanting/Qwen3-14B-AEPO-DeepSearch
Robotics • 15B • Updated • 8 • 1 -
dongguanting/Qwen2.5-7B-AEPO
Text Generation • 8B • Updated • 22 • 1
-
Demystifying Reinforcement Learning in Agentic Reasoning
Paper • 2510.11701 • Published • 31 -
Self-Improving LLM Agents at Test-Time
Paper • 2510.07841 • Published • 9 -
Making Mathematical Reasoning Adaptive
Paper • 2510.04617 • Published • 22 -
DocReward: A Document Reward Model for Structuring and Stylizing
Paper • 2510.11391 • Published • 27
-
FlowRL: Matching Reward Distributions for LLM Reasoning
Paper • 2509.15207 • Published • 114 -
Kwaipilot/KAT-Dev-72B-Exp
Text Generation • 73B • Updated • 671 • 155 -
Agentic Entropy-Balanced Policy Optimization
Paper • 2510.14545 • Published • 104 -
Multi-Agent Deep Research: Training Multi-Agent Systems with M-GRPO
Paper • 2511.13288 • Published • 17
-
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models
Paper • 2501.03262 • Published • 103 -
Agentic Entropy-Balanced Policy Optimization
Paper • 2510.14545 • Published • 104 -
BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping
Paper • 2510.18927 • Published • 83
-
Open Data Synthesis For Deep Research
Paper • 2509.00375 • Published • 70 -
Beyond Correctness: Harmonizing Process and Outcome Rewards through RL Training
Paper • 2509.03403 • Published • 22 -
LMEnt: A Suite for Analyzing Knowledge in Language Models from Pretraining Data to Representations
Paper • 2509.03405 • Published • 23 -
SATQuest: A Verifier for Logical Reasoning Evaluation and Reinforcement Fine-Tuning of LLMs
Paper • 2509.00930 • Published • 4
-
GUI-G^2: Gaussian Reward Modeling for GUI Grounding
Paper • 2507.15846 • Published • 133 -
WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent
Paper • 2508.05748 • Published • 140 -
Mobile-Agent-v3: Foundamental Agents for GUI Automation
Paper • 2508.15144 • Published • 64 -
AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs
Paper • 2508.16153 • Published • 158
-
SuperWriter: Reflection-Driven Long-Form Generation with Large Language Models
Paper • 2506.04180 • Published • 33 -
AniMaker: Automated Multi-Agent Animated Storytelling with MCTS-Driven Clip Generation
Paper • 2506.10540 • Published • 37 -
AutoMind: Adaptive Knowledgeable Agent for Automated Data Science
Paper • 2506.10974 • Published • 19 -
SPAR: Scholar Paper Retrieval with LLM-based Agents for Enhanced Academic Search
Paper • 2507.15245 • Published • 11