andreapie's Collections

LLM Training

updated Feb 17

  • Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models

    Paper • 2501.09686 • Published Jan 16 • 41

  • Optimizing Large Language Model Training Using FP4 Quantization

    Paper • 2501.17116 • Published Jan 28 • 37

  • Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search

    Paper • 2502.02508 • Published Feb 4 • 23

  • On Teacher Hacking in Language Model Distillation

    Paper • 2502.02671 • Published Feb 4 • 18

  • Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning

    Paper • 2502.03275 • Published Feb 5 • 18

  • Demystifying Long Chain-of-Thought Reasoning in LLMs

    Paper • 2502.03373 • Published Feb 5 • 58

  • ScoreFlow: Mastering LLM Agent Workflows via Score-based Preference Optimization

    Paper • 2502.04306 • Published Feb 6 • 20

  • BOLT: Bootstrap Long Chain-of-Thought in Language Models without Distillation

    Paper • 2502.03860 • Published Feb 6 • 25