How to reduce "Think" responses when using vLLM for inference?
#1 by rjsng0904 - opened
Hi, I'm currently using vLLM for model inference, but I've noticed that a large portion of the model's responses is taken up by the "Think" (reasoning) section. Is there a way to minimize or suppress this behavior? Any suggestions would be greatly appreciated!
I'm afraid I can't answer that question, but you could try asking the model creators in the community section of the original model, since this repository is only a quant of that model.
It looks like other users have already asked similar questions: https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B/discussions/18
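For reference, one common workaround (not specific to this quant) is to post-process the generated text and strip out the reasoning block. The sketch below is only an illustration: it assumes the model wraps its reasoning in `<think>...</think>` tags, as DeepSeek-R1-style models typically do, and the model path and sampling settings are placeholders.

```python
import re

from vllm import LLM, SamplingParams

# Placeholder model path; point this at the quant you are actually serving.
llm = LLM(model="path/to/DeepSeek-R1-Distill-Qwen-14B-quant")
params = SamplingParams(temperature=0.6, max_tokens=2048)

outputs = llm.generate(["Explain what a quantized model is."], params)

for output in outputs:
    text = output.outputs[0].text
    # Drop the <think>...</think> span so only the final answer remains.
    answer = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()
    print(answer)
```

Note that this only hides the reasoning in the returned text; the thinking tokens are still generated, so it does not reduce latency or token usage. Whether the model can be stopped from producing the reasoning at all is really a question for the original model's authors, as suggested above.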