How to reduce "Think" responses when using vLLM for inference?
#1 by rjsng0904 - opened
Hi, I'm currently using vLLM for model inference, but I've noticed that a large portion of the model's responses is taken up by the "Think" (reasoning) section. Is there a way to minimize or suppress this behavior? Any suggestions would be greatly appreciated!
I'm afraid I can't answer that question, but you could try asking the model creators in the community section of the original model, since this repository is only a quant of that model.
It looks like other users have already asked similar questions: https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B/discussions/18
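For reference, one common workaround (not specific to this quant) is to post-process the generated text and strip out the reasoning block. The sketch below is only an illustration: it assumes the model wraps its reasoning in `<think>...</think>` tags, as DeepSeek-R1-style models typically do, and the model path and sampling settings are placeholders.

```python
import re

from vllm import LLM, SamplingParams

# Placeholder model path; point this at the quant you are actually serving.
llm = LLM(model="path/to/DeepSeek-R1-Distill-Qwen-14B-quant")
params = SamplingParams(temperature=0.6, max_tokens=2048)

outputs = llm.generate(["Explain what a quantized model is."], params)

for output in outputs:
    text = output.outputs[0].text
    # Drop the <think>...</think> span so only the final answer remains.
    answer = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()
    print(answer)
```

Note that this only hides the reasoning in the returned text; the thinking tokens are still generated, so it does not reduce latency or token usage. Whether the model can be stopped from producing the reasoning at all is really a question for the original model's authors, as suggested above.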