fix: vLLM tool calling - enable by default with hermes parser 7239fe3 jeanbaptdzd committed on 17 days ago
Fix device placement for tokenizer outputs before model inference 64c014e jeanbaptdzd committed on 21 days ago
Refactor: Address code shortcomings and align with HF best practices dc14519 jeanbaptdzd committed on 21 days ago
Set temperature=0 for JSON format output (greedy decoding) 78ed4ff jeanbaptdzd committed on 24 days ago
Fix temperature modification: only apply to JSON format, not tools c898602 jeanbaptdzd committed on 24 days ago
Improve structured output: lower temperature for JSON/tool calls, remove unused stopping criteria 90a906d jeanbaptdzd committed on 24 days ago
Strengthen prompts with examples for tool calls and JSON format cb7f3d3 jeanbaptdzd committed on 24 days ago
Improve tool call parsing: handle reasoning tags and extract JSON tool calls 4a04968 jeanbaptdzd committed on 24 days ago
Fix reasoning tag handling: better support for unclosed <think> tags a5e663f jeanbaptdzd committed on 24 days ago
Strengthen JSON format instructions: more explicit and in English d39e295 jeanbaptdzd committed on 24 days ago
Fix reasoning tag regex to match both <think> and <think> tags 875263b jeanbaptdzd committed on 24 days ago
Simplify reasoning tag removal: use single pattern for both tag types b9ca306 jeanbaptdzd committed on 24 days ago
Fix reasoning tag patterns: handle <think> and <think> correctly 5a4f1e9 jeanbaptdzd committed on 24 days ago
Fix reasoning tag handling: support both <think> and <think> 28af6d2 jeanbaptdzd committed on 24 days ago
Fix OpenAI API compatibility: support tool_choice='required' and response_format a82e45b jeanbaptdzd committed on 24 days ago
Add deprecation warning for clear_gpu_memory model/tokenizer parameters 92bb437 jeanbaptdzd committed on 26 days ago
feat: Add rate limiting, stats tracking, and fix critical issues 67befa7 jeanbaptdzd committed on 26 days ago
refactor: Enhance codebase with comprehensive improvements for CodeRabbit review 1e23279 jeanbaptdzd committed on Nov 3
refactor: Clean up codebase - remove obsolete files and improve documentation 6541672 jeanbaptdzd committed on Nov 2
Fix critical bugs: OOM errors, race conditions, truncation, and French language support 5ac5a91 jeanbaptdzd committed on Nov 2
Add GPU memory cleanup and fix OOM errors - cleanup cache after each inference d31f411 jeanbaptdzd committed on Nov 2
Fix generation: increase tokens for complete answers, add EOS handling 78f67d6 jeanbaptdzd committed on Nov 2
Rename vllm.py to transformers_provider.py - clarify implementation and force rebuild afd6869 jeanbaptdzd committed on Nov 2
Update to vLLM 0.9.2 with Qwen3 support, remove PRIIPS functionality, add HF Space validation hook a750766 jeanbaptdzd committed on Nov 2