Commit History

fix: vLLM tool calling - enable by default with hermes parser
7239fe3

jeanbaptdzd committed on

Align HF Space API response format with vLLM
6393558

jeanbaptdzd committed on

Improve reasoning tag removal for unclosed tags
8c38d11

jeanbaptdzd committed on

Add reasoning tag removal for all chat responses
b1e1444

jeanbaptdzd committed on

Fix dict access for inputs after device placement
4bee8ff

jeanbaptdzd committed on

Fix device placement for tokenizer outputs before model inference
64c014e

jeanbaptdzd committed on

Add error handling for invalid log level configuration
7ee7723

jeanbaptdzd committed on

Refactor: Address code shortcomings and align with HF best practices
dc14519

jeanbaptdzd committed on

Remove chat_service.py abstraction layer
c77ec91

jeanbaptdzd committed on

Set temperature=0 for JSON format output (greedy decoding)
78ed4ff

jeanbaptdzd committed on

Fix temperature modification: only apply to JSON format, not tools
c898602

jeanbaptdzd committed on

Improve structured output: lower temperature for JSON/tool calls, remove unused stopping criteria
90a906d

jeanbaptdzd committed on

Strengthen prompts with examples for tool calls and JSON format
cb7f3d3

jeanbaptdzd committed on

Improve tool call parsing: handle reasoning tags and extract JSON tool calls
4a04968

jeanbaptdzd committed on

Fix reasoning tag: use <think> instead of <think>
d730034

jeanbaptdzd committed on

Fix reasoning tag handling: better support for unclosed <think> tags
a5e663f

jeanbaptdzd committed on

Strengthen JSON format instructions: more explicit and in English
d39e295

jeanbaptdzd committed on

Fix reasoning tag: use correct <think> tag pattern
682f9cd

jeanbaptdzd committed on

Fix reasoning tag regex to match both <think> and <think> tags
875263b

jeanbaptdzd committed on

Simplify reasoning tag removal: use single pattern for both tag types
b9ca306

jeanbaptdzd committed on

Fix reasoning tag patterns: handle <think> and <think> correctly
5a4f1e9

jeanbaptdzd committed on

Fix reasoning tag handling: support both <think> and <think>
28af6d2

jeanbaptdzd committed on

Fix JSON extraction to handle reasoning tags
ad2ecea

jeanbaptdzd committed on

Fix OpenAI API compatibility: support tool_choice='required' and response_format
a82e45b

jeanbaptdzd committed on

Add deprecation warning for clear_gpu_memory model/tokenizer parameters
92bb437

jeanbaptdzd committed on

Fix model ID and improve memory management
9db586c

jeanbaptdzd committed on

Merge feat/tool-enabling into master - resolve conflicts
192844a

jeanbaptdzd committed on

feat: Enable tool calls support in OpenAI API
895a63f

jeanbaptdzd committed on

feat: Add rate limiting, stats tracking, and fix critical issues
67befa7

jeanbaptdzd committed on

refactor: Enhance codebase with comprehensive improvements for CodeRabbit review
1e23279

jeanbaptdzd committed on

refactor: Improve type hints and code quality across codebase
20548ac

jeanbaptdzd committed on

fix: Apply CodeRabbit suggestions
fdc8bbe

jeanbaptdzd committed on

feat: Add input validation and type hints
f28306b

jeanbaptdzd committed on

Increase max_tokens to 1000 and request concise answers
83ffe61

jeanbaptdzd committed on

Set DEFAULT_MAX_TOKENS=800 to prevent timeouts
bedfb0c

jeanbaptdzd committed on

refactor: DRY improvements and optimize Dockerfile
16c2a22

jeanbaptdzd committed on

refactor: Clean up codebase - remove obsolete files and improve documentation
6541672

jeanbaptdzd committed on

Show complete answers in quiz + increase max_tokens to 1500
33a2ae7

jeanbaptdzd committed on

Fix truncation: increase max_tokens and proper finish_reason
9f2572d

jeanbaptdzd committed on

Add debug endpoint to inspect prompt generation
15ee2a4

jeanbaptdzd committed on

Add detailed logging for chat template debugging
ef2ab5b

jeanbaptdzd committed on

Load custom chat_template.jinja from model repository
d711c35

jeanbaptdzd committed on

Add proper Qwen3 chat template to finance model
27930d6

jeanbaptdzd committed on

Fix critical bugs: OOM errors, race conditions, truncation, and French language support
5ac5a91

jeanbaptdzd committed on

Add GPU memory cleanup and fix OOM errors - cleanup cache after each inference
d31f411

jeanbaptdzd committed on

Fix generation: increase tokens for complete answers, add EOS handling
78f67d6

jeanbaptdzd committed on

Rename vllm.py to transformers_provider.py - clarify implementation and force rebuild
afd6869

jeanbaptdzd committed on

Migrate from vLLM to Transformers library
9c71bb7

jeanbaptdzd committed on

Upgrade vLLM to 0.11.0 for Qwen3ForCausalLM support
dc80161

jeanbaptdzd committed on

Update to vLLM 0.9.2 with Qwen3 support, remove PRIIPS functionality, add HF Space validation hook
a750766

jeanbaptdzd committed on