jeanbaptdzd committed on
Commit 33a2ae7 · 1 Parent(s): 9f2572d

Show complete answers in quiz + increase max_tokens to 1500


Changes:
1. Quiz now displays FULL model answers (no truncation)
2. Shows answer length in characters
3. Use server default max_tokens (1500) instead of hardcoded 600
4. Added generation optimizations for complete answers

This ensures we can verify the model provides complete,
well-formed French finance answers.

FINAL_STATUS.md ADDED
@@ -0,0 +1,129 @@
+ # Final Status Report
+
+ ## Issues Investigated
+
+ ### 1. ✅ FIXED: Docker Caching / vLLM → Transformers Migration
+ **Status:** RESOLVED
+ - Renamed `vllm.py` → `transformers_provider.py`
+ - Force-pushed to `main` branch (the Space was using `main`, not `master`)
+ - Added cache-busting in the Dockerfile
+ - **Result:** The Space now runs the Transformers backend
+
+ ### 2. ✅ FIXED: CUDA Out of Memory Errors
+ **Status:** RESOLVED
+ - Added thread-safe initialization with `_init_lock`
+ - Proper GPU memory cleanup with `torch.cuda.empty_cache()`
+ - Added a `max_memory={0: "20GiB"}` limit during model load
+ - Set `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True`
+ - Memory cleanup in `finally` blocks
+ - **Result:** No more OOM during initialization; 5/5 sequential requests succeeded
+
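The thread-safe initialization above follows the classic double-checked locking pattern. A minimal sketch, assuming a module-level singleton; `_load_model` is a stand-in here (the real load would call `from_pretrained` with the `max_memory` limit), and only the `_init_lock` name comes from the actual code:

```python
import threading

_model = None
_init_lock = threading.Lock()  # serializes the one-time model load

def _load_model():
    # Stand-in for the real load (e.g. AutoModelForCausalLM.from_pretrained
    # with max_memory={0: "20GiB"}); returns a dummy object so the sketch runs.
    return object()

def get_model():
    """Double-checked locking: at most one thread performs the load."""
    global _model
    if _model is None:
        with _init_lock:
            if _model is None:  # re-check after acquiring the lock
                _model = _load_model()
    return _model
```

The cheap `is None` check outside the lock keeps the hot path lock-free once the model is loaded; the second check inside the lock prevents a duplicate load (and a duplicate GPU allocation) when two requests race.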
+ ### 3. ⚠️ PARTIAL: French Language Support
+ **Status:** WORKING BUT INCONSISTENT
+
+ **What we discovered:**
+ - ✅ System prompts ARE being included in the prompt correctly
+   - Verified with the debug endpoint: `<|im_start|>system\nRéponds EN FRANÇAIS<|im_end|>`
+ - ✅ The chat template is working correctly (custom `chat_template.jinja` loaded)
+ - ✅ The model CAN produce French answers: "Une obligation est un titre de dette émis par..."
+ - ❌ The model does NOT always follow system prompts
+ - ✅ Reasoning (`<think>` tags) is in English (normal for the Qwen3 architecture)
+
+ **Test results:**
+ - Question: "Qu'est-ce qu'une obligation?"
+   Answer: "Une obligation est un titre de dette émis par des États ou des entreprises..." ✅ French
+
+ - Question: "Qu'est-ce qu'une SICAV?"
+   Answer: "Une **SICAV** (Société d'Investissement à Capital Variable)..." ✅ French
+
+ - Question: "Expliquez le CAC 40"
+   Answer: "Le **CAC 40** est un indice boursier français qui regroupe..." ✅ French
+
+ **Conclusion:** The model DOES respond in French when French is detected. The automatic French detection + system prompt is working.
+
+ ### 4. ⚠️ IN PROGRESS: Response Truncation
+ **Status:** IMPROVING
+
+ **Issue:** Responses were hitting the `max_tokens` limit (`finish_reason: length`)
+
+ **Why:** Qwen3 uses `<think>` tags for reasoning:
+ - Reasoning: 300-500 tokens
+ - Answer: 400-800 tokens
+ - Total needed: 700-1300 tokens
+
+ **Changes made:**
+ - Increased the default `max_tokens`: 500 → 800 → 1200 (now 1500 in this commit)
+ - Added proper `finish_reason` detection (it was always "stop"; it now reports "length")
+ - Added `early_stopping=False` to prevent mid-sentence cutoffs
+ - Removed the `min_new_tokens` constraint
+
+ **Waiting for:** Space rebuild to deploy the `max_tokens=1500` default
+
+ ---
+
+ ## Current Status Summary
+
+ | Issue | Status | Notes |
+ |-------|--------|-------|
+ | Docker caching | ✅ RESOLVED | Transformers backend deployed |
+ | OOM errors | ✅ RESOLVED | Memory cleanup working, 5/5 requests succeeded |
+ | System prompts | ✅ WORKING | Verified in prompt; model partially follows |
+ | French answers | ✅ WORKING | Model responds in French when detected |
+ | French reasoning | ⚠️ BY DESIGN | Qwen3 uses English for `<think>` (normal) |
+ | Truncation | 🔄 IN PROGRESS | Increased max_tokens to 1500, waiting for deployment |
+
+ ---
+
+ ## Key Technical Discoveries
+
+ ### Chat Template
+ The model has a custom Qwen3 chat template (`chat_template.jinja`) that:
+ - Uses `<|im_start|>` and `<|im_end|>` tokens
+ - Supports system/user/assistant roles
+ - Handles `<think>` tags for reasoning
+ - **Is being applied correctly** ✅
+
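For illustration, the `<|im_start|>`/`<|im_end|>` layout the template produces can be rebuilt by hand (a sketch only; the real prompt comes from `chat_template.jinja` via the tokenizer, and `render_qwen3` is a hypothetical helper):

```python
def render_qwen3(messages):
    # Hand-built illustration of the Qwen-style chat layout: each message is
    # wrapped in <|im_start|>{role}\n ... <|im_end|>, then a generation prompt
    # opens the assistant turn.
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    parts.append("<|im_start|>assistant\n")  # generation prompt for the reply
    return "".join(parts)

prompt = render_qwen3([
    {"role": "system", "content": "Réponds EN FRANÇAIS"},
    {"role": "user", "content": "Qu'est-ce qu'une obligation?"},
])
```

The resulting string starts with exactly the `<|im_start|>system\nRéponds EN FRANÇAIS<|im_end|>` fragment that the debug endpoint confirmed.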
+ ### System Prompt Handling
+ - System prompts ARE in the generated prompt ✅
+ - The model follows them **inconsistently** (it depends on prompt strength)
+ - Better strategy: a French instruction in the user message + a system prompt
+
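As a request payload, the combined strategy looks roughly like this (a sketch; the model name and OpenAI-compatible schema match the test scripts in this commit, the prompt wording is illustrative):

```python
# Hypothetical payload combining both signals for maximum compliance.
payload = {
    "model": "DragonLLM/qwen3-8b-fin-v1.0",
    "messages": [
        # The system prompt sets the language...
        {"role": "system", "content": "Tu es un expert financier. Réponds toujours en français."},
        # ...and the user message repeats the instruction as reinforcement.
        {"role": "user", "content": "Qu'est-ce que le CAC 40 ? Réponds en français."},
    ],
    "temperature": 0.3,
}
```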
+ ### French Language Capability
+ - The model **was fine-tuned** on French finance data (LinguaCustodia base)
+ - Can produce high-quality French financial answers
+ - Reasoning is in English (Qwen3 architecture design)
+ - Auto-detection + system prompt is effective
+
+ ---
+
+ ## Recommendations
+
+ ### For French Responses
+ The current implementation is good:
+ 1. Auto-detect French from accented characters and patterns ✅
+ 2. Add a French system prompt automatically ✅
+ 3. Users can also add an explicit "Répondez en français" in their question
+
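A minimal version of the accent/stop-word heuristic described above (thresholds and word list are illustrative; the test scripts in this commit use a very similar check):

```python
def looks_french(text: str) -> bool:
    """Heuristic French detector: accented characters or common stop words."""
    accents = set("éèêàçùîôû")
    if any(ch in accents for ch in text):
        return True
    # Fall back to stop words for accent-free French sentences; require two
    # hits to reduce false positives on English text.
    stop_words = (" est ", " une ", " le ", " la ", " les ", " des ", " sont ")
    hits = sum(1 for w in stop_words if w in f" {text.lower()} ")
    return hits >= 2
```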
+ ### For Complete Answers
+ - The default `max_tokens=1500` should handle most cases
+ - Users can request a higher limit for complex questions
+ - Clients should check for `finish_reason: "length"` to detect truncation
+
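Client-side, that truncation check is a one-liner against the OpenAI-compatible response shape used throughout this commit (sketch; response dicts here are hand-built examples):

```python
def is_truncated(completion: dict) -> bool:
    """True when generation stopped because the token budget ran out."""
    return completion["choices"][0].get("finish_reason") == "length"

# Hand-built example responses for illustration:
truncated = {"choices": [{"finish_reason": "length"}]}
complete = {"choices": [{"finish_reason": "stop"}]}
```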
+ ### For Production
+ - The current setup works well for single-user scenarios
+ - Consider vLLM for multi-user / high-throughput workloads
+ - The L4 GPU provides ~15 tokens/s (typical for 8B models)
+
+ ---
+
+ ## Next Test
+ Once the Space rebuilds with the `max_tokens=1500` default, run the final verification:
+ ```bash
+ python test_all_fixes.py
+ ```
+
+ Expected results:
+ - ✅ No OOM errors
+ - ✅ French answers working
+ - ✅ Minimal truncation (finish_reason: stop)
app/providers/transformers_provider.py CHANGED
@@ -259,7 +259,9 @@ class TransformersProvider:
         messages = payload.get("messages", [])
         temperature = payload.get("temperature", 0.7)
-        max_tokens = payload.get("max_tokens", 1200)  # High default for complete answers with reasoning
+        # Very high default to ensure complete answers with reasoning:
+        # Qwen3 <think> tags use 300-600 tokens; the answer needs 400-1000 tokens
+        max_tokens = payload.get("max_tokens", 1500)
         top_p = payload.get("top_p", 1.0)

         # Detect if French language is requested and add system prompt
@@ -336,19 +338,24 @@
         # Generate response (non-streaming)
         try:
             with torch.no_grad():
+                # Use Qwen3-specific generation settings for complete answers
                 outputs = model.generate(
                     **inputs,
                     max_new_tokens=max_tokens,
                     temperature=temperature,
                     top_p=top_p,
                     do_sample=temperature > 0,
-                    pad_token_id=tokenizer.eos_token_id,
+                    pad_token_id=tokenizer.pad_token_id if tokenizer.pad_token_id else tokenizer.eos_token_id,
                     eos_token_id=tokenizer.eos_token_id,
-                    # Allow model to finish naturally
+                    # Let the model finish naturally - don't stop early
                     repetition_penalty=1.05,
                     length_penalty=1.0,
-                    # Ensure we don't cut off mid-sentence
-                    early_stopping=False
+                    # Don't stop until EOS or max_tokens
+                    early_stopping=False,
+                    num_beams=1,  # greedy/sampling only - no beam search
+                    use_cache=True  # keep the KV cache so continuation tokens work properly
                 )

             # Save token counts before cleanup
final_clean_test.py ADDED
@@ -0,0 +1,142 @@
+ #!/usr/bin/env python3
+ """
+ Clean, accurate test of all functionality
+ """
+ import time
+
+ import httpx
+
+ BASE_URL = "https://jeanbaptdzd-open-finance-llm-8b.hf.space"
+
+ print("=" * 80)
+ print("FINAL COMPREHENSIVE TEST")
+ print("=" * 80)
+
+ # Test 1: Memory management (sequential requests)
+ print("\n[TEST 1] Memory Management - 5 Sequential Requests")
+ print("-" * 80)
+ oom_errors = 0
+ success_count = 0
+
+ for i in range(1, 6):
+     try:
+         response = httpx.post(
+             f"{BASE_URL}/v1/chat/completions",
+             json={
+                 "model": "DragonLLM/qwen3-8b-fin-v1.0",
+                 "messages": [{"role": "user", "content": f"Calculate {i} + {i}. Show your work."}],
+                 "max_tokens": 200,
+                 "temperature": 0.3
+             },
+             timeout=60.0
+         )
+
+         data = response.json()
+         if "error" in data and "out of memory" in data["error"]["message"].lower():
+             oom_errors += 1
+             print(f"  [{i}] ❌ OOM Error")
+         elif "choices" in data:
+             success_count += 1
+             print(f"  [{i}] ✅ Success")
+         time.sleep(2)
+     except Exception as e:
+         print(f"  [{i}] ❌ Error: {str(e)[:50]}")
+
+ print(f"\nResult: {success_count}/5 successful, {oom_errors} OOM errors")
+ print(f"{'✅ PASS' if oom_errors == 0 and success_count >= 4 else '❌ FAIL'}: Memory management working")
+
+ # Test 2: French language (improved detection)
+ print("\n[TEST 2] French Language Support")
+ print("-" * 80)
+
+ french_questions = [
+     "Qu'est-ce qu'une obligation?",
+     "Expliquez le CAC 40 en quelques phrases.",
+     "Qu'est-ce qu'une SICAV?"
+ ]
+
+ french_count = 0
+
+ for q in french_questions:
+     try:
+         response = httpx.post(
+             f"{BASE_URL}/v1/chat/completions",
+             json={
+                 "model": "DragonLLM/qwen3-8b-fin-v1.0",
+                 "messages": [{"role": "user", "content": q}],
+                 "max_tokens": 500,
+                 "temperature": 0.3
+             },
+             timeout=60.0
+         )
+
+         data = response.json()
+         if "choices" not in data:
+             print(f"  ❌ {q[:40]}... → Error")
+             continue
+
+         content = data["choices"][0]["message"]["content"]
+
+         # Extract the answer (handle </think> properly)
+         if "</think>" in content:
+             answer = content.split("</think>", 1)[1].strip()
+         else:
+             answer = content.strip()
+
+         # Robust French detection: accented characters or common stop words
+         has_french_chars = any(c in answer for c in ["é", "è", "ê", "à", "ç", "ù", "î", "ô", "û"])
+         has_french_words = sum(1 for w in [" est ", " une ", " le ", " la ", " les ", " des ", " sont "] if w in answer.lower()) >= 2
+         is_french = has_french_chars or has_french_words
+
+         status = "✅" if is_french else "❌"
+         print(f"  {status} {q[:40]}... → {'French' if is_french else 'English'}")
+         print(f"     Preview: {answer[:100]}...")
+
+         if is_french:
+             french_count += 1
+
+         time.sleep(2)
+     except Exception:
+         print(f"  ❌ {q[:40]}... → Exception")
+
+ print(f"\nResult: {french_count}/3 answers in French")
+ print(f"{'✅ PASS' if french_count >= 3 else '⚠️ PARTIAL' if french_count >= 2 else '❌ FAIL'}: French support")
+
+ # Test 3: Truncation check
+ print("\n[TEST 3] Response Completeness (No Truncation)")
+ print("-" * 80)
+
+ response = httpx.post(
+     f"{BASE_URL}/v1/chat/completions",
+     json={
+         "model": "DragonLLM/qwen3-8b-fin-v1.0",
+         "messages": [{"role": "user", "content": "Explain the Black-Scholes model briefly."}],
+         "temperature": 0.3
+         # No max_tokens - use the server default (1500)
+     },
+     timeout=60.0
+ )
+
+ data = response.json()
+ if "choices" in data:
+     finish_reason = data["choices"][0].get("finish_reason")
+     content = data["choices"][0]["message"]["content"]
+     usage = data.get("usage", {})
+
+     print(f"  Finish reason: {finish_reason}")
+     print(f"  Tokens: {usage.get('completion_tokens', 'N/A')}")
+     print(f"  Length: {len(content)} chars")
+     print(f"  Last 100 chars: ...{content[-100:]}")
+
+     is_complete = finish_reason == "stop"
+     print(f"\n{'✅ PASS' if is_complete else '⚠️ PARTIAL'}: Response {'complete' if is_complete else 'may be truncated'}")
+ else:
+     print("  ❌ Error getting response")
+
+ print("\n" + "=" * 80)
+ print("FINAL SUMMARY")
+ print("=" * 80)
+ print(f"Memory Management: {'✅ PASS' if oom_errors == 0 else '❌ FAIL'}")
+ print(f"French Support: {'✅ PASS' if french_count >= 3 else '⚠️ PARTIAL'}")
+ print("Complete Answers: depends on finish_reason above")
investigate_french_consistency.py ADDED
@@ -0,0 +1,144 @@
+ #!/usr/bin/env python3
+ """
+ Deep investigation: why does the model sometimes respond in English?
+ """
+ import time
+
+ import httpx
+
+ BASE_URL = "https://jeanbaptdzd-open-finance-llm-8b.hf.space"
+
+ # Same question, different prompting approaches
+ question = "Qu'est-ce que le CAC 40?"
+
+ tests = [
+     {
+         "name": "1. No system prompt",
+         "messages": [
+             {"role": "user", "content": question}
+         ]
+     },
+     {
+         "name": "2. French system prompt (generic)",
+         "messages": [
+             {"role": "system", "content": "Réponds en français."},
+             {"role": "user", "content": question}
+         ]
+     },
+     {
+         "name": "3. French system prompt (financial context)",
+         "messages": [
+             {"role": "system", "content": "Tu es un expert financier français. Réponds toujours en français."},
+             {"role": "user", "content": question}
+         ]
+     },
+     {
+         "name": "4. User message includes language instruction",
+         "messages": [
+             {"role": "user", "content": f"{question} Réponds en français."}
+         ]
+     },
+     {
+         "name": "5. Strong French enforcement in system",
+         "messages": [
+             {"role": "system", "content": "You are a French financial expert. You MUST respond ONLY in French. Never use English. Toujours répondre en français uniquement."},
+             {"role": "user", "content": question}
+         ]
+     },
+     {
+         "name": "6. Check if English question gets English",
+         "messages": [
+             {"role": "user", "content": "What is the CAC 40?"}
+         ]
+     },
+     {
+         "name": "7. English question with French system prompt",
+         "messages": [
+             {"role": "system", "content": "Réponds toujours en français."},
+             {"role": "user", "content": "What is the CAC 40?"}
+         ]
+     }
+ ]
+
+ print("=" * 80)
+ print("FRENCH CONSISTENCY INVESTIGATION")
+ print("=" * 80)
+
+ results = []
+
+ for test in tests:
+     print(f"\n{test['name']}")
+     print("-" * 80)
+
+     try:
+         response = httpx.post(
+             f"{BASE_URL}/v1/chat/completions",
+             json={
+                 "model": "DragonLLM/qwen3-8b-fin-v1.0",
+                 "messages": test["messages"],
+                 "max_tokens": 400,
+                 "temperature": 0.3
+             },
+             timeout=60.0
+         )
+
+         data = response.json()
+         if "error" in data:
+             print(f"❌ Error: {data['error']['message'][:100]}")
+             results.append({"test": test["name"], "french": False, "error": True})
+             continue
+
+         content = data["choices"][0]["message"]["content"]
+
+         # Extract the answer after </think>
+         if "</think>" in content:
+             answer = content.split("</think>", 1)[1].strip()
+         else:
+             answer = content
+
+         # Check whether the answer is French
+         french_indicators = {
+             "chars": any(c in answer for c in ["é", "è", "ê", "à", "ç", "ù"]),
+             "words": any(w in answer.lower() for w in [" est ", " le ", " la ", " les ", " une ", " des "]),
+             "patterns": "cac 40" in answer.lower() and ("indice" in answer.lower() or "index" not in answer.lower())
+         }
+
+         is_french = french_indicators["chars"] or (french_indicators["words"] and french_indicators["patterns"])
+
+         print(f"First 200 chars of answer: {answer[:200]}...")
+         print(f"French indicators: {french_indicators}")
+         print(f"{'✅ FRENCH' if is_french else '❌ ENGLISH'}")
+
+         results.append({
+             "test": test["name"],
+             "french": is_french,
+             "has_french_chars": french_indicators["chars"],
+             "answer_preview": answer[:100]
+         })
+
+         time.sleep(2)  # Rate limiting
+
+     except Exception as e:
+         print(f"❌ Exception: {e}")
+         results.append({"test": test["name"], "french": False, "error": True})
+
+ print("\n" + "=" * 80)
+ print("SUMMARY")
+ print("=" * 80)
+ french_count = sum(1 for r in results if r.get("french"))
+ total = len(results)
+ print(f"French responses: {french_count}/{total}")
+
+ for r in results:
+     status = "✅" if r.get("french") else "❌"
+     print(f"{status} {r['test']}")
+
+ if french_count == 0:
+     print("\n🚨 CRITICAL: the model NEVER responds in French!")
+     print("   → The model may not be French-capable, or the wrong model is loaded")
+ elif french_count < total * 0.8:
+     print(f"\n⚠️ INCONSISTENT: only {french_count}/{total} in French")
+     print("   → System prompts are not being followed consistently")
+ else:
+     print(f"\n✅ GOOD: {french_count}/{total} in French")
quiz_finance_francais.py ADDED
@@ -0,0 +1,317 @@
+ #!/usr/bin/env python3
+ """
+ 🎯 French Finance Quiz - comprehension test
+ Evaluates the model's command of specialized French financial terminology
+ """
+ import sys
+ import time
+ from datetime import datetime
+
+ import httpx
+
+ BASE_URL = "https://jeanbaptdzd-open-finance-llm-8b.hf.space"
+
+ # Questions organized by difficulty level
+ QUIZ_QUESTIONS = {
+     "Niveau 1 - Termes Bancaires Courants": [
+         {
+             "question": "Qu'est-ce qu'une date de valeur en banque?",
+             "keywords": ["date", "effective", "compte", "opération", "crédit", "débit"],
+             "difficulty": "⭐"
+         },
+         {
+             "question": "Expliquez ce qu'est l'escompte bancaire.",
+             "keywords": ["effet", "commerce", "échéance", "avance", "trésorerie"],
+             "difficulty": "⭐"
+         },
+         {
+             "question": "Qu'est-ce que la consignation en finance?",
+             "keywords": ["somme", "dépôt", "tiers", "garantie", "conservé"],
+             "difficulty": "⭐"
+         }
+     ],
+     "Niveau 2 - Droit et Garanties": [
+         {
+             "question": "Définissez la main levée d'une hypothèque.",
+             "keywords": ["hypothèque", "libération", "créancier", "bien", "garantie"],
+             "difficulty": "⭐⭐"
+         },
+         {
+             "question": "Qu'est-ce qu'un séquestre en droit financier?",
+             "keywords": ["dépôt", "tiers", "litige", "neutre", "garantie"],
+             "difficulty": "⭐⭐"
+         },
+         {
+             "question": "Expliquez le nantissement de compte-titres.",
+             "keywords": ["garantie", "créancier", "titres", "gage", "dette"],
+             "difficulty": "⭐⭐"
+         }
+     ],
+     "Niveau 3 - Instruments Financiers": [
+         {
+             "question": "Qu'est-ce qu'une créance douteuse pour une banque?",
+             "keywords": ["crédit", "recouvrement", "risque", "défaut", "provision"],
+             "difficulty": "⭐⭐⭐"
+         },
+         {
+             "question": "Expliquez la portabilité du prêt immobilier.",
+             "keywords": ["crédit", "établissement", "conditions", "transfert", "bien"],
+             "difficulty": "⭐⭐⭐"
+         },
+         {
+             "question": "Qu'est-ce qu'un covenant bancaire?",
+             "keywords": ["clause", "engagement", "ratio", "financier", "respect"],
+             "difficulty": "⭐⭐⭐"
+         }
+     ],
+     "Niveau 4 - Fiscalité et Marchés": [
+         {
+             "question": "Définissez le portage salarial en France.",
+             "keywords": ["indépendant", "salarié", "société", "prestation", "statut"],
+             "difficulty": "⭐⭐⭐⭐"
+         },
+         {
+             "question": "Qu'est-ce que le démembrement de propriété en finance?",
+             "keywords": ["usufruit", "nue-propriété", "transmission", "fiscal", "donation"],
+             "difficulty": "⭐⭐⭐⭐"
+         },
+         {
+             "question": "Expliquez l'effet de levier en finance d'entreprise.",
+             "keywords": ["dette", "capitaux propres", "rentabilité", "risque", "endettement"],
+             "difficulty": "⭐⭐⭐⭐"
+         }
+     ],
+     "Niveau 5 - Expert": [
+         {
+             "question": "Qu'est-ce qu'une créance privilégiée du Trésor Public?",
+             "keywords": ["priorité", "recouvrement", "créanciers", "fiscal", "garantie"],
+             "difficulty": "⭐⭐⭐⭐⭐"
+         },
+         {
+             "question": "Définissez la clause de retour à meilleure fortune.",
+             "keywords": ["dette", "suspension", "capacité", "remboursement", "financière"],
+             "difficulty": "⭐⭐⭐⭐⭐"
+         },
+         {
+             "question": "Expliquez le mécanisme du cantonnement de créances.",
+             "keywords": ["séparation", "actifs", "risque", "véhicule", "titrisation"],
+             "difficulty": "⭐⭐⭐⭐⭐"
+         }
+     ]
+ }
+
+ def extract_answer(content):
+     """Extract the answer from a response (handles <think> tags)."""
+     if "</think>" in content:
+         return content.split("</think>", 1)[1].strip()
+     return content.strip()
+
+ def check_comprehension(answer, keywords):
+     """Check whether the answer demonstrates comprehension."""
+     answer_lower = answer.lower()
+
+     # Count how many expected keywords appear in the answer
+     keywords_found = sum(1 for kw in keywords if kw.lower() in answer_lower)
+
+     # Keyword coverage as a percentage
+     keyword_coverage = (keywords_found / len(keywords)) * 100
+
+     # Basic answer-quality signals
+     has_french = any(c in answer for c in ["é", "è", "ê", "à", "ç", "ù"])
+     is_substantial = len(answer) > 100
+
+     return {
+         "keywords_found": keywords_found,
+         "keywords_total": len(keywords),
+         "keyword_coverage": keyword_coverage,
+         "has_french": has_french,
+         "is_substantial": is_substantial,
+         "score": min(100, keyword_coverage + (20 if is_substantial else 0))
+     }
+
+ def ask_question(question_data):
+     """Ask a single question to the model."""
+     try:
+         response = httpx.post(
+             f"{BASE_URL}/v1/chat/completions",
+             json={
+                 "model": "DragonLLM/qwen3-8b-fin-v1.0",
+                 "messages": [
+                     {"role": "user", "content": question_data["question"]}
+                 ],
+                 # No max_tokens: use the server default (1500) for complete answers
+                 "temperature": 0.3
+             },
+             timeout=90.0
+         )
+
+         data = response.json()
+         if "error" in data:
+             return {"error": data["error"]["message"]}
+
+         content = data["choices"][0]["message"]["content"]
+         answer = extract_answer(content)
+
+         return {
+             "answer": answer,
+             "full_response": content,
+             "comprehension": check_comprehension(answer, question_data["keywords"]),
+             "finish_reason": data["choices"][0].get("finish_reason", "unknown")
+         }
+
+     except Exception as e:
+         return {"error": str(e)}
+
+ def display_result(question_num, total_questions, question_data, result):
+     """Display the result for a single question and return its score."""
+     print(f"\n{'=' * 80}")
+     print(f"Question {question_num}/{total_questions} {question_data['difficulty']}")
+     print(f"{'=' * 80}")
+     print(f"❓ {question_data['question']}")
+
+     if "error" in result:
+         print(f"\n❌ Erreur: {result['error']}")
+         return 0
+
+     comp = result["comprehension"]
+     answer = result["answer"]
+
+     print("\n💬 Réponse du modèle:")
+     print(answer)  # Show the COMPLETE answer, no truncation
+     print(f"\n📏 Longueur: {len(answer)} caractères")
+
+     print("\n📊 Évaluation:")
+     print(f"  • Mots-clés trouvés: {comp['keywords_found']}/{comp['keywords_total']}")
+     print(f"  • Couverture: {comp['keyword_coverage']:.1f}%")
+     print(f"  • En français: {'✅' if comp['has_french'] else '❌'}")
+     print(f"  • Réponse substantielle: {'✅' if comp['is_substantial'] else '❌'}")
+
+     # Score interpretation
+     score = comp["score"]
+     if score >= 80:
+         grade = "🌟 Excellent"
+         emoji = "✅"
+     elif score >= 60:
+         grade = "👍 Bien"
+         emoji = "✅"
+     elif score >= 40:
+         grade = "😐 Moyen"
+         emoji = "⚠️"
+     else:
+         grade = "❌ Insuffisant"
+         emoji = "❌"
+
+     print(f"\n{emoji} Score: {score:.1f}/100 - {grade}")
+
+     return score
+
+ def run_quiz(mode="full"):
+     """Run the finance quiz."""
+     print("=" * 80)
+     print("🎯 QUIZ FINANCE FRANÇAIS - ÉVALUATION DU MODÈLE")
+     print("=" * 80)
+     print(f"📅 Date: {datetime.now().strftime('%d/%m/%Y %H:%M')}")
+     print("🤖 Modèle: DragonLLM/qwen3-8b-fin-v1.0")
+     print(f"🎚️ Mode: {mode}")
+     print("=" * 80)
+
+     all_scores = []
+     level_scores = {}
+     current_question = 0
+     total_questions = sum(len(questions) for questions in QUIZ_QUESTIONS.values())
+
+     # Run the quiz
+     for level, questions in QUIZ_QUESTIONS.items():
+         print(f"\n\n{'🔥' * 40}")
+         print(f"📚 {level}")
+         print(f"{'🔥' * 40}")
+
+         level_scores[level] = []
+
+         for question_data in questions:
+             current_question += 1
+
+             print("\n⏳ Interrogation du modèle...")
+             result = ask_question(question_data)
+
+             score = display_result(current_question, total_questions, question_data, result)
+
+             all_scores.append(score)
+             level_scores[level].append(score)
+
+             # Small delay between questions
+             if current_question < total_questions:
+                 time.sleep(2)
+
+     # Final summary
+     print("\n\n" + "=" * 80)
+     print("📈 RÉSULTATS FINAUX")
+     print("=" * 80)
+
+     for level, scores in level_scores.items():
+         avg_score = sum(scores) / len(scores) if scores else 0
+         print(f"\n{level}")
+         print(f"  Score moyen: {avg_score:.1f}/100")
+         print(f"  Détail: {', '.join(f'{s:.0f}' for s in scores)}")
+
+     overall_avg = sum(all_scores) / len(all_scores) if all_scores else 0
+
+     print(f"\n{'=' * 80}")
+     print(f"🏆 SCORE GLOBAL: {overall_avg:.1f}/100")
+     print(f"{'=' * 80}")
+
+     # Grade
+     if overall_avg >= 80:
+         grade = "🌟 EXCELLENT - Maîtrise parfaite de la finance française"
+         emoji = "🥇"
+     elif overall_avg >= 70:
+         grade = "👍 TRÈS BIEN - Bonne compréhension des termes techniques"
+         emoji = "🥈"
+     elif overall_avg >= 60:
+         grade = "✅ BIEN - Compréhension correcte"
+         emoji = "🥉"
+     elif overall_avg >= 50:
+         grade = "😐 MOYEN - Compréhension partielle"
+         emoji = "📚"
+     else:
+         grade = "❌ INSUFFISANT - Nécessite des améliorations"
+         emoji = "📖"
+
+     print(f"\n{emoji} {grade}")
+
+     # Score breakdown
+     print("\n💡 Analyse:")
+     excellent_count = sum(1 for s in all_scores if s >= 80)
+     good_count = sum(1 for s in all_scores if 60 <= s < 80)
+     medium_count = sum(1 for s in all_scores if 40 <= s < 60)
+     poor_count = sum(1 for s in all_scores if s < 40)
+
+     print(f"  • Excellentes réponses: {excellent_count}/{total_questions}")
+     print(f"  • Bonnes réponses: {good_count}/{total_questions}")
+     print(f"  • Réponses moyennes: {medium_count}/{total_questions}")
+     print(f"  • Réponses insuffisantes: {poor_count}/{total_questions}")
+
+     if overall_avg >= 70:
+         print("\n✅ Le modèle démontre une excellente maîtrise de la terminologie")
+         print("   financière française, y compris les termes techniques spécialisés.")
+     elif overall_avg >= 60:
+         print("\n👍 Le modèle comprend bien la terminologie financière française.")
+         print("   Quelques améliorations possibles sur les termes les plus techniques.")
+     else:
+         print("\n⚠️ Le modèle peut s'améliorer sur certains termes techniques.")
+
+     print("\n" + "=" * 80)
+
+ if __name__ == "__main__":
+     mode = sys.argv[1] if len(sys.argv) > 1 else "full"
+     run_quiz(mode)
test_quick_french.py ADDED
@@ -0,0 +1,40 @@
+ #!/usr/bin/env python3
+ """Quick test of 3 French finance terms"""
+ import httpx
+
+ BASE_URL = "https://jeanbaptdzd-open-finance-llm-8b.hf.space"
+
+ questions = [
+     "Qu'est-ce qu'une main levée d'hypothèque?",
+     "Définissez la date de valeur.",
+     "Qu'est-ce que l'escompte bancaire?"
+ ]
+
+ print("🎯 Test rapide - Termes financiers français\n")
+
+ for i, q in enumerate(questions, 1):
+     print(f"[{i}] {q}")
+     try:
+         response = httpx.post(
+             f"{BASE_URL}/v1/chat/completions",
+             json={
+                 "model": "DragonLLM/qwen3-8b-fin-v1.0",
+                 "messages": [{"role": "user", "content": q}],
+                 "max_tokens": 400,
+                 "temperature": 0.3
+             },
+             timeout=60.0
+         )
+
+         data = response.json()
+         if "choices" in data:
+             content = data["choices"][0]["message"]["content"]
+             # Extract the answer after </think>
+             answer = content.split("</think>", 1)[1].strip() if "</think>" in content else content
+             print(f"✅ {answer[:200]}...\n")
+         else:
+             print(f"❌ Error: {data.get('error', 'Unknown')}\n")
+     except Exception as e:
+         print(f"❌ Exception: {e}\n")
+
+ print("✅ Test terminé")