Spaces:
Runtime error
Runtime error
Markus Clauss DIRU Vetsuisse
Claude
commited on
Commit
·
566f51c
1
Parent(s):
ed1e41a
Optimize for CPU Enhanced performance
Browse files- Add CPU-specific optimizations for better performance
- Use all available CPU cores with torch.set_num_threads
- Enable torch.compile for CPU optimization (when available)
- Add CPU offloading for memory management
- Improve CPU status reporting with psutil
- Show CPU cores, RAM usage, and CPU load
- Add offload_folder and offload_state_dict for large model handling
- Set model to eval() mode for inference optimization
- Add psutil to requirements for system monitoring
Performance improvements:
- Faster inference on CPU
- Better memory management
- Multi-core utilization
- Real-time CPU monitoring
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <[email protected]>
- 2025-09-11-stats-weights-selectedlayer.txt +0 -0
- README_TESTING.md +145 -0
- README_spaces.md +0 -39
- app.py +33 -8
- quick_tokenizer_test.py +136 -0
- requirements.txt +7 -1
- test_apertus_only.py +128 -0
- test_big_models_comparison.py +193 -0
- test_swiss_german_generation.py +350 -0
2025-09-11-stats-weights-selectedlayer.txt
ADDED
|
The diff for this file is too large to render.
See raw diff
|
|
|
README_TESTING.md
ADDED
|
@@ -0,0 +1,145 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# 🇨🇭 Swiss German AI Testing Scripts
|
| 2 |
+
|
| 3 |
+
Zwei Test-Scripts um die verschiedenen Modelle auf ihre Schweizerdeutsch-Fähigkeiten zu testen.
|
| 4 |
+
|
| 5 |
+
## 📋 Scripts Übersicht
|
| 6 |
+
|
| 7 |
+
### 1. `quick_tokenizer_test.py` - Schnelle Tokenizer-Analyse
|
| 8 |
+
**⚡ Schnell und lightweight**
|
| 9 |
+
- Nur Tokenizer-Loading (keine Models)
|
| 10 |
+
- Vergleicht 5+ verschiedene Tokenizer
|
| 11 |
+
- Zeigt Effizienz und Probleme
|
| 12 |
+
- Läuft auch auf CPU in ~30 Sekunden
|
| 13 |
+
|
| 14 |
+
```bash
|
| 15 |
+
python quick_tokenizer_test.py
|
| 16 |
+
```
|
| 17 |
+
|
| 18 |
+
### 2. `test_swiss_german_generation.py` - Vollständige Text-Generation
|
| 19 |
+
**🧠 Komplett aber ressourcenintensiv**
|
| 20 |
+
- Lädt komplette Models
|
| 21 |
+
- Echte Text-Generation
|
| 22 |
+
- Speichert Ergebnisse als JSON
|
| 23 |
+
- Braucht GPU für große Models
|
| 24 |
+
|
| 25 |
+
```bash
|
| 26 |
+
python test_swiss_german_generation.py
|
| 27 |
+
```
|
| 28 |
+
|
| 29 |
+
## 🎯 Was getestet wird
|
| 30 |
+
|
| 31 |
+
### Test-Texte:
|
| 32 |
+
- **Swiss German 1**: `"Grüezi! Chönd Sie mer bitte d Schwyzer KI erchläre?"`
|
| 33 |
+
- **Swiss German 2**: `"Was isch KI und wie funktioniert das?"`
|
| 34 |
+
- **Standard German**: `"Hallo! Können Sie mir bitte die Schweizer KI erklären?"`
|
| 35 |
+
- **Swiss Dialect**: `"Mir händ hüt es schöns Wätter, gäll?"`
|
| 36 |
+
- **Technical German**: `"Die Künstliche Intelligenz verwendet neuronale Netzwerke."`
|
| 37 |
+
|
| 38 |
+
### Modelle:
|
| 39 |
+
- 🇨🇭 **Apertus Swiss AI** (`swiss-ai/Apertus-8B-Instruct-2509`)
|
| 40 |
+
- 🇩🇪 **German BERT** (`bert-base-german-cased`)
|
| 41 |
+
- 🇩🇪 **German GPT-2** (`dbmdz/german-gpt2`)
|
| 42 |
+
- 🌍 **Multilingual BERT** (`bert-base-multilingual-cased`)
|
| 43 |
+
- 🤖 **Standard GPT-2** (`gpt2`)
|
| 44 |
+
|
| 45 |
+
## 📊 Was analysiert wird
|
| 46 |
+
|
| 47 |
+
### Tokenizer-Qualität:
|
| 48 |
+
- **Tokens pro Zeichen** (niedriger = effizienter)
|
| 49 |
+
- **UTF-8 Encoding Probleme** (`ü`, `ö`, `ä`)
|
| 50 |
+
- **Einzelzeichen-Tokens** (ineffizient)
|
| 51 |
+
- **Morphologie-Splits** (Compound-Behandlung)
|
| 52 |
+
|
| 53 |
+
### Text-Generation Qualität:
|
| 54 |
+
- **Schweizerdeutsch Authentizität**
|
| 55 |
+
- **Grammatikalische Korrektheit**
|
| 56 |
+
- **Kulturelle Angemessenheit**
|
| 57 |
+
- **Generierungs-Geschwindigkeit**
|
| 58 |
+
|
| 59 |
+
## 🚀 Empfohlener Ablauf
|
| 60 |
+
|
| 61 |
+
### Schritt 1: Quick Test
|
| 62 |
+
```bash
|
| 63 |
+
# Schneller Überblick über alle Tokenizer
|
| 64 |
+
python quick_tokenizer_test.py
|
| 65 |
+
```
|
| 66 |
+
|
| 67 |
+
### Schritt 2: Detaillierte Tests (wenn GPU verfügbar)
|
| 68 |
+
```bash
|
| 69 |
+
# Vollständige Generation-Tests
|
| 70 |
+
python test_swiss_german_generation.py
|
| 71 |
+
```
|
| 72 |
+
|
| 73 |
+
### Schritt 3: Remote Server Test
|
| 74 |
+
```bash
|
| 75 |
+
# Auf dem Remote Server mit GPU
|
| 76 |
+
ssh apertus
|
| 77 |
+
cd /workspace/apertus-transparency-guide
|
| 78 |
+
source .venv/bin/activate
|
| 79 |
+
python test_swiss_german_generation.py
|
| 80 |
+
```
|
| 81 |
+
|
| 82 |
+
## 📁 Output Files
|
| 83 |
+
|
| 84 |
+
### `quick_tokenizer_test.py`:
|
| 85 |
+
- Console Output mit Rankings
|
| 86 |
+
- Detaillierte Token-Aufschlüsselung
|
| 87 |
+
|
| 88 |
+
### `test_swiss_german_generation.py`:
|
| 89 |
+
- JSON File: `swiss_german_test_results_YYYYMMDD_HHMMSS.json`
|
| 90 |
+
- Enthält alle Generationen, Timings, Fehler
|
| 91 |
+
|
| 92 |
+
## 🔍 Interpretation der Ergebnisse
|
| 93 |
+
|
| 94 |
+
### Tokenizer Rankings:
|
| 95 |
+
- **Niedriger tok/char Ratio** = effizienter
|
| 96 |
+
- **Wenig "Ã" tokens** = bessere UTF-8 Behandlung
|
| 97 |
+
- **Wenig Einzelzeichen** = bessere Compound-Behandlung
|
| 98 |
+
|
| 99 |
+
### Generation Quality:
|
| 100 |
+
- **Authentisches Schweizerdeutsch** vs. Standard Deutsch
|
| 101 |
+
- **Konsistente Grammatik**
|
| 102 |
+
- **Kulturell angemessene Begriffe**
|
| 103 |
+
|
| 104 |
+
## ⚠️ Hardware Requirements
|
| 105 |
+
|
| 106 |
+
### Quick Test:
|
| 107 |
+
- ✅ CPU only
|
| 108 |
+
- ✅ 4GB RAM minimum
|
| 109 |
+
- ✅ ~2GB Download (Tokenizer)
|
| 110 |
+
|
| 111 |
+
### Full Test:
|
| 112 |
+
- 🎮 GPU empfohlen (8GB+ VRAM)
|
| 113 |
+
- 💾 16GB+ RAM
|
| 114 |
+
- 📦 ~30GB Download (alle Models)
|
| 115 |
+
|
| 116 |
+
## 🐛 Troubleshooting
|
| 117 |
+
|
| 118 |
+
### "Model zu groß" Fehler:
|
| 119 |
+
```python
|
| 120 |
+
# In test_swiss_german_generation.py, reduziere max_new_tokens:
|
| 121 |
+
max_new_tokens=50 # statt 150
|
| 122 |
+
```
|
| 123 |
+
|
| 124 |
+
### UTF-8 Probleme:
|
| 125 |
+
```bash
|
| 126 |
+
export PYTHONIOENCODING=utf-8
|
| 127 |
+
export LANG=en_US.UTF-8
|
| 128 |
+
```
|
| 129 |
+
|
| 130 |
+
### Memory Errors:
|
| 131 |
+
```python
|
| 132 |
+
# Verwende kleinere batch size oder float32 statt float16
|
| 133 |
+
torch_dtype=torch.float32
|
| 134 |
+
```
|
| 135 |
+
|
| 136 |
+
## 📈 Beispiel Output
|
| 137 |
+
|
| 138 |
+
```
|
| 139 |
+
🥇 German BERT : 0.324 tok/char, 35 tokens, 2 problems
|
| 140 |
+
🥈 Apertus Swiss AI : 0.315 tok/char, 34 tokens, 6 problems
|
| 141 |
+
🥉 German GPT-2 : 0.306 tok/char, 33 tokens, 9 problems
|
| 142 |
+
4. Multilingual BERT : 0.361 tok/char, 39 tokens, 3 problems
|
| 143 |
+
```
|
| 144 |
+
|
| 145 |
+
Das zeigt: **German BERT** ist am effizientesten mit wenigsten Problemen, aber **Apertus** ist überraschend gut bei Token-Effizienz!
|
README_spaces.md
DELETED
|
@@ -1,39 +0,0 @@
|
|
| 1 |
-
# 🇨🇭 Apertus Swiss AI Transparency Dashboard
|
| 2 |
-
|
| 3 |
-
**The world's first completely transparent language model - now with live interactive analysis!**
|
| 4 |
-
|
| 5 |
-
## What makes Apertus special?
|
| 6 |
-
|
| 7 |
-
Unlike ChatGPT, Claude, or other black-box AI systems, **Apertus is completely transparent**:
|
| 8 |
-
|
| 9 |
-
- 🧠 **See every attention pattern** - which tokens the model focuses on
|
| 10 |
-
- ⚖️ **Inspect every weight** - the actual parameters that make decisions
|
| 11 |
-
- 🎲 **View every prediction** - probabilities for every possible next word
|
| 12 |
-
- 🔍 **Track every computation** - through all 32 transformer layers
|
| 13 |
-
- 🌍 **Multilingual transparency** - works in German, French, Italian, English, Romansh
|
| 14 |
-
|
| 15 |
-
## Try it yourself!
|
| 16 |
-
|
| 17 |
-
1. **💬 Chat with Apertus** in any language
|
| 18 |
-
2. **🔍 Analyze attention patterns** - see what the model focuses on
|
| 19 |
-
3. **📊 Explore model internals** - complete transparency into AI decisions
|
| 20 |
-
|
| 21 |
-
## Model Information
|
| 22 |
-
|
| 23 |
-
- **Model**: swiss-ai/Apertus-8B-Instruct-2509 (8 billion parameters)
|
| 24 |
-
- **Languages**: German, French, Italian, English, Romansh + Swiss dialects
|
| 25 |
-
- **Context**: 65,536 tokens (extensive document support)
|
| 26 |
-
- **Training**: 15 trillion tokens on Swiss and international data
|
| 27 |
-
- **Transparency**: Every computation accessible and explainable
|
| 28 |
-
|
| 29 |
-
## Research & Development
|
| 30 |
-
|
| 31 |
-
This dashboard demonstrates the complete transparency capabilities of Swiss AI research. Unlike proprietary models, every aspect of Apertus is open and inspectable.
|
| 32 |
-
|
| 33 |
-
**Academic Use**: Approved for research and educational purposes
|
| 34 |
-
**Swiss Engineering**: Built with precision, reliability, and transparency
|
| 35 |
-
**Open Source**: Complete code available for study and extension
|
| 36 |
-
|
| 37 |
-
---
|
| 38 |
-
|
| 39 |
-
🇨🇭 **Experience true AI transparency - Swiss precision meets artificial intelligence**
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
app.py
CHANGED
|
@@ -101,29 +101,50 @@ def load_model():
|
|
| 101 |
)
|
| 102 |
print(f"✅ Model loaded to GPU in {time.time() - start_time:.2f}s")
|
| 103 |
else:
|
| 104 |
-
print("💻
|
| 105 |
-
print("
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 106 |
start_time = time.time()
|
| 107 |
-
# CPU-
|
| 108 |
model = AutoModelForCausalLM.from_pretrained(
|
| 109 |
model_name,
|
| 110 |
token=hf_token,
|
| 111 |
-
torch_dtype=torch.float32,
|
| 112 |
device_map="cpu",
|
| 113 |
low_cpu_mem_usage=True,
|
| 114 |
output_attentions=True,
|
| 115 |
output_hidden_states=True,
|
| 116 |
trust_remote_code=True,
|
| 117 |
-
use_safetensors=True
|
|
|
|
|
|
|
| 118 |
)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 119 |
print(f"✅ Model loaded to CPU in {time.time() - start_time:.2f}s")
|
| 120 |
|
| 121 |
print("📊 Calculating model statistics...")
|
| 122 |
total_params = sum(p.numel() for p in model.parameters())
|
| 123 |
memory_usage = torch.cuda.memory_allocated() / 1024**3 if torch.cuda.is_available() else 0
|
| 124 |
|
| 125 |
-
# Check
|
| 126 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
| 127 |
|
| 128 |
model_loaded = True
|
| 129 |
print(f"✅ MODEL LOADED SUCCESSFULLY!")
|
|
@@ -134,7 +155,11 @@ def load_model():
|
|
| 134 |
if memory_usage > 0:
|
| 135 |
return f"✅ Model loaded successfully!\n📊 Parameters: {total_params:,}\n💾 Memory: {memory_usage:.1f} GB\n🚀 Optimization: {xielu_status}"
|
| 136 |
else:
|
| 137 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
| 138 |
|
| 139 |
except Exception as e:
|
| 140 |
print(f"❌ ERROR loading model: {str(e)}")
|
|
|
|
| 101 |
)
|
| 102 |
print(f"✅ Model loaded to GPU in {time.time() - start_time:.2f}s")
|
| 103 |
else:
|
| 104 |
+
print("💻 CPU Enhanced Mode - Optimizing for CPU performance...")
|
| 105 |
+
print("🚀 Using CPU-specific optimizations for better performance")
|
| 106 |
+
|
| 107 |
+
# Set CPU optimization flags
|
| 108 |
+
torch.set_num_threads(os.cpu_count()) # Use all CPU cores
|
| 109 |
+
torch.set_grad_enabled(False) # Disable gradients for inference
|
| 110 |
+
|
| 111 |
start_time = time.time()
|
| 112 |
+
# CPU-optimized configuration
|
| 113 |
model = AutoModelForCausalLM.from_pretrained(
|
| 114 |
model_name,
|
| 115 |
token=hf_token,
|
| 116 |
+
torch_dtype=torch.float32, # float32 for CPU
|
| 117 |
device_map="cpu",
|
| 118 |
low_cpu_mem_usage=True,
|
| 119 |
output_attentions=True,
|
| 120 |
output_hidden_states=True,
|
| 121 |
trust_remote_code=True,
|
| 122 |
+
use_safetensors=True,
|
| 123 |
+
offload_folder="offload", # Offload to disk if needed
|
| 124 |
+
offload_state_dict=True # Offload state dict to save RAM
|
| 125 |
)
|
| 126 |
+
|
| 127 |
+
# Enable CPU optimizations
|
| 128 |
+
model.eval() # Set to evaluation mode
|
| 129 |
+
if hasattr(torch, 'compile'):
|
| 130 |
+
print("⚙️ Attempting torch.compile for CPU optimization...")
|
| 131 |
+
try:
|
| 132 |
+
model = torch.compile(model, mode="reduce-overhead")
|
| 133 |
+
print("✅ torch.compile enabled for faster CPU inference")
|
| 134 |
+
except:
|
| 135 |
+
print("⚠️ torch.compile not available, using standard mode")
|
| 136 |
print(f"✅ Model loaded to CPU in {time.time() - start_time:.2f}s")
|
| 137 |
|
| 138 |
print("📊 Calculating model statistics...")
|
| 139 |
total_params = sum(p.numel() for p in model.parameters())
|
| 140 |
memory_usage = torch.cuda.memory_allocated() / 1024**3 if torch.cuda.is_available() else 0
|
| 141 |
|
| 142 |
+
# Check optimization status
|
| 143 |
+
if torch.cuda.is_available():
|
| 144 |
+
xielu_status = "✅ CUDA xIELU Active" if XIELU_AVAILABLE else "🎮 GPU Accelerated"
|
| 145 |
+
else:
|
| 146 |
+
cpu_count = os.cpu_count()
|
| 147 |
+
xielu_status = f"💪 CPU Enhanced ({cpu_count} cores)"
|
| 148 |
|
| 149 |
model_loaded = True
|
| 150 |
print(f"✅ MODEL LOADED SUCCESSFULLY!")
|
|
|
|
| 155 |
if memory_usage > 0:
|
| 156 |
return f"✅ Model loaded successfully!\n📊 Parameters: {total_params:,}\n💾 Memory: {memory_usage:.1f} GB\n🚀 Optimization: {xielu_status}"
|
| 157 |
else:
|
| 158 |
+
# Get CPU info
|
| 159 |
+
import psutil
|
| 160 |
+
cpu_percent = psutil.cpu_percent(interval=1)
|
| 161 |
+
ram_gb = psutil.virtual_memory().total / (1024**3)
|
| 162 |
+
return f"✅ Model loaded successfully!\n📊 Parameters: {total_params:,}\n💻 CPU Enhanced Mode\n💾 RAM: {ram_gb:.1f} GB available\n🚀 Optimization: {xielu_status}\n⚡ CPU Load: {cpu_percent:.1f}%"
|
| 163 |
|
| 164 |
except Exception as e:
|
| 165 |
print(f"❌ ERROR loading model: {str(e)}")
|
quick_tokenizer_test.py
ADDED
|
@@ -0,0 +1,136 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
#!/usr/bin/env python3
|
| 2 |
+
"""
|
| 3 |
+
🔍 Quick Swiss German Tokenizer Comparison
|
| 4 |
+
Schneller Test ohne Model-Loading - nur Tokenization
|
| 5 |
+
"""
|
| 6 |
+
|
| 7 |
+
from transformers import AutoTokenizer
|
| 8 |
+
import time
|
| 9 |
+
|
| 10 |
+
def compare_tokenizers():
|
| 11 |
+
print("🇨🇭 SWISS GERMAN TOKENIZER COMPARISON")
|
| 12 |
+
print("=" * 50)
|
| 13 |
+
|
| 14 |
+
# Test texts
|
| 15 |
+
texts = {
|
| 16 |
+
"Swiss German 1": "Grüezi! Chönd Sie mer bitte d Schwyzer KI erchläre?",
|
| 17 |
+
"Swiss German 2": "Was isch KI und wie funktioniert das?",
|
| 18 |
+
"Standard German": "Hallo! Können Sie mir bitte die Schweizer KI erklären?",
|
| 19 |
+
"Swiss Dialect": "Mir händ hüt es schöns Wätter, gäll?",
|
| 20 |
+
"Technical German": "Die Künstliche Intelligenz verwendet neuronale Netzwerke."
|
| 21 |
+
}
|
| 22 |
+
|
| 23 |
+
# Models to compare
|
| 24 |
+
models = [
|
| 25 |
+
("🇨🇭 Apertus Swiss AI", "swiss-ai/Apertus-8B-Instruct-2509"),
|
| 26 |
+
("🇩🇪 German BERT", "bert-base-german-cased"),
|
| 27 |
+
("🇩🇪 German GPT-2", "dbmdz/german-gpt2"),
|
| 28 |
+
("🌍 Multilingual BERT", "bert-base-multilingual-cased"),
|
| 29 |
+
("🤖 Standard GPT-2", "gpt2")
|
| 30 |
+
]
|
| 31 |
+
|
| 32 |
+
print("📝 Test Texts:")
|
| 33 |
+
for name, text in texts.items():
|
| 34 |
+
print(f" {name}: {text}")
|
| 35 |
+
print()
|
| 36 |
+
|
| 37 |
+
# Compare each model
|
| 38 |
+
results = {}
|
| 39 |
+
|
| 40 |
+
for model_name, model_id in models:
|
| 41 |
+
print(f"🧠 Testing: {model_name}")
|
| 42 |
+
print("-" * 40)
|
| 43 |
+
|
| 44 |
+
try:
|
| 45 |
+
start_time = time.time()
|
| 46 |
+
tokenizer = AutoTokenizer.from_pretrained(model_id)
|
| 47 |
+
load_time = time.time() - start_time
|
| 48 |
+
|
| 49 |
+
model_results = {}
|
| 50 |
+
|
| 51 |
+
for text_name, text in texts.items():
|
| 52 |
+
# Tokenize
|
| 53 |
+
tokens = tokenizer.tokenize(text)
|
| 54 |
+
token_ids = tokenizer.convert_tokens_to_ids(tokens)
|
| 55 |
+
|
| 56 |
+
# Analyze problems
|
| 57 |
+
problems = []
|
| 58 |
+
if any("Ã" in t for t in tokens):
|
| 59 |
+
problems.append("UTF-8 encoding issues")
|
| 60 |
+
single_chars = [t for t in tokens if len(t) == 1 and t.isalpha()]
|
| 61 |
+
if single_chars:
|
| 62 |
+
problems.append(f"{len(single_chars)} single character tokens")
|
| 63 |
+
|
| 64 |
+
# Calculate efficiency
|
| 65 |
+
efficiency = len(tokens) / len(text)
|
| 66 |
+
|
| 67 |
+
model_results[text_name] = {
|
| 68 |
+
"tokens": tokens,
|
| 69 |
+
"token_count": len(tokens),
|
| 70 |
+
"efficiency": efficiency,
|
| 71 |
+
"problems": problems,
|
| 72 |
+
"problematic_tokens": [t for t in tokens if "Ã" in t or (len(t) == 1 and t.isalpha())]
|
| 73 |
+
}
|
| 74 |
+
|
| 75 |
+
print(f" {text_name:15s}: {len(tokens):2d} tokens, {efficiency:.3f} tok/char")
|
| 76 |
+
if problems:
|
| 77 |
+
print(f" ⚠️ Issues: {', '.join(problems)}")
|
| 78 |
+
if model_results[text_name]["problematic_tokens"]:
|
| 79 |
+
prob_tokens = model_results[text_name]["problematic_tokens"][:3]
|
| 80 |
+
print(f" 🔍 Examples: {prob_tokens}")
|
| 81 |
+
|
| 82 |
+
results[model_name] = model_results
|
| 83 |
+
print(f" ⏱️ Load time: {load_time:.2f}s")
|
| 84 |
+
print()
|
| 85 |
+
|
| 86 |
+
except Exception as e:
|
| 87 |
+
print(f" ❌ Failed: {e}")
|
| 88 |
+
print()
|
| 89 |
+
|
| 90 |
+
# Summary comparison
|
| 91 |
+
print("📊 EFFICIENCY SUMMARY (Swiss German 1)")
|
| 92 |
+
print("=" * 50)
|
| 93 |
+
|
| 94 |
+
swiss_results = []
|
| 95 |
+
for model_name, model_data in results.items():
|
| 96 |
+
if "Swiss German 1" in model_data:
|
| 97 |
+
data = model_data["Swiss German 1"]
|
| 98 |
+
swiss_results.append({
|
| 99 |
+
"model": model_name,
|
| 100 |
+
"tokens": data["token_count"],
|
| 101 |
+
"efficiency": data["efficiency"],
|
| 102 |
+
"problems": len(data["problematic_tokens"])
|
| 103 |
+
})
|
| 104 |
+
|
| 105 |
+
# Sort by efficiency (lower = better)
|
| 106 |
+
swiss_results.sort(key=lambda x: x["efficiency"])
|
| 107 |
+
|
| 108 |
+
print("Ranking (lower tokens/char = better):")
|
| 109 |
+
for i, result in enumerate(swiss_results):
|
| 110 |
+
rank_emoji = ["🥇", "🥈", "🥉"][i] if i < 3 else f"{i+1}."
|
| 111 |
+
print(f"{rank_emoji} {result['model']:20s}: {result['efficiency']:.3f} tok/char, "
|
| 112 |
+
f"{result['tokens']} tokens, {result['problems']} problems")
|
| 113 |
+
|
| 114 |
+
# Show detailed tokenization for best and worst
|
| 115 |
+
if len(swiss_results) >= 2:
|
| 116 |
+
best = swiss_results[0]
|
| 117 |
+
worst = swiss_results[-1]
|
| 118 |
+
|
| 119 |
+
print(f"\n🔍 DETAILED COMPARISON")
|
| 120 |
+
print("=" * 50)
|
| 121 |
+
|
| 122 |
+
text = texts["Swiss German 1"]
|
| 123 |
+
print(f"Text: {text}")
|
| 124 |
+
print()
|
| 125 |
+
|
| 126 |
+
for model_type, model_name in [(best, "BEST"), (worst, "WORST")]:
|
| 127 |
+
print(f"{model_name}: {model_type['model']}")
|
| 128 |
+
tokens = results[model_type['model']]["Swiss German 1"]["tokens"]
|
| 129 |
+
print("Tokens:")
|
| 130 |
+
for i, token in enumerate(tokens):
|
| 131 |
+
marker = " ⚠️" if ("Ã" in token or (len(token) == 1 and token.isalpha())) else ""
|
| 132 |
+
print(f" {i+1:2d}: |{token}|{marker}")
|
| 133 |
+
print()
|
| 134 |
+
|
| 135 |
+
if __name__ == "__main__":
|
| 136 |
+
compare_tokenizers()
|
requirements.txt
CHANGED
|
@@ -5,4 +5,10 @@ gradio==5.44.0
|
|
| 5 |
plotly
|
| 6 |
numpy<2.0.0
|
| 7 |
pandas
|
| 8 |
-
scipy
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 5 |
plotly
|
| 6 |
numpy<2.0.0
|
| 7 |
pandas
|
| 8 |
+
scipy
|
| 9 |
+
pytorch_optimizer
|
| 10 |
+
matplotlib
|
| 11 |
+
seaborn
|
| 12 |
+
protobuf
|
| 13 |
+
sentencepiece
|
| 14 |
+
psutil
|
test_apertus_only.py
ADDED
|
@@ -0,0 +1,128 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
#!/usr/bin/env python3
|
| 2 |
+
"""
|
| 3 |
+
🇨🇭 Apertus Swiss German Test
|
| 4 |
+
Fokussiert nur auf Apertus Model Testing
|
| 5 |
+
"""
|
| 6 |
+
|
| 7 |
+
import torch
|
| 8 |
+
from transformers import AutoTokenizer, AutoModelForCausalLM
|
| 9 |
+
import time
|
| 10 |
+
|
| 11 |
+
def test_apertus_swiss_german():
|
| 12 |
+
print("🇨🇭 APERTUS SWISS GERMAN TEST")
|
| 13 |
+
print("=" * 40)
|
| 14 |
+
|
| 15 |
+
model_id = "swiss-ai/Apertus-8B-Instruct-2509"
|
| 16 |
+
|
| 17 |
+
# Check GPU
|
| 18 |
+
if not torch.cuda.is_available():
|
| 19 |
+
print("❌ CUDA not available - Apertus needs GPU")
|
| 20 |
+
return
|
| 21 |
+
|
| 22 |
+
gpu_memory = torch.cuda.get_device_properties(0).total_memory / 1e9
|
| 23 |
+
print(f"🎮 GPU: {torch.cuda.get_device_name()}")
|
| 24 |
+
print(f"💾 Memory: {gpu_memory:.1f} GB")
|
| 25 |
+
|
| 26 |
+
if gpu_memory < 20:
|
| 27 |
+
print("⚠️ Warning: Low GPU memory for Apertus-8B")
|
| 28 |
+
|
| 29 |
+
# Swiss German test questions
|
| 30 |
+
questions = [
|
| 31 |
+
"Grüezi! Chönd Sie mer bitte erchläre was KI isch?",
|
| 32 |
+
"Wie funktioniert Künstlichi Intelligänz?",
|
| 33 |
+
"Was sind d Vorteile und Nochteile vo KI?",
|
| 34 |
+
"Chönd Sie mer es Bispiil vo KI im Alldag gäh?"
|
| 35 |
+
]
|
| 36 |
+
|
| 37 |
+
try:
|
| 38 |
+
print("\n📥 Loading Apertus tokenizer...")
|
| 39 |
+
tokenizer = AutoTokenizer.from_pretrained(model_id)
|
| 40 |
+
if tokenizer.pad_token is None:
|
| 41 |
+
tokenizer.pad_token = tokenizer.eos_token
|
| 42 |
+
|
| 43 |
+
print("🚀 Loading Apertus model...")
|
| 44 |
+
# Use bfloat16 to match the model's internal expectations
|
| 45 |
+
model = AutoModelForCausalLM.from_pretrained(
|
| 46 |
+
model_id,
|
| 47 |
+
torch_dtype=torch.bfloat16, # Changed from float16
|
| 48 |
+
device_map="auto",
|
| 49 |
+
low_cpu_mem_usage=True
|
| 50 |
+
)
|
| 51 |
+
|
| 52 |
+
print(f"✅ Model loaded on: {next(model.parameters()).device}")
|
| 53 |
+
|
| 54 |
+
for i, question in enumerate(questions, 1):
|
| 55 |
+
print(f"\n{'='*60}")
|
| 56 |
+
print(f"📝 Question {i}: {question}")
|
| 57 |
+
print('='*60)
|
| 58 |
+
|
| 59 |
+
# Format with Swiss German system prompt
|
| 60 |
+
prompt = f"""Below is an instruction that describes a task. Write a response that appropriately completes the request.
|
| 61 |
+
|
| 62 |
+
### System:
|
| 63 |
+
Du bisch en hilfreiche Schwyzer KI-Assistent. Du verstahsch und redsch flüssig Schweizerdütsch. Bitte antworte uf Schweizerdütsch wänn du drüm bete wirst.
|
| 64 |
+
|
| 65 |
+
### Instruction:
|
| 66 |
+
{question}
|
| 67 |
+
|
| 68 |
+
### Response:
|
| 69 |
+
"""
|
| 70 |
+
|
| 71 |
+
print(f"🔢 Tokenizing...")
|
| 72 |
+
inputs = tokenizer(prompt, return_tensors="pt", padding=True, truncation=True)
|
| 73 |
+
device = next(model.parameters()).device
|
| 74 |
+
inputs = {k: v.to(device) for k, v in inputs.items()}
|
| 75 |
+
|
| 76 |
+
print(f"⚡ Generating... (Input: {inputs['input_ids'].shape[1]} tokens)")
|
| 77 |
+
|
| 78 |
+
start_time = time.time()
|
| 79 |
+
with torch.no_grad():
|
| 80 |
+
outputs = model.generate(
|
| 81 |
+
input_ids=inputs["input_ids"],
|
| 82 |
+
attention_mask=inputs.get("attention_mask"),
|
| 83 |
+
max_new_tokens=150,
|
| 84 |
+
temperature=0.7,
|
| 85 |
+
do_sample=True,
|
| 86 |
+
top_p=0.9,
|
| 87 |
+
pad_token_id=tokenizer.pad_token_id,
|
| 88 |
+
repetition_penalty=1.1
|
| 89 |
+
# Removed early_stopping - not supported by this model
|
| 90 |
+
)
|
| 91 |
+
|
| 92 |
+
generation_time = time.time() - start_time
|
| 93 |
+
|
| 94 |
+
# Decode response
|
| 95 |
+
full_response = tokenizer.decode(outputs[0], skip_special_tokens=True)
|
| 96 |
+
response = full_response[len(prompt):].strip()
|
| 97 |
+
|
| 98 |
+
print(f"✅ Generated in {generation_time:.2f}s")
|
| 99 |
+
print(f"📖 ANTWORT:")
|
| 100 |
+
print("-" * 40)
|
| 101 |
+
print(response)
|
| 102 |
+
print("-" * 40)
|
| 103 |
+
|
| 104 |
+
# Analyze response quality
|
| 105 |
+
swiss_indicators = sum(1 for word in ['isch', 'mer', 'chönd', 'gäh', 'wänd', 'hend', 'sind', 'bin']
|
| 106 |
+
if word in response.lower())
|
| 107 |
+
german_words = sum(1 for word in ['ist', 'mir', 'können', 'geben', 'wollen', 'haben', 'sind', 'bin']
|
| 108 |
+
if word in response.lower())
|
| 109 |
+
|
| 110 |
+
print(f"🔍 Analysis:")
|
| 111 |
+
print(f" Swiss German indicators: {swiss_indicators}")
|
| 112 |
+
print(f" Standard German words: {german_words}")
|
| 113 |
+
print(f" Response length: {len(response)} chars, {len(response.split())} words")
|
| 114 |
+
|
| 115 |
+
if swiss_indicators > german_words:
|
| 116 |
+
print(f" ✅ Appears to be Swiss German!")
|
| 117 |
+
elif german_words > swiss_indicators:
|
| 118 |
+
print(f" ⚠️ Appears to be Standard German")
|
| 119 |
+
else:
|
| 120 |
+
print(f" 🤔 Mixed or unclear")
|
| 121 |
+
|
| 122 |
+
except Exception as e:
|
| 123 |
+
print(f"❌ Error: {e}")
|
| 124 |
+
import traceback
|
| 125 |
+
traceback.print_exc()
|
| 126 |
+
|
| 127 |
+
if __name__ == "__main__":
|
| 128 |
+
test_apertus_swiss_german()
|
test_big_models_comparison.py
ADDED
|
@@ -0,0 +1,193 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
#!/usr/bin/env python3
|
| 2 |
+
"""
|
| 3 |
+
🏆 Big Models Swiss German Comparison
|
| 4 |
+
Vergleicht die großen Open Source Modelle mit Apertus
|
| 5 |
+
"""
|
| 6 |
+
|
| 7 |
+
import torch
|
| 8 |
+
from transformers import AutoTokenizer, AutoModelForCausalLM
|
| 9 |
+
import time
|
| 10 |
+
|
| 11 |
+
def test_swiss_german_comparison():
|
| 12 |
+
print("🏆 BIG MODELS SWISS GERMAN COMPARISON")
|
| 13 |
+
print("=" * 50)
|
| 14 |
+
|
| 15 |
+
# Check setup
|
| 16 |
+
if not torch.cuda.is_available():
|
| 17 |
+
print("❌ CUDA required for big models")
|
| 18 |
+
return
|
| 19 |
+
|
| 20 |
+
gpu_memory = torch.cuda.get_device_properties(0).total_memory / 1e9
|
| 21 |
+
print(f"🎮 GPU: {torch.cuda.get_device_name()}")
|
| 22 |
+
print(f"💾 Memory: {gpu_memory:.1f} GB")
|
| 23 |
+
|
| 24 |
+
if gpu_memory < 35:
|
| 25 |
+
print("⚠️ Warning: Need 35GB+ for all models")
|
| 26 |
+
|
| 27 |
+
# Big models to compare - using public versions
|
| 28 |
+
models = [
|
| 29 |
+
("🇨🇭 Apertus-8B", "swiss-ai/Apertus-8B-Instruct-2509"),
|
| 30 |
+
("🦙 Llama-3-8B", "meta-llama/Meta-Llama-3-8B-Instruct"), # Access granted
|
| 31 |
+
("🌸 Mistral-7B", "mistralai/Mistral-7B-Instruct-v0.1"), # Public version
|
| 32 |
+
("🌺 BLOOM-7B", "bigscience/bloom-7b1"),
|
| 33 |
+
]
|
| 34 |
+
|
| 35 |
+
# Test question in Swiss German
|
| 36 |
+
question = "Grüezi! Chönd Sie mer bitte erchläre was KI isch?"
|
| 37 |
+
|
| 38 |
+
print(f"\n🎯 Question: {question}")
|
| 39 |
+
print("=" * 50)
|
| 40 |
+
|
| 41 |
+
results = []
|
| 42 |
+
|
| 43 |
+
for model_name, model_id in models:
|
| 44 |
+
print(f"\n{'='*60}")
|
| 45 |
+
print(f"🧠 Testing: {model_name}")
|
| 46 |
+
print(f"📦 Model: {model_id}")
|
| 47 |
+
print('='*60)
|
| 48 |
+
|
| 49 |
+
try:
|
| 50 |
+
# Format prompt for each model
|
| 51 |
+
tokenizer = AutoTokenizer.from_pretrained(model_id)
|
| 52 |
+
if tokenizer.pad_token is None:
|
| 53 |
+
tokenizer.pad_token = tokenizer.eos_token
|
| 54 |
+
|
| 55 |
+
# Model-specific prompting
|
| 56 |
+
if "Apertus" in model_id:
|
| 57 |
+
prompt = f"""Below is an instruction that describes a task. Write a response that appropriately completes the request.
|
| 58 |
+
|
| 59 |
+
### System:
|
| 60 |
+
Du bisch en hilfreiche Schwyzer KI-Assistent. Du verstahsch und redsch flüssig Schweizerdütsch.
|
| 61 |
+
|
| 62 |
+
### Instruction:
|
| 63 |
+
{question}
|
| 64 |
+
|
| 65 |
+
### Response:
|
| 66 |
+
"""
|
| 67 |
+
elif "Llama" in model_id:
|
| 68 |
+
# Llama-3 format (access granted)
|
| 69 |
+
prompt = f"""<|begin_of_text|><|start_header_id|>system<|end_header_id|>
|
| 70 |
+
|
| 71 |
+
You are a helpful AI assistant fluent in Swiss German. Please respond in authentic Schweizerdeutsch.
|
| 72 |
+
|
| 73 |
+
<|eot_id|><|start_header_id|>user<|end_header_id|>
|
| 74 |
+
|
| 75 |
+
{question}
|
| 76 |
+
|
| 77 |
+
<|eot_id|><|start_header_id|>assistant<|end_header_id|>
|
| 78 |
+
|
| 79 |
+
"""
|
| 80 |
+
elif "Mistral" in model_id:
|
| 81 |
+
prompt = f"[INST] Du bisch en hilfreiche Assistent wo Schweizerdütsch redt. Bitte antworte uf Schweizerdütsch:\n\n{question} [/INST]"
|
| 82 |
+
else: # BLOOM
|
| 83 |
+
prompt = f"Human: Please respond in Swiss German:\n\n{question}\n\nAssistant:"
|
| 84 |
+
|
| 85 |
+
print(f"📝 Prompt format: {prompt[:60]}...")
|
| 86 |
+
|
| 87 |
+
# Tokenize
|
| 88 |
+
inputs = tokenizer(prompt, return_tensors="pt", padding=True, truncation=True)
|
| 89 |
+
print(f"🔢 Input tokens: {inputs['input_ids'].shape[1]}")
|
| 90 |
+
|
| 91 |
+
# Load model
|
| 92 |
+
print("🚀 Loading model...")
|
| 93 |
+
start_load = time.time()
|
| 94 |
+
|
| 95 |
+
model = AutoModelForCausalLM.from_pretrained(
|
| 96 |
+
model_id,
|
| 97 |
+
torch_dtype=torch.bfloat16,
|
| 98 |
+
device_map="auto",
|
| 99 |
+
low_cpu_mem_usage=True
|
| 100 |
+
)
|
| 101 |
+
|
| 102 |
+
load_time = time.time() - start_load
|
| 103 |
+
print(f"✅ Loaded in {load_time:.1f}s")
|
| 104 |
+
|
| 105 |
+
# Move inputs to model device
|
| 106 |
+
device = next(model.parameters()).device
|
| 107 |
+
inputs = {k: v.to(device) for k, v in inputs.items()}
|
| 108 |
+
print(f"🎯 Model device: {device}")
|
| 109 |
+
|
| 110 |
+
# Generate
|
| 111 |
+
print("⚡ Generating...")
|
| 112 |
+
start_gen = time.time()
|
| 113 |
+
|
| 114 |
+
with torch.no_grad():
|
| 115 |
+
outputs = model.generate(
|
| 116 |
+
input_ids=inputs["input_ids"],
|
| 117 |
+
attention_mask=inputs.get("attention_mask"),
|
| 118 |
+
max_new_tokens=120,
|
| 119 |
+
temperature=0.7,
|
| 120 |
+
do_sample=True,
|
| 121 |
+
top_p=0.9,
|
| 122 |
+
pad_token_id=tokenizer.pad_token_id,
|
| 123 |
+
repetition_penalty=1.1
|
| 124 |
+
)
|
| 125 |
+
|
| 126 |
+
gen_time = time.time() - start_gen
|
| 127 |
+
|
| 128 |
+
# Decode
|
| 129 |
+
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
|
| 130 |
+
answer = response[len(prompt):].strip()
|
| 131 |
+
|
| 132 |
+
# Analyze Swiss German quality
|
| 133 |
+
swiss_indicators = ['isch', 'cha', 'mer', 'chönd', 'gäh', 'hend', 'sind', 'vo', 'uf', 'mit']
|
| 134 |
+
swiss_count = sum(1 for word in swiss_indicators if word in answer.lower())
|
| 135 |
+
|
| 136 |
+
german_words = ['ist', 'kann', 'mir', 'können', 'geben', 'haben', 'sind', 'von', 'auf', 'mit']
|
| 137 |
+
german_count = sum(1 for word in german_words if word in answer.lower())
|
| 138 |
+
|
| 139 |
+
results.append({
|
| 140 |
+
'model': model_name,
|
| 141 |
+
'response': answer,
|
| 142 |
+
'swiss_score': swiss_count,
|
| 143 |
+
'german_score': german_count,
|
| 144 |
+
'load_time': load_time,
|
| 145 |
+
'gen_time': gen_time,
|
| 146 |
+
'length': len(answer)
|
| 147 |
+
})
|
| 148 |
+
|
| 149 |
+
print(f"✅ Generated in {gen_time:.2f}s")
|
| 150 |
+
print(f"📊 Swiss indicators: {swiss_count}, German words: {german_count}")
|
| 151 |
+
print(f"📖 RESPONSE ({len(answer)} chars):")
|
| 152 |
+
print("-" * 50)
|
| 153 |
+
print(answer)
|
| 154 |
+
print("-" * 50)
|
| 155 |
+
|
| 156 |
+
# Clear memory
|
| 157 |
+
del model
|
| 158 |
+
torch.cuda.empty_cache()
|
| 159 |
+
|
| 160 |
+
except Exception as e:
|
| 161 |
+
print(f"❌ Failed: {e}")
|
| 162 |
+
results.append({
|
| 163 |
+
'model': model_name,
|
| 164 |
+
'response': f"ERROR: {e}",
|
| 165 |
+
'swiss_score': 0,
|
| 166 |
+
'german_score': 0,
|
| 167 |
+
'load_time': 0,
|
| 168 |
+
'gen_time': 0,
|
| 169 |
+
'length': 0
|
| 170 |
+
})
|
| 171 |
+
|
| 172 |
+
# Final comparison
|
| 173 |
+
print(f"\n🏆 FINAL COMPARISON")
|
| 174 |
+
print("=" * 60)
|
| 175 |
+
|
| 176 |
+
# Sort by Swiss German authenticity
|
| 177 |
+
successful = [r for r in results if not r['response'].startswith('ERROR')]
|
| 178 |
+
if successful:
|
| 179 |
+
ranked = sorted(successful, key=lambda x: x['swiss_score'], reverse=True)
|
| 180 |
+
|
| 181 |
+
print("🥇 RANKING (by Swiss German authenticity):")
|
| 182 |
+
for i, result in enumerate(ranked):
|
| 183 |
+
rank_emoji = ["🥇", "🥈", "🥉"][i] if i < 3 else f"{i+1}."
|
| 184 |
+
authenticity = "🇨🇭 Authentic" if result['swiss_score'] > result['german_score'] else "🇩🇪 Standard German" if result['german_score'] > result['swiss_score'] else "🤔 Mixed"
|
| 185 |
+
|
| 186 |
+
print(f"{rank_emoji} {result['model']}: {result['swiss_score']} Swiss indicators, {authenticity}")
|
| 187 |
+
print(f" Response: {result['response'][:100]}...")
|
| 188 |
+
print()
|
| 189 |
+
|
| 190 |
+
print("🏁 Comparison complete!")
|
| 191 |
+
|
| 192 |
+
if __name__ == "__main__":
|
| 193 |
+
test_swiss_german_comparison()
|
test_swiss_german_generation.py
ADDED
|
@@ -0,0 +1,350 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
|
| 2 |
+
#!/usr/bin/env python3
|
| 3 |
+
"""
|
| 4 |
+
🇨🇭 Swiss German AI Model Comparison Script
|
| 5 |
+
Test verschiedene Modelle auf ihre Fähigkeit, KI auf Schweizerdeutsch zu erklären
|
| 6 |
+
"""
|
| 7 |
+
|
| 8 |
+
import torch
|
| 9 |
+
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
|
| 10 |
+
import time
|
| 11 |
+
import json
|
| 12 |
+
from datetime import datetime
|
| 13 |
+
|
| 14 |
+
def test_model_generation(model_name, model_id, prompt, max_new_tokens=150):
|
| 15 |
+
"""Test text generation for a specific model"""
|
| 16 |
+
print(f"\n{'='*60}")
|
| 17 |
+
print(f"🧠 Testing: {model_name}")
|
| 18 |
+
print(f"📦 Model ID: {model_id}")
|
| 19 |
+
print(f"❓ Prompt: {prompt}")
|
| 20 |
+
print('='*60)
|
| 21 |
+
|
| 22 |
+
result = {
|
| 23 |
+
"model_name": model_name,
|
| 24 |
+
"model_id": model_id,
|
| 25 |
+
"prompt": prompt,
|
| 26 |
+
"timestamp": datetime.now().isoformat(),
|
| 27 |
+
"success": False,
|
| 28 |
+
"error": None,
|
| 29 |
+
"response": None,
|
| 30 |
+
"token_count": None,
|
| 31 |
+
"generation_time": None
|
| 32 |
+
}
|
| 33 |
+
|
| 34 |
+
try:
|
| 35 |
+
start_time = time.time()
|
| 36 |
+
|
| 37 |
+
# Load tokenizer
|
| 38 |
+
print("📥 Loading tokenizer...")
|
| 39 |
+
tokenizer = AutoTokenizer.from_pretrained(model_id)
|
| 40 |
+
if tokenizer.pad_token is None:
|
| 41 |
+
tokenizer.pad_token = tokenizer.eos_token
|
| 42 |
+
|
| 43 |
+
# Format prompt based on model type
|
| 44 |
+
if "Apertus" in model_id:
|
| 45 |
+
formatted_prompt = f"""Below is an instruction that describes a task. Write a response that appropriately completes the request.
|
| 46 |
+
|
| 47 |
+
### System:
|
| 48 |
+
You are a helpful Swiss AI assistant. You understand and speak Swiss German (Schweizerdeutsch) fluently. Please respond in authentic Swiss German when asked.
|
| 49 |
+
|
| 50 |
+
### Instruction:
|
| 51 |
+
{prompt}
|
| 52 |
+
|
| 53 |
+
### Response:
|
| 54 |
+
"""
|
| 55 |
+
elif "Llama" in model_id:
|
| 56 |
+
# Llama-3 format (access granted)
|
| 57 |
+
formatted_prompt = f"""<|begin_of_text|><|start_header_id|>system<|end_header_id|>
|
| 58 |
+
|
| 59 |
+
You are a helpful AI assistant who can speak Swiss German fluently. When asked to explain something in Swiss German (Schweizerdeutsch), please respond authentically in that dialect.
|
| 60 |
+
|
| 61 |
+
<|eot_id|><|start_header_id|>user<|end_header_id|>
|
| 62 |
+
|
| 63 |
+
{prompt}
|
| 64 |
+
|
| 65 |
+
<|eot_id|><|start_header_id|>assistant<|end_header_id|>
|
| 66 |
+
|
| 67 |
+
"""
|
| 68 |
+
elif "Mistral" in model_id:
|
| 69 |
+
# Mistral format
|
| 70 |
+
formatted_prompt = f"""[INST] You are a helpful assistant who speaks Swiss German. Please respond to the following request in authentic Swiss German (Schweizerdeutsch):
|
| 71 |
+
|
| 72 |
+
{prompt} [/INST]"""
|
| 73 |
+
elif "bloom" in model_id.lower():
|
| 74 |
+
# BLOOM - simple format with context
|
| 75 |
+
formatted_prompt = f"""Human: Please respond in Swiss German (Schweizerdeutsch):
|
| 76 |
+
|
| 77 |
+
{prompt}
|
| 78 |
+
|
| 79 |
+
AI:"""
|
| 80 |
+
elif "german" in model_id.lower():
|
| 81 |
+
# Better prompting for German models
|
| 82 |
+
formatted_prompt = f"""Als hilfreicher Assistent beantworte bitte die folgende Frage ausführlich:
|
| 83 |
+
|
| 84 |
+
Frage: {prompt}
|
| 85 |
+
|
| 86 |
+
Antwort:"""
|
| 87 |
+
else:
|
| 88 |
+
# For English models, clarify the task
|
| 89 |
+
if any(swiss_word in prompt.lower() for swiss_word in ['schweiz', 'chönd', 'isch', 'mer']):
|
| 90 |
+
formatted_prompt = f"""Please respond to this Swiss German question by explaining the topic in Swiss German language:
|
| 91 |
+
|
| 92 |
+
Question: {prompt}
|
| 93 |
+
|
| 94 |
+
Answer:"""
|
| 95 |
+
else:
|
| 96 |
+
formatted_prompt = prompt
|
| 97 |
+
|
| 98 |
+
print(f"📝 Formatted prompt: {formatted_prompt[:100]}...")
|
| 99 |
+
|
| 100 |
+
# Tokenize
|
| 101 |
+
inputs = tokenizer(
|
| 102 |
+
formatted_prompt,
|
| 103 |
+
return_tensors="pt",
|
| 104 |
+
max_length=512,
|
| 105 |
+
truncation=True,
|
| 106 |
+
padding=True # Add padding
|
| 107 |
+
)
|
| 108 |
+
input_length = inputs["input_ids"].shape[1]
|
| 109 |
+
result["input_tokens"] = input_length
|
| 110 |
+
|
| 111 |
+
print(f"🔢 Input tokens: {input_length}")
|
| 112 |
+
|
| 113 |
+
# Load model
|
| 114 |
+
print("🚀 Loading model...")
|
| 115 |
+
|
| 116 |
+
# Try different loading strategies based on available hardware
|
| 117 |
+
if torch.cuda.is_available():
|
| 118 |
+
print("🎮 Using CUDA")
|
| 119 |
+
# Use appropriate dtype for each model
|
| 120 |
+
if "Apertus" in model_id:
|
| 121 |
+
torch_dtype = torch.bfloat16
|
| 122 |
+
print("🔧 Using bfloat16 for Apertus compatibility")
|
| 123 |
+
elif any(large_model in model_id for large_model in ["Llama", "Mistral", "bloom"]):
|
| 124 |
+
torch_dtype = torch.bfloat16 # Large modern models prefer bfloat16
|
| 125 |
+
print("🔧 Using bfloat16 for large model compatibility")
|
| 126 |
+
else:
|
| 127 |
+
torch_dtype = torch.float16
|
| 128 |
+
print("🔧 Using float16 for smaller models")
|
| 129 |
+
|
| 130 |
+
model = AutoModelForCausalLM.from_pretrained(
|
| 131 |
+
model_id,
|
| 132 |
+
torch_dtype=torch_dtype,
|
| 133 |
+
device_map="auto",
|
| 134 |
+
low_cpu_mem_usage=True
|
| 135 |
+
)
|
| 136 |
+
# Move inputs to same device as model
|
| 137 |
+
device = next(model.parameters()).device
|
| 138 |
+
inputs = {k: v.to(device) for k, v in inputs.items()}
|
| 139 |
+
else:
|
| 140 |
+
print("💻 Using CPU")
|
| 141 |
+
model = AutoModelForCausalLM.from_pretrained(
|
| 142 |
+
model_id,
|
| 143 |
+
torch_dtype=torch.float32,
|
| 144 |
+
device_map="cpu",
|
| 145 |
+
low_cpu_mem_usage=True
|
| 146 |
+
)
|
| 147 |
+
|
| 148 |
+
# Generate response
|
| 149 |
+
print("⚡ Generating response...")
|
| 150 |
+
generation_start = time.time()
|
| 151 |
+
|
| 152 |
+
with torch.no_grad():
|
| 153 |
+
outputs = model.generate(
|
| 154 |
+
input_ids=inputs["input_ids"],
|
| 155 |
+
attention_mask=inputs.get("attention_mask", None),
|
| 156 |
+
max_length=input_length + max_new_tokens,
|
| 157 |
+
temperature=0.8, # Bit more creative
|
| 158 |
+
do_sample=True,
|
| 159 |
+
top_p=0.9, # Nucleus sampling
|
| 160 |
+
top_k=50, # Limit choices
|
| 161 |
+
pad_token_id=tokenizer.pad_token_id,
|
| 162 |
+
repetition_penalty=1.15, # Stronger repetition penalty
|
| 163 |
+
no_repeat_ngram_size=4 # Longer n-gram blocking
|
| 164 |
+
# Removed early_stopping - not supported by Apertus
|
| 165 |
+
)
|
| 166 |
+
|
| 167 |
+
generation_time = time.time() - generation_start
|
| 168 |
+
|
| 169 |
+
# Decode response
|
| 170 |
+
full_response = tokenizer.decode(outputs[0], skip_special_tokens=True)
|
| 171 |
+
response_only = full_response[len(formatted_prompt):].strip()
|
| 172 |
+
|
| 173 |
+
result["success"] = True
|
| 174 |
+
result["response"] = response_only
|
| 175 |
+
result["generation_time"] = generation_time
|
| 176 |
+
result["total_tokens"] = outputs[0].shape[0]
|
| 177 |
+
|
| 178 |
+
print(f"✅ Generation successful!")
|
| 179 |
+
print(f"⏱️ Time: {generation_time:.2f}s")
|
| 180 |
+
print(f"🔤 Generated tokens: {outputs[0].shape[0] - input_length}")
|
| 181 |
+
print(f"\n📖 ANTWORT:")
|
| 182 |
+
print("-" * 40)
|
| 183 |
+
print(response_only)
|
| 184 |
+
print("-" * 40)
|
| 185 |
+
|
| 186 |
+
except Exception as e:
|
| 187 |
+
result["error"] = str(e)
|
| 188 |
+
print(f"❌ Error: {e}")
|
| 189 |
+
|
| 190 |
+
return result
|
| 191 |
+
|
| 192 |
+
def test_tokenization_only(model_name, model_id, prompt):
|
| 193 |
+
"""Test only tokenization for large models"""
|
| 194 |
+
print(f"\n{'='*60}")
|
| 195 |
+
print(f"🔍 Tokenization Test: {model_name}")
|
| 196 |
+
print('='*60)
|
| 197 |
+
|
| 198 |
+
try:
|
| 199 |
+
tokenizer = AutoTokenizer.from_pretrained(model_id)
|
| 200 |
+
|
| 201 |
+
# Show different prompt formats
|
| 202 |
+
if "Apertus" in model_id:
|
| 203 |
+
formatted_prompt = f"""Below is an instruction that describes a task. Write a response that appropriately completes the request.
|
| 204 |
+
|
| 205 |
+
### System:
|
| 206 |
+
You are a helpful AI assistant that can speak Swiss German.
|
| 207 |
+
|
| 208 |
+
### Instruction:
|
| 209 |
+
{prompt}
|
| 210 |
+
|
| 211 |
+
### Response:
|
| 212 |
+
"""
|
| 213 |
+
else:
|
| 214 |
+
formatted_prompt = prompt
|
| 215 |
+
|
| 216 |
+
# Tokenize
|
| 217 |
+
tokens = tokenizer.tokenize(formatted_prompt)
|
| 218 |
+
token_ids = tokenizer.convert_tokens_to_ids(tokens)
|
| 219 |
+
|
| 220 |
+
print(f"📝 Formatted prompt: {formatted_prompt}")
|
| 221 |
+
print(f"🔢 Token count: {len(tokens)}")
|
| 222 |
+
print(f"🎯 Tokens per character: {len(tokens)/len(formatted_prompt):.3f}")
|
| 223 |
+
print(f"🏷️ First 10 tokens: {tokens[:10]}")
|
| 224 |
+
print(f"🔑 First 10 token IDs: {token_ids[:10]}")
|
| 225 |
+
|
| 226 |
+
# Check for problematic tokens
|
| 227 |
+
problematic = [t for t in tokens if "Ã" in t or (len(t) == 1 and t.isalpha())]
|
| 228 |
+
if problematic:
|
| 229 |
+
print(f"⚠️ Problematic tokens: {problematic[:5]}")
|
| 230 |
+
else:
|
| 231 |
+
print("✅ No obvious tokenization problems")
|
| 232 |
+
|
| 233 |
+
return True
|
| 234 |
+
|
| 235 |
+
except Exception as e:
|
| 236 |
+
print(f"❌ Tokenization failed: {e}")
|
| 237 |
+
return False
|
| 238 |
+
|
| 239 |
+
def main():
|
| 240 |
+
print("🇨🇭 SWISS GERMAN AI MODEL COMPARISON")
|
| 241 |
+
print("=" * 50)
|
| 242 |
+
print(f"🕐 Started at: {datetime.now()}")
|
| 243 |
+
print(f"🔧 PyTorch version: {torch.__version__}")
|
| 244 |
+
print(f"🎮 CUDA available: {torch.cuda.is_available()}")
|
| 245 |
+
if torch.cuda.is_available():
|
| 246 |
+
print(f"🎯 GPU: {torch.cuda.get_device_name()}")
|
| 247 |
+
print(f"💾 GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")
|
| 248 |
+
|
| 249 |
+
# Check HuggingFace login for gated models
|
| 250 |
+
print("\n🔐 Checking HuggingFace Authentication...")
|
| 251 |
+
try:
|
| 252 |
+
from huggingface_hub import whoami
|
| 253 |
+
user_info = whoami()
|
| 254 |
+
print(f"✅ Logged in as: {user_info['name']}")
|
| 255 |
+
except Exception as e:
|
| 256 |
+
print("⚠️ Not logged in to HuggingFace")
|
| 257 |
+
print(" Gated models (like Apertus) will be skipped")
|
| 258 |
+
print(" Run: huggingface-cli login")
|
| 259 |
+
|
| 260 |
+
# Test prompts
|
| 261 |
+
prompts = [
|
| 262 |
+
"Bitte erkläre mir KI auf Schweizerdeutsch",
|
| 263 |
+
"Chönd Sie mer d Künstlichi Intelligänz erchläre?",
|
| 264 |
+
"Was isch KI und wie funktioniert das?"
|
| 265 |
+
]
|
| 266 |
+
|
| 267 |
+
# Models to test (ordered by size - smallest first)
|
| 268 |
+
models = [
|
| 269 |
+
("🇩🇪 German GPT-2", "dbmdz/german-gpt2"),
|
| 270 |
+
("🤖 DistilGPT-2 English", "distilgpt2"),
|
| 271 |
+
("🇩🇪 German BERT (encoder only)", "bert-base-german-cased"),
|
| 272 |
+
("🦙 Llama-3-8B-Instruct", "meta-llama/Meta-Llama-3-8B-Instruct"), # Access granted
|
| 273 |
+
("🌸 Mistral-7B-Instruct", "mistralai/Mistral-7B-Instruct-v0.1"), # Earlier public version
|
| 274 |
+
("🌺 BLOOM-7B1", "bigscience/bloom-7b1"),
|
| 275 |
+
("🤖 DialoGPT-Large", "microsoft/DialoGPT-large"),
|
| 276 |
+
("🇨🇭 Apertus 8B", "swiss-ai/Apertus-8B-Instruct-2509"),
|
| 277 |
+
]
|
| 278 |
+
|
| 279 |
+
all_results = []
|
| 280 |
+
|
| 281 |
+
# Test each prompt with each model
|
| 282 |
+
for prompt in prompts:
|
| 283 |
+
print(f"\n🎯 TESTING PROMPT: '{prompt}'")
|
| 284 |
+
print("=" * 80)
|
| 285 |
+
|
| 286 |
+
for model_name, model_id in models:
|
| 287 |
+
try:
|
| 288 |
+
if "bert" in model_id.lower():
|
| 289 |
+
print(f"\n⚠️ Skipping {model_name} (encoder-only model)")
|
| 290 |
+
continue
|
| 291 |
+
|
| 292 |
+
# Check if model needs special handling for size
|
| 293 |
+
large_models = ["Apertus", "Llama", "Mistral", "bloom", "DialoGPT-large"]
|
| 294 |
+
is_large_model = any(large_model in model_id for large_model in large_models)
|
| 295 |
+
|
| 296 |
+
if is_large_model:
|
| 297 |
+
# Check GPU memory for large models
|
| 298 |
+
gpu_memory = torch.cuda.get_device_properties(0).total_memory / 1e9 if torch.cuda.is_available() else 0
|
| 299 |
+
if gpu_memory > 35: # 35GB+ should handle 7B-8B models
|
| 300 |
+
print(f"\n🚀 GPU has {gpu_memory:.1f}GB - attempting {model_name} generation!")
|
| 301 |
+
# Reduce tokens for large models to prevent OOM
|
| 302 |
+
max_tokens = 80 if "Apertus" in model_id else 100
|
| 303 |
+
result = test_model_generation(model_name, model_id, prompt, max_new_tokens=max_tokens)
|
| 304 |
+
all_results.append(result)
|
| 305 |
+
else:
|
| 306 |
+
print(f"\n📏 Large model detected: {model_name}")
|
| 307 |
+
print(f"🔍 GPU only has {gpu_memory:.1f}GB - tokenization only")
|
| 308 |
+
test_tokenization_only(model_name, model_id, prompt)
|
| 309 |
+
else:
|
| 310 |
+
# Try full generation for smaller models
|
| 311 |
+
result = test_model_generation(model_name, model_id, prompt)
|
| 312 |
+
all_results.append(result)
|
| 313 |
+
|
| 314 |
+
except KeyboardInterrupt:
|
| 315 |
+
print("\n⏹️ Interrupted by user")
|
| 316 |
+
break
|
| 317 |
+
except Exception as e:
|
| 318 |
+
print(f"\n❌ Unexpected error with {model_name}: {e}")
|
| 319 |
+
continue
|
| 320 |
+
|
| 321 |
+
# Save results
|
| 322 |
+
if all_results:
|
| 323 |
+
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
|
| 324 |
+
filename = f"swiss_german_test_results_{timestamp}.json"
|
| 325 |
+
|
| 326 |
+
with open(filename, 'w', encoding='utf-8') as f:
|
| 327 |
+
json.dump(all_results, f, indent=2, ensure_ascii=False)
|
| 328 |
+
|
| 329 |
+
print(f"\n💾 Results saved to: {filename}")
|
| 330 |
+
|
| 331 |
+
# Summary
|
| 332 |
+
print(f"\n📊 SUMMARY")
|
| 333 |
+
print("=" * 50)
|
| 334 |
+
successful = [r for r in all_results if r["success"]]
|
| 335 |
+
failed = [r for r in all_results if not r["success"]]
|
| 336 |
+
|
| 337 |
+
print(f"✅ Successful generations: {len(successful)}")
|
| 338 |
+
print(f"❌ Failed generations: {len(failed)}")
|
| 339 |
+
|
| 340 |
+
if successful:
|
| 341 |
+
print(f"\n🏆 BEST RESPONSES:")
|
| 342 |
+
for result in successful:
|
| 343 |
+
print(f"\n🤖 {result['model_name']}:")
|
| 344 |
+
response = result['response'][:200] + "..." if len(result['response']) > 200 else result['response']
|
| 345 |
+
print(f" '{response}'")
|
| 346 |
+
|
| 347 |
+
print(f"\n🏁 Test completed at: {datetime.now()}")
|
| 348 |
+
|
| 349 |
+
if __name__ == "__main__":
|
| 350 |
+
main()
|