# GGUF Conversion Setup Complete ✅
## What Was Created
- `scripts/convert_to_gguf.py` - Main conversion script
- `scripts/README_GGUF.md` - Detailed usage instructions
- Dependencies installed - transformers, torch, sentencepiece, etc.
## Quick Start

```bash
cd /Users/jeanbapt/simple-llm-pro-finance
source venv/bin/activate

# Convert default model (Qwen-Pro-Finance-R-32B)
python3 scripts/convert_to_gguf.py

# Or specify a different 32B model
python3 scripts/convert_to_gguf.py 2  # qwen3-32b-fin-v1.0
```
## Available 32B Models
The script found these 32B models in DragonLLM:
- DragonLLM/Qwen-Pro-Finance-R-32B ✅ (Recommended - latest)
- DragonLLM/qwen3-32b-fin-v1.0
- DragonLLM/qwen3-32b-fin-v0.3
- DragonLLM/qwen3-32b-fin-v1.0-fp8 (Pre-quantized)
- DragonLLM/Qwen-Pro-Finance-R-32B-FP8 (Pre-quantized)
## What the Script Does
- ✅ Checks for llama.cpp (clones if needed)
- ✅ Installs required Python dependencies
- ✅ Converts model to base GGUF (FP16, ~64GB)
- ✅ Quantizes to multiple levels (see the command sketch after this list):
  - Q5_K_M (~20GB) - Best balance ✅
  - Q6_K (~24GB) - Higher quality
  - Q4_K_M (~16GB) - Smaller size
  - Q8_0 (~32GB) - Highest quality
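If you ever want to run the steps by hand, the script roughly wraps the standard llama.cpp workflow. A minimal sketch, assuming llama.cpp is cloned and built locally; the model path, output file names, and quantize binary location are illustrative and depend on your setup:

```bash
# Convert the downloaded HF model to a base FP16 GGUF
# (convert_hf_to_gguf.py ships in the llama.cpp repo root)
python3 llama.cpp/convert_hf_to_gguf.py /path/to/Qwen-Pro-Finance-R-32B \
  --outtype f16 \
  --outfile gguf_models/qwen-pro-finance-r-32b-f16.gguf

# Quantize the FP16 base to Q5_K_M (repeat for Q6_K, Q4_K_M, Q8_0)
# Binary path depends on how you built llama.cpp (cmake shown here)
./llama.cpp/build/bin/llama-quantize \
  gguf_models/qwen-pro-finance-r-32b-f16.gguf \
  gguf_models/qwen-pro-finance-r-32b-Q5_K_M.gguf \
  Q5_K_M
```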
## Memory Requirements
- Base conversion: ~64GB RAM (takes 30-60 min)
- Quantization: ~32GB RAM (10-20 min per level)
- Disk space: ~200GB recommended
## Output Location
All GGUF files will be saved to `/Users/jeanbapt/simple-llm-pro-finance/gguf_models/`.
## Recommended Quantization for Mac
Based on your Mac's RAM:
| Mac RAM | Recommended | Alternative |
|---|---|---|
| 32GB | Q5_K_M | Q4_K_M |
| 64GB+ | Q6_K | Q8_0 |
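If you are not sure how much RAM your Mac has, a quick check (macOS only; `hw.memsize` is reported in bytes):

```bash
# Print physical RAM in GB
echo "$(( $(sysctl -n hw.memsize) / 1024 / 1024 / 1024 )) GB"
```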
## Tool Calling Support
- ✅ GGUF models maintain full tool calling capabilities
- ✅ Ollama supports OpenAI-compatible function calling (see the smoke test below)
- ✅ Works with your existing PydanticAI agents
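As a quick smoke test of function calling through Ollama's OpenAI-compatible endpoint once the model is created - a sketch only: the model name `qwen-finance-32b` assumes the `ollama create` step below, and the `get_stock_price` tool is purely illustrative:

```bash
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen-finance-32b",
    "messages": [{"role": "user", "content": "What is the price of ACME?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_stock_price",
        "description": "Get the latest price for a ticker",
        "parameters": {
          "type": "object",
          "properties": {"ticker": {"type": "string"}},
          "required": ["ticker"]
        }
      }
    }]
  }'
```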
## Next Steps
1. Run the conversion (when ready - it takes time):
   `python3 scripts/convert_to_gguf.py`
2. Create the Ollama model (after conversion; a sketch follows this list):
   `ollama create qwen-finance-32b -f Modelfile`
3. Use with your agents - update your endpoint config to point to the local Ollama server.
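A minimal sketch of step 2; the `.gguf` file name is illustrative - point `FROM` at whichever quantization the script actually produced:

```bash
# Write a minimal Modelfile referencing the quantized GGUF
cat > Modelfile <<'EOF'
FROM ./gguf_models/qwen-pro-finance-r-32b-Q5_K_M.gguf
EOF

# Register the model with Ollama, then try it out
ollama create qwen-finance-32b -f Modelfile
ollama run qwen-finance-32b "Summarize the outlook for EU banks."
```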
## Notes
- The script uses `HF_TOKEN_LC2` from your `.env` file automatically (a manual fallback follows this list)
- llama.cpp is cloned to `simple-llm-pro-finance/llama.cpp/`
- You can stop and resume - the script checks for existing files
- The base FP16 file is created first, then the quantizations run
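If the automatic `.env` loading ever fails, you can export the token yourself before running the script. A sketch, assuming a plain `KEY=value` line in `.env`; `HF_TOKEN` is the standard variable huggingface_hub picks up:

```bash
# Pull HF_TOKEN_LC2 out of .env and expose it as HF_TOKEN
export HF_TOKEN="$(grep '^HF_TOKEN_LC2=' .env | cut -d= -f2-)"
```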
## Troubleshooting
If you encounter issues:
- Out of memory: use Q4_K_M instead
- Conversion fails: check that the HF token has access to the model
- Dependencies missing: the script auto-installs them, but you can manually run:
  `pip install transformers torch sentencepiece protobuf gguf`
Ready to convert! Run `python3 scripts/convert_to_gguf.py` whenever you're ready; the base conversion alone takes 30-60 minutes.