jeanbaptdzd committed on
Commit
e3724fa
·
1 Parent(s): 659e232

Add GGUF conversion script for DragonLLM 32B models


- Add convert_to_gguf.py script to convert HF models to GGUF format
- Support for multiple 32B models (Qwen-Pro-Finance-R-32B, etc.)
- Automatic quantization to Q4_K_M, Q5_K_M, Q6_K, Q8_0
- Auto-install llama.cpp and dependencies
- Documentation with usage instructions and memory requirements
- Ready for Ollama integration with tool-calling support

scripts/GGUF_CONVERSION_SUMMARY.md ADDED
@@ -0,0 +1,106 @@
# GGUF Conversion Setup Complete ✅

## What Was Created

1. **`scripts/convert_to_gguf.py`** - Main conversion script
2. **`scripts/README_GGUF.md`** - Detailed usage instructions
3. **Dependencies installed** - transformers, torch, sentencepiece, etc.

## Quick Start

```bash
cd /Users/jeanbapt/simple-llm-pro-finance
source venv/bin/activate

# Convert the default model (Qwen-Pro-Finance-R-32B)
python3 scripts/convert_to_gguf.py

# Or specify a different 32B model
python3 scripts/convert_to_gguf.py 2  # qwen3-32b-fin-v1.0
```

## Available 32B Models

The script found these 32B models in DragonLLM:

1. **DragonLLM/Qwen-Pro-Finance-R-32B** ⭐ (Recommended - latest)
2. DragonLLM/qwen3-32b-fin-v1.0
3. DragonLLM/qwen3-32b-fin-v0.3
4. DragonLLM/qwen3-32b-fin-v1.0-fp8 (Pre-quantized)
5. DragonLLM/Qwen-Pro-Finance-R-32B-FP8 (Pre-quantized)

## What the Script Does

1. ✅ Checks for llama.cpp (clones it if needed)
2. ✅ Installs the required Python dependencies
3. ✅ Converts the model to a base GGUF (FP16, ~64GB)
4. ✅ Quantizes to multiple levels (a rough size check follows this list):
   - **Q5_K_M** (~20GB) - **Best balance** ⭐
   - Q6_K (~24GB) - Higher quality
   - Q4_K_M (~16GB) - Smaller size
   - Q8_0 (~32GB) - Highest quality
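The size estimates above follow from simple bits-per-weight arithmetic. A rough check, not produced by the script itself; both the parameter count and the per-quant bits-per-weight averages below are approximations for a Qwen-class 32B model:

```python
# Back-of-the-envelope GGUF sizes for a ~32B-parameter model.
# PARAMS and the bits-per-weight figures are assumptions, not script output.
PARAMS = 32.8e9  # assumed total parameter count

BITS_PER_WEIGHT = {
    "Q4_K_M": 4.8,
    "Q5_K_M": 5.5,
    "Q6_K": 6.6,
    "Q8_0": 8.5,
    "F16": 16.0,
}

for name, bpw in BITS_PER_WEIGHT.items():
    size_gb = PARAMS * bpw / 8 / 1024**3  # bits -> bytes -> GiB
    print(f"{name:>7}: ~{size_gb:.0f} GB")
```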
## Memory Requirements

- **Base conversion**: ~64GB RAM (takes 30-60 min)
- **Quantization**: ~32GB RAM (10-20 min per level)
- **Disk space**: ~200GB recommended

## Output Location

All GGUF files are saved to:
```
/Users/jeanbapt/simple-llm-pro-finance/gguf_models/
```

## Recommended Quantization for Mac

Based on your Mac's RAM (a programmatic check is sketched after the table):

| Mac RAM | Recommended | Alternative |
|---------|-------------|-------------|
| 32GB    | Q5_K_M      | Q4_K_M      |
| 64GB+   | Q6_K        | Q8_0        |
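If you want the pick to be automatic, a minimal sketch of the table above (macOS-only: it shells out to `sysctl hw.memsize`; the thresholds simply mirror the table and are not part of the conversion script):

```python
import subprocess

# Total physical memory in bytes (macOS-specific sysctl key).
mem_bytes = int(subprocess.run(
    ["sysctl", "-n", "hw.memsize"],
    capture_output=True, text=True, check=True,
).stdout.strip())
mem_gb = mem_bytes / 1024**3

# Mirror the recommendation table above.
if mem_gb >= 64:
    print(f"{mem_gb:.0f} GB RAM -> Q6_K (alternative: Q8_0)")
elif mem_gb >= 32:
    print(f"{mem_gb:.0f} GB RAM -> Q5_K_M (alternative: Q4_K_M)")
else:
    print(f"{mem_gb:.0f} GB RAM -> under 32 GB; even Q4_K_M of a 32B model will be a tight fit")
```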
## Tool Calling Support

- ✅ GGUF models retain full tool-calling capabilities
- ✅ Ollama exposes OpenAI-compatible function calling
- ✅ Works with your existing PydanticAI agents

## Next Steps

1. **Run the conversion** (when ready - it takes time):
   ```bash
   python3 scripts/convert_to_gguf.py
   ```

2. **Create the Ollama model** (after conversion):
   ```bash
   ollama create qwen-finance-32b -f Modelfile
   ```

3. **Use it with your agents** - update your endpoint config to point at the local Ollama server (a minimal example follows below).
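For step 3, the exact change depends on how your agents construct their client; the sketch below assumes an OpenAI-compatible client pointed at Ollama's local endpoint (`http://localhost:11434/v1`) and the `qwen-finance-32b` name created in step 2:

```python
from openai import OpenAI

# Ollama serves an OpenAI-compatible API locally; the api_key is ignored
# by Ollama but must be a non-empty string.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

reply = client.chat.completions.create(
    model="qwen-finance-32b",  # the name used in `ollama create` above
    messages=[{"role": "user", "content": "In one sentence, what is the difference between APR and APY?"}],
)
print(reply.choices[0].message.content)
```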
## Notes

- The script automatically picks up `HF_TOKEN_LC2` from your `.env` file
- llama.cpp is cloned to `simple-llm-pro-finance/llama.cpp/`
- You can stop and resume - the script skips files that already exist
- The base FP16 file is created first, then the quantizations run

## Troubleshooting

If you encounter issues:

1. **Out of memory**: use Q4_K_M instead
2. **Conversion fails**: check that your HF token has access to the model (a quick access check is sketched below)
3. **Dependencies missing**: the script auto-installs them, but you can also install them manually:
   ```bash
   pip install transformers torch sentencepiece protobuf gguf
   ```
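For issue 2, you can verify token access before starting a long conversion. A small check using `huggingface_hub` (already a requirement of the script); the repo id here is the default model:

```python
import os
from dotenv import load_dotenv
from huggingface_hub import model_info

load_dotenv()  # picks up HF_TOKEN_LC2 from .env
token = os.getenv("HF_TOKEN_LC2") or os.getenv("HF_TOKEN")

# Raises an HTTP error (401/403) if the token cannot see the repo.
model_info("DragonLLM/Qwen-Pro-Finance-R-32B", token=token)
print("Token has access to DragonLLM/Qwen-Pro-Finance-R-32B")
```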
---

**Ready to convert!** Run `python3 scripts/convert_to_gguf.py` when you're ready (it will take 30-60 minutes).
scripts/README_GGUF.md ADDED
@@ -0,0 +1,136 @@
# GGUF Conversion Script

This script converts DragonLLM models from Hugging Face to GGUF format for use with Ollama on a Mac.

## Quick Start

```bash
# Activate the virtual environment
cd /Users/jeanbapt/simple-llm-pro-finance
source venv/bin/activate

# Run the conversion (uses the default: Qwen-Pro-Finance-R-32B)
python3 scripts/convert_to_gguf.py

# Or specify a model by number (1-5) or name
python3 scripts/convert_to_gguf.py 1  # Qwen-Pro-Finance-R-32B
python3 scripts/convert_to_gguf.py 2  # qwen3-32b-fin-v1.0
python3 scripts/convert_to_gguf.py "DragonLLM/qwen3-32b-fin-v1.0"
```

## Available 32B Models

1. **DragonLLM/Qwen-Pro-Finance-R-32B** (Recommended - latest)
2. DragonLLM/qwen3-32b-fin-v1.0
3. DragonLLM/qwen3-32b-fin-v0.3
4. DragonLLM/qwen3-32b-fin-v1.0-fp8 (Already quantized to FP8)
5. DragonLLM/Qwen-Pro-Finance-R-32B-FP8 (Already quantized to FP8)

## What It Does

1. **Downloads llama.cpp** (if not already present)
2. **Converts the model to a base GGUF** (FP16, ~64GB)
3. **Quantizes to multiple levels**:
   - Q5_K_M (~20GB) - **Best balance** ⭐
   - Q6_K (~24GB) - Higher quality
   - Q4_K_M (~16GB) - Smaller size
   - Q8_0 (~32GB) - Highest quality

## Memory Requirements

- **Base conversion (FP16)**: ~64GB RAM
- **Quantization**: ~32GB RAM (can be done separately)

## Output

Files are saved to `simple-llm-pro-finance/gguf_models/` (a quick listing sketch follows the tree):

```
gguf_models/
├── Qwen-Pro-Finance-R-32B-f16.gguf     (~64GB)
├── Qwen-Pro-Finance-R-32B-q5_k_m.gguf  (~20GB) ⭐ Recommended
├── Qwen-Pro-Finance-R-32B-q6_k.gguf    (~24GB)
├── Qwen-Pro-Finance-R-32B-q4_k_m.gguf  (~16GB)
└── Qwen-Pro-Finance-R-32B-q8_0.gguf    (~32GB)
```
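To confirm what a run actually produced, a quick listing of the output directory (same path as above):

```python
from pathlib import Path

out_dir = Path("/Users/jeanbapt/simple-llm-pro-finance/gguf_models")

# Print every GGUF produced so far with its size in GB.
for f in sorted(out_dir.glob("*.gguf")):
    print(f"{f.name:45s} {f.stat().st_size / 1024**3:6.1f} GB")
```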
## Using with Ollama

After conversion, create an Ollama model:

```bash
# Create the Modelfile
cat > Modelfile << EOF
FROM ./gguf_models/Qwen-Pro-Finance-R-32B-q5_k_m.gguf
TEMPLATE """{{ if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{ if .Prompt }}<|im_start|>user
{{ .Prompt }}<|im_end|>
{{ end }}<|im_start|>assistant
{{ .Response }}<|im_end|>
"""
PARAMETER num_ctx 8192
PARAMETER temperature 0.7
EOF

# Create the model
ollama create qwen-finance-32b -f Modelfile

# Try it out
ollama run qwen-finance-32b "What is compound interest?"
```
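Once the model is created, you can also smoke-test it over Ollama's local REST API (standard library only; assumes Ollama is running on the default port 11434):

```python
import json
import urllib.request

payload = {
    "model": "qwen-finance-32b",
    "prompt": "In one sentence, what is compound interest?",
    "stream": False,  # return a single JSON object instead of a token stream
}

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```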
## Tool Calling Support

GGUF models retain tool-calling capabilities, and Ollama exposes OpenAI-compatible function calling:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

response = client.chat.completions.create(
    model="qwen-finance-32b",
    messages=[{"role": "user", "content": "Calculate future value of 10000 at 5% for 10 years"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "calculate_fv",
            "description": "Calculate future value",
            "parameters": {
                "type": "object",
                "properties": {
                    "pv": {"type": "number"},
                    "rate": {"type": "number"},
                    "nper": {"type": "number"}
                }
            }
        }
    }],
    tool_choice="auto"
)
```
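The snippet above only issues the request. When the model decides to call the function, the tool call comes back on the response; you execute it yourself and send the result back for a final answer. A minimal continuation (reusing `client` and `response` from above; the `calculate_fv` implementation is illustrative and simply matches the schema):

```python
import json

def calculate_fv(pv: float, rate: float, nper: float) -> float:
    """Future value of a lump sum: FV = PV * (1 + rate) ** nper."""
    return pv * (1 + rate) ** nper

message = response.choices[0].message
if message.tool_calls:
    call = message.tool_calls[0]
    args = json.loads(call.function.arguments)  # arguments arrive as a JSON string
    result = calculate_fv(**args)

    # Send the tool result back so the model can phrase the final answer.
    followup = client.chat.completions.create(
        model="qwen-finance-32b",
        messages=[
            {"role": "user", "content": "Calculate future value of 10000 at 5% for 10 years"},
            {
                "role": "assistant",
                "tool_calls": [{
                    "id": call.id,
                    "type": "function",
                    "function": {"name": call.function.name, "arguments": call.function.arguments},
                }],
            },
            {"role": "tool", "tool_call_id": call.id, "content": str(result)},
        ],
    )
    print(followup.choices[0].message.content)
```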
## Troubleshooting

### Out of Memory
- Use Q4_K_M instead of Q5_K_M
- Close other applications
- Reduce the context window in Ollama (`num_ctx 4096`)

### Conversion Fails
- Ensure `HF_TOKEN_LC2` is set in `.env`
- Check that you have access to the model on Hugging Face
- Verify you have enough disk space (~200GB recommended)

### Quantization Fails
- The base FP16 file is still usable
- Try quantizing manually: `./llama.cpp/llama-quantize input.gguf output.gguf Q5_K_M`

## Notes

- **FP8 models** (models 4 and 5) are already quantized, but converting to GGUF is still needed for the Ollama workflow described here
- **Q5_K_M is recommended** for the best quality/size trade-off on a Mac
- Conversion takes 30-60 minutes depending on your system
- Quantization takes 10-20 minutes per level
scripts/convert_to_gguf.py ADDED
@@ -0,0 +1,279 @@
#!/usr/bin/env python3
"""
Convert DragonLLM models from Hugging Face to GGUF format.

This script:
1. Downloads the model from Hugging Face
2. Converts it to GGUF format using llama.cpp
3. Quantizes to multiple levels (Q4_K_M, Q5_K_M, Q6_K, Q8_0)

Requirements:
- llama.cpp installed (git clone https://github.com/ggerganov/llama.cpp.git)
- Python packages: huggingface_hub, python-dotenv
"""

import os
import sys
import subprocess
import shutil
from pathlib import Path
from typing import Optional
from dotenv import load_dotenv

# Load environment variables
ENV_FILE = Path(__file__).parent.parent / ".env"
if ENV_FILE.exists():
    load_dotenv(ENV_FILE)

HF_TOKEN = os.getenv("HF_TOKEN_LC2") or os.getenv("HF_TOKEN") or os.getenv("HUGGING_FACE_HUB_TOKEN")

# Available 32B models found
AVAILABLE_32B_MODELS = [
    "DragonLLM/Qwen-Pro-Finance-R-32B",
    "DragonLLM/qwen3-32b-fin-v1.0",
    "DragonLLM/qwen3-32b-fin-v0.3",
    "DragonLLM/qwen3-32b-fin-v1.0-fp8",
    "DragonLLM/Qwen-Pro-Finance-R-32B-FP8",
]

# Quantization levels (best trade-off first)
QUANTIZATIONS = [
    ("Q5_K_M", "~20GB", "Best balance of quality and size"),
    ("Q6_K", "~24GB", "Higher quality"),
    ("Q4_K_M", "~16GB", "Smaller size, good quality"),
    ("Q8_0", "~32GB", "Highest quality, larger size"),
]


def check_llama_cpp() -> Optional[Path]:
    """Check if llama.cpp is available."""
    # Check common locations
    possible_paths = [
        Path.home() / "llama.cpp",
        Path(__file__).parent.parent / "llama.cpp",
        Path("/usr/local/llama.cpp"),
    ]

    for path in possible_paths:
        # Try both naming conventions
        convert_script = path / "convert_hf_to_gguf.py"
        if not convert_script.exists():
            convert_script = path / "convert-hf-to-gguf.py"
        quantize_bin = path / "llama-quantize"
        if convert_script.exists() and (quantize_bin.exists() or (path / "llama-quantize.exe").exists()):
            return path

    return None


def install_llama_cpp(target_dir: Path) -> Path:
    """Clone and set up llama.cpp."""
    print(f"📦 Cloning llama.cpp to {target_dir}...")

    if target_dir.exists():
        print(f"   {target_dir} already exists, using existing installation")
        return target_dir

    try:
        subprocess.run(
            ["git", "clone", "https://github.com/ggerganov/llama.cpp.git", str(target_dir)],
            check=True,
            capture_output=True,
        )
        print("✅ llama.cpp cloned successfully")

        # Install Python requirements for conversion
        requirements = target_dir / "requirements" / "requirements-convert_hf_to_gguf.txt"
        if not requirements.exists():
            requirements = target_dir / "requirements.txt"
        if requirements.exists():
            print("📦 Installing Python requirements for llama.cpp conversion...")
            subprocess.run(
                [sys.executable, "-m", "pip", "install", "-r", str(requirements), "--quiet"],
                check=False,  # Don't fail if some packages are already installed
            )

        # Try to build (optional, but faster)
        print("🔨 Attempting to build llama-quantize (optional)...")
        try:
            subprocess.run(["make", "-C", str(target_dir)], check=True, capture_output=True)
            print("✅ Build successful")
        except (subprocess.CalledProcessError, FileNotFoundError):
            print("⚠️  Build failed or make not available, will use Python quantize")

        return target_dir
    except subprocess.CalledProcessError as e:
        print(f"❌ Error cloning llama.cpp: {e}")
        sys.exit(1)
def convert_to_gguf(
    model_name: str,
    output_dir: Path,
    llama_cpp_dir: Path,
    hf_token: str,
) -> Path:
    """Convert Hugging Face model to GGUF format."""
    output_dir.mkdir(parents=True, exist_ok=True)

    base_name = model_name.split("/")[-1].replace(".", "-")
    output_file = output_dir / f"{base_name}-f16.gguf"

    if output_file.exists():
        print(f"✅ Base GGUF file already exists: {output_file}")
        return output_file

    print(f"🔄 Converting {model_name} to GGUF (FP16)...")
    print("   This may take 30-60 minutes and requires ~64GB RAM...")

    # Try both naming conventions
    convert_script = llama_cpp_dir / "convert_hf_to_gguf.py"
    if not convert_script.exists():
        convert_script = llama_cpp_dir / "convert-hf-to-gguf.py"

    try:
        subprocess.run(
            [
                sys.executable,
                str(convert_script),
                "--outdir", str(output_dir),
                "--outfile", output_file.name,
                model_name,
                "--token", hf_token,
            ],
            check=True,
        )
        print(f"✅ Conversion complete: {output_file}")
        return output_file
    except subprocess.CalledProcessError as e:
        print(f"❌ Conversion failed: {e}")
        sys.exit(1)


def quantize_gguf(
    input_file: Path,
    output_dir: Path,
    llama_cpp_dir: Path,
    quantizations: list,
) -> list[Path]:
    """Quantize GGUF file to different levels."""
    quantized_files = []

    # Try binary quantize first, fallback to Python
    quantize_bin = llama_cpp_dir / "llama-quantize"
    if not quantize_bin.exists():
        quantize_bin = llama_cpp_dir / "llama-quantize.exe"

    use_binary = quantize_bin.exists()

    if not use_binary:
        print("⚠️  Binary quantize not found, will use Python quantize (slower)")
        quantize_script = llama_cpp_dir / "quantize.py"
        if not quantize_script.exists():
            print("❌ No quantize tool found!")
            return []

    for qtype, size, description in quantizations:
        output_file = output_dir / input_file.name.replace("-f16.gguf", f"-{qtype.lower()}.gguf")

        if output_file.exists():
            print(f"✅ {qtype} already exists: {output_file}")
            quantized_files.append(output_file)
            continue

        print(f"🔄 Quantizing to {qtype} ({size}, {description})...")

        try:
            if use_binary:
                subprocess.run(
                    [str(quantize_bin), str(input_file), str(output_file), qtype],
                    check=True,
                )
            else:
                subprocess.run(
                    [
                        sys.executable,
                        str(quantize_script),
                        str(input_file),
                        str(output_file),
                        qtype,
                    ],
                    check=True,
                )
            print(f"✅ {qtype} complete: {output_file}")
            quantized_files.append(output_file)
        except subprocess.CalledProcessError as e:
            print(f"⚠️  Quantization to {qtype} failed: {e}")
            continue

    return quantized_files


def main():
    """Main conversion script."""
    if not HF_TOKEN:
        print("❌ Error: HF_TOKEN_LC2 not found in environment")
        print("   Please set it in .env file or environment variables")
        sys.exit(1)

    # Select model
    print("Available 32B models:")
    for i, model in enumerate(AVAILABLE_32B_MODELS, 1):
        print(f"  {i}. {model}")

    if len(sys.argv) > 1:
        try:
            model_idx = int(sys.argv[1]) - 1
            if 0 <= model_idx < len(AVAILABLE_32B_MODELS):
                model_name = AVAILABLE_32B_MODELS[model_idx]
            else:
                model_name = sys.argv[1]  # Use as model name directly
        except ValueError:
            model_name = sys.argv[1]  # Use as model name directly
    else:
        # Default to best model
        model_name = AVAILABLE_32B_MODELS[0]
        print(f"\nUsing default model: {model_name}")
        print("   (Pass a model number or name as an argument to use a different model)")

    print(f"\n🎯 Target model: {model_name}")

    # Setup directories
    script_dir = Path(__file__).parent.parent
    output_dir = script_dir / "gguf_models"
    llama_cpp_dir = script_dir / "llama.cpp"

    # Check/install llama.cpp
    llama_cpp_path = check_llama_cpp()
    if not llama_cpp_path:
        print("📦 llama.cpp not found, installing...")
        llama_cpp_path = install_llama_cpp(llama_cpp_dir)
    else:
        print(f"✅ Found llama.cpp at: {llama_cpp_path}")

    # Convert to GGUF
    base_gguf = convert_to_gguf(model_name, output_dir, llama_cpp_path, HF_TOKEN)

    # Quantize
    print("\n📊 Quantizing to multiple levels...")
    quantized = quantize_gguf(base_gguf, output_dir, llama_cpp_path, QUANTIZATIONS)

    # Summary
    print("\n✅ Conversion complete!")
    print(f"\n📁 Output directory: {output_dir}")
    print("\n📦 Generated files:")
    print(f"   - {base_gguf.name} ({base_gguf.stat().st_size / (1024**3):.1f} GB)")
    for qfile in quantized:
        size_gb = qfile.stat().st_size / (1024**3)
        print(f"   - {qfile.name} ({size_gb:.1f} GB)")

    print("\n💡 Recommended for Mac:")
    print("   - 32GB RAM: Use Q5_K_M or Q4_K_M")
    print("   - 64GB+ RAM: Use Q6_K or Q8_0")
    print("\n🚀 To use with Ollama:")
    print(f"   ollama create {model_name.split('/')[-1].lower()} -f <(echo 'FROM {quantized[0] if quantized else base_gguf}')")


if __name__ == "__main__":
    main()