# Qwen3-1.7B Magistral Math (GGUF)

## TL;DR
- This is a math-focused fine-tune of `unsloth/Qwen3-1.7B-Base`, exported to GGUF (F16 / Q8_0 / Q4_K_M) with Unsloth.
- Goal: a small 1.7B model specialized for grade-school and early high-school math reasoning.
- Data: `HAD653/GSM8K-OpenMath-MathReason-13k` – 13.9k math word problems with structured chain-of-thought. Answers always follow the same pattern:

```
Problem: ...
Reasoning: ...
Answer: <final numeric answer>
```
- Best use: GSM8K-style problems, OpenMath-style word problems, step-by-step reasoning with a single numeric final answer.
## Model Description

- Base model: `unsloth/Qwen3-1.7B-Base` (Apache-2.0)
- Architecture: Qwen3 dense causal LM, ~1.7B params, 28 layers, GQA attention, 32k context.
- Type: decoder-only LLM, text generation.
- This repo: inference-only GGUF weights for llama.cpp / LM Studio / Ollama / text-generation-webui.
### Available files
From the Files tab:
- `Qwen3-1.7B-Magistral-Math-F16.gguf` – highest quality, requires the most VRAM.
- `Qwen3-1.7B-Magistral-Math-Q8_0.gguf` – 8-bit quantization.
- `Qwen3-1.7B-Magistral-Math-Q4_K_M.gguf` – 4-bit K-quant, best for smaller GPUs.
These files contain the fine-tuned math weights, exported via `model.save_pretrained_gguf` after full BF16 training.
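If you only need one quant, you can fetch a single file with the `huggingface_hub` Python API instead of cloning the whole repo (a minimal sketch; pick whichever filename you need from the list above):

```python
from huggingface_hub import hf_hub_download

# Downloads the Q4_K_M quant into the local HF cache and returns its path.
path = hf_hub_download(
    repo_id="HAD653/qwen3-1.7b-magistral-math-gguf",
    filename="Qwen3-1.7B-Magistral-Math-Q4_K_M.gguf",
)
print(path)
```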
## Training Data
This model is fine-tuned on:
- Dataset: `HAD653/GSM8K-OpenMath-MathReason-13k`
- Size: 13,857 examples.
- Fields:
  - `question`: natural-language math word problem.
  - `cot`: structured solution with three blocks: `Problem:`, `Reasoning:`, `Answer:`.
  - `final_answer`: canonical numeric answer (string).
The dataset focuses on easy–medium difficulty: basic arithmetic, fractions, percentages, rate problems, simple algebra, and simple combinatorics – the kind of tasks a 1–3B model can genuinely master.
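To inspect the data yourself, a quick sketch with the `datasets` library (assuming a standard `train` split):

```python
from datasets import load_dataset

ds = load_dataset("HAD653/GSM8K-OpenMath-MathReason-13k", split="train")
print(len(ds))  # 13,857 examples

ex = ds[0]
print(ex["question"])      # natural-language word problem
print(ex["cot"])           # Problem: / Reasoning: / Answer: blocks
print(ex["final_answer"])  # canonical numeric answer (string)
```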
## Training Setup (Summary)
Fine-tuning was done with Unsloth + TRL on a single RTX 4090, using full BF16 fine-tuning (no LoRA).
Main hyperparameters:
- Base: `unsloth/Qwen3-1.7B-Base`
- Sequence length: 2048
- Batching: `per_device_train_batch_size = 2`, `gradient_accumulation_steps = 8` → effective batch size ≈ 16 sequences
- Epochs: 2
- Optimizer / schedule: `learning_rate = 7e-5`, linear scheduler, `warmup_ratio = 0.05`, `weight_decay = 0.01`
- Precision & memory: `dtype = bfloat16`, `gradient_checkpointing = True`
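For illustration, a minimal sketch of this setup with Unsloth + TRL. This is not the original training script, and exact argument names vary across TRL versions; the prompt construction is the one described under "Supervision format" below:

```python
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen3-1.7B-Base",
    max_seq_length=2048,
    dtype=None,            # resolves to bfloat16 on supported GPUs
    full_finetuning=True,  # full fine-tune, no LoRA
)

# Build the "### Instruction / ### Response" training text, ending with EOS
# (see "Supervision format" below).
train_dataset = load_dataset(
    "HAD653/GSM8K-OpenMath-MathReason-13k", split="train"
).map(lambda ex: {
    "text": f"### Instruction:\n{ex['question']}\n"
            f"### Response:\n{ex['cot']}{tokenizer.eos_token}"
})

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_dataset,
    dataset_text_field="text",
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=8,  # effective batch size ≈ 16
        num_train_epochs=2,
        learning_rate=7e-5,
        lr_scheduler_type="linear",
        warmup_ratio=0.05,
        weight_decay=0.01,
        bf16=True,
        gradient_checkpointing=True,
        output_dir="outputs",
    ),
)
trainer.train()
```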
### Supervision format

The training text for each sample is:

```
### Instruction:
{question}
### Response:
{cot}</s>
```

where `</s>` is the tokenizer EOS token.
Appending the `eos_token` to the end of each sample teaches the model when to stop, which greatly reduces “Answer: 36 / Answer: 36 / …” loops during inference.
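A quick way to sanity-check this (a sketch with toy strings standing in for the dataset's `question` and `cot` fields):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("unsloth/Qwen3-1.7B-Base")

question = "What is 2 + 3?"                       # toy stand-in
cot = "Problem: ...\nReasoning: ...\nAnswer:\n5"  # toy stand-in

sample = f"### Instruction:\n{question}\n### Response:\n{cot}{tok.eos_token}"

# The last token id must be the EOS id, or the model never learns to stop.
ids = tok(sample)["input_ids"]
assert ids[-1] == tok.eos_token_id
```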
## Prompting & Templates

### Recommended system prompt (optional but useful)

```
You are a math reasoning assistant.
For every question, answer in exactly this format:
Problem:
<restate the problem in your own words>
Reasoning:
<step-by-step reasoning showing all intermediate steps>
Answer:
<final numeric answer only, on its own line>
Do not add any extra commentary before or after the answer.
Do not repeat the answer multiple times.
Stop after writing the final answer.
```
### Inference template (matches training)

Single-turn format:

```
### Instruction:
{question}
### Response:
```
The model will then generate:

```
Problem:
...
Reasoning:
...
Answer:
<number>
```
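If you score outputs programmatically, a small helper can pull the number out of the final `Answer:` block (a sketch; it assumes the single-numeric-answer format above):

```python
import re

def extract_answer(completion: str) -> str | None:
    """Return the last numeric value following an 'Answer:' marker, if any."""
    matches = re.findall(r"Answer:\s*(-?[\d,]+(?:\.\d+)?)", completion)
    return matches[-1].replace(",", "") if matches else None

print(extract_answer("Problem: ...\nReasoning: ...\nAnswer:\n48"))  # -> 48
```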
### Stop strings

On top of the EOS token, you can add stop strings in your UI:

```
### Instruction:
### Response:
```
Many frontends (LM Studio, text-generation-webui, KoboldCpp, etc.) let you configure these so the model stops cleanly when it tries to start the next turn.
## Quantization & Hardware Tips
The three variants in this repo roughly behave as follows (ballpark):
- `Q4_K_M` (~1.1 GB) – best for:
  - 4–6 GB GPUs or pure CPU inference.
  - Fast experimentation / local tools / “math assistant on a laptop”.
- `Q8_0` (~1.8 GB) – good compromise:
  - 8–12 GB GPUs.
  - Often slightly more stable than Q4 on harder problems.
- `F16` (~3.5 GB) – highest fidelity:
  - 12+ GB GPUs (4090, 4080, 4070 12 GB, A4000, etc.).
  - Recommended if VRAM allows and you care about maximum accuracy.
As a rule of thumb, choose a file that is 1–2 GB smaller than your available VRAM.
## Usage Examples

### llama.cpp
Once you have built llama.cpp, you can run the model like this (replace with your path):
```bash
./llama-cli \
  -m Qwen3-1.7B-Magistral-Math-Q4_K_M.gguf \
  -p "### Instruction:
Albert buys 2 large pizzas and 2 small pizzas. A large pizza has 16 slices and a small pizza has 8 slices. If he eats it all, how many pieces does he eat that day?
### Response:
" \
  -n 256 \
  --temp 0.1 \
  --top-p 0.9 \
  --repeat-penalty 1.05
```
Suggested decoding for math:

- `temperature`: 0.0–0.2
- `top_p`: 0.9
- `repeat_penalty`: 1.05–1.1
- `top_k`: 20–40 (optional tweak)
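The same settings through `llama-cpp-python`, with the stop strings from the previous section (a sketch; adjust `model_path` and `n_ctx` to your setup):

```python
from llama_cpp import Llama

llm = Llama(model_path="Qwen3-1.7B-Magistral-Math-Q4_K_M.gguf", n_ctx=2048)

prompt = (
    "### Instruction:\n"
    "A train travels 60 km in 45 minutes. What is its speed in km/h?\n"
    "### Response:\n"
)
out = llm(
    prompt,
    max_tokens=256,
    temperature=0.1,
    top_p=0.9,
    repeat_penalty=1.05,
    stop=["### Instruction:", "### Response:"],  # plus the built-in EOS
)
print(out["choices"][0]["text"])
```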
### LM Studio / other UIs

Set the prompt template to:

```
### Instruction:
{{prompt}}
### Response:
```
Add the same stop strings:

```
### Instruction:
### Response:
```

and keep temperature low for math benchmarks.
## Intended Uses & Limitations

### Intended uses
- Solving GSM8K-style and OpenMath-style word problems.
- Training / evaluating small-scale math reasoning pipelines.
- Serving as a local math tutor for grade-school / early high-school algebra & arithmetic.
### Limitations
- Not a general chat/instruction model; it is biased toward math.
- CoT is learned from synthetic teacher traces, not human-written solutions.
- Not suitable for high-stakes educational or decision-making use without human oversight.
- Performance on very hard competition math (Olympiad-level, deep proofs) will be limited – the training data explicitly focuses on easy–medium difficulty.
Users are responsible for ensuring there is no data leakage if they evaluate on GSM8K/OpenMath-derived benchmarks.
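As an example, a rough decontamination check against the GSM8K test split (a sketch; verbatim string matching catches only exact duplicates, so treat it as a lower bound):

```python
from datasets import load_dataset

train_qs = set(
    load_dataset("HAD653/GSM8K-OpenMath-MathReason-13k", split="train")["question"]
)
test_qs = load_dataset("openai/gsm8k", "main", split="test")["question"]

dupes = [q for q in test_qs if q in train_qs]
print(f"{len(dupes)} GSM8K test questions appear verbatim in the training set")
```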
## Acknowledgements

- Base model: `Qwen/Qwen3-1.7B-Base`; thanks to the Qwen and Unsloth teams.
- Unsloth, for fast fine-tuning and GGUF export.
- Training data: `HAD653/GSM8K-OpenMath-MathReason-13k`.
## Citation

If you use this model in your work, please cite:

```bibtex
@misc{had653_qwen3_magistral_math_gguf_2025,
  author       = {HAD653},
  title        = {Qwen3-1.7B Magistral Math (GGUF): A 1.7B Math Reasoning Model with Magistral Chain-of-Thought},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/HAD653/qwen3-1.7b-magistral-math-gguf}},
  note         = {Fine-tuned on GSM8K + OpenMath MathReason 13k, exported to GGUF (F16 / Q8\_0 / Q4\_K\_M).}
}
```