# lukmanaj/smollm3-sft-colab-merged
**smollm3-sft-colab-merged** is a LoRA fine-tune of HuggingFaceTB/SmolLM3-3B-Base, trained with SFT on HuggingFaceTB/smoltalk2_everyday_convs_think and merged into a single checkpoint for easy inference.
- Use case: conversational, reflective, everyday reasoning
- Method: SFT + LoRA, merged with PEFT's `merge_and_unload`
- Author: @lukmanaj
## 🚀 Quick start
```python
from transformers import pipeline

question = "If you could instantly master any skill, what would it be and why?"

pipe = pipeline(
    "text-generation",
    model="lukmanaj/smollm3-sft-colab-merged",
    device_map="auto",
)
out = pipe(
    [{"role": "user", "content": question}],
    max_new_tokens=128,
    return_full_text=False,
    do_sample=True,
)[0]["generated_text"]
print(out)
```
Tip: For CPU-only use, drop `device_map`. To reduce memory usage, pass `torch_dtype="auto"` and `low_cpu_mem_usage=True` to `from_pretrained`.
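The tip above refers to loading the model directly with `from_pretrained` rather than through `pipeline`. A minimal sketch of that path (the prompt is a placeholder; the memory options are the ones named in the tip):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "lukmanaj/smollm3-sft-colab-merged"

tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",       # use the dtype stored in the checkpoint
    low_cpu_mem_usage=True,   # avoid materializing the full model twice in RAM
)

messages = [{"role": "user", "content": "What small habit pays off the most?"}]
input_ids = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
output = model.generate(input_ids, max_new_tokens=128, do_sample=True)
print(tok.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```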
## 🧩 Training summary
- Base model: HuggingFaceTB/SmolLM3-3B-Base
- Dataset: HuggingFaceTB/smoltalk2_everyday_convs_think
- Approach: Supervised Fine-Tuning (SFT) with LoRA adapters, then merged
- Intended behavior: coherent, thoughtful conversational replies
Suggested hyperparameters (typical):

- Optimizer: AdamW
- Learning rate: 2e-5
- Scheduler: linear decay
- Batch size (effective): 8
- Epochs: 3
- LoRA: rank 8, alpha 16, dropout 0.05
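A minimal training sketch wiring these settings into TRL's `SFTTrainer` with a PEFT `LoraConfig`. This is a sketch, not the exact training script: the dataset split, the per-device batch size / accumulation split, and the output path are assumptions.

```python
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Assumed split name; check the dataset card if "train" is not available.
dataset = load_dataset("HuggingFaceTB/smoltalk2_everyday_convs_think", split="train")

peft_config = LoraConfig(
    r=8,                  # LoRA rank from the list above
    lora_alpha=16,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

args = SFTConfig(
    output_dir="smollm3-sft-colab",   # hypothetical local path
    learning_rate=2e-5,
    lr_scheduler_type="linear",
    num_train_epochs=3,
    per_device_train_batch_size=2,    # 2 x 4 accumulation = effective batch of 8
    gradient_accumulation_steps=4,
)

trainer = SFTTrainer(
    model="HuggingFaceTB/SmolLM3-3B-Base",
    args=args,
    train_dataset=dataset,
    peft_config=peft_config,
)
trainer.train()
```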
## 🔧 Reproduce the merge
The merged weights were produced with the following code:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base = "HuggingFaceTB/SmolLM3-3B-Base"
adapters = "lukmanaj/smollm3-sft-colab"

model = AutoModelForCausalLM.from_pretrained(
    base, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapters)
model = model.merge_and_unload()  # bake LoRA into the base weights

tok = AutoTokenizer.from_pretrained(base, use_fast=True)
model.save_pretrained("./smollm3-sft-merged", safe_serialization=True)
tok.save_pretrained("./smollm3-sft-merged")
```
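Once saved, the merged checkpoint loads with plain `transformers` and no `peft` dependency, which is a quick way to sanity-check the merge (a sketch; the local path matches the `save_pretrained` calls above):

```python
from transformers import pipeline

# The merged folder behaves like any standalone causal LM checkpoint.
pipe = pipeline("text-generation", model="./smollm3-sft-merged", device_map="auto")
reply = pipe(
    [{"role": "user", "content": "Say hello in one sentence."}],
    max_new_tokens=32,
    return_full_text=False,
)[0]["generated_text"]
print(reply)
```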
## 🧠 Intended uses & limitations

### Intended uses

- Dialogue agents
- Everyday reasoning / reflective Q&A
- Creative writing prompts

### Limitations

- May hallucinate facts
- Not aligned for safety-critical, medical, legal, or financial advice
- Output may contain biases from training data
## 💻 Framework versions

| Library | Version |
| --- | --- |
| TRL | 0.23.1 |
| Transformers | 4.57.0 |
| PyTorch | 2.6.0+cu124 |
| Datasets | 4.1.1 |
| Tokenizers | 0.22.1 |
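To verify a local environment against this table, a quick check (assuming all five packages are importable):

```python
import datasets, tokenizers, torch, transformers, trl

# Expected: trl 0.23.1, transformers 4.57.0, torch 2.6.0+cu124,
# datasets 4.1.1, tokenizers 0.22.1 (per the table above).
for name, mod in [
    ("trl", trl),
    ("transformers", transformers),
    ("torch", torch),
    ("datasets", datasets),
    ("tokenizers", tokenizers),
]:
    print(f"{name}: {mod.__version__}")
```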
## 📚 Citations
**TRL**

```bibtex
@misc{vonwerra2022trl,
    title        = {{TRL: Transformer Reinforcement Learning}},
    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
    year         = 2020,
    journal      = {GitHub repository},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}
```
## ❤️ Acknowledgements
Thanks to Hugging Face, TRL & PEFT maintainers, and the SmolLM3 team.