lukmanaj/smollm3-sft-colab-merged

smollm3-sft-colab-merged is a LoRA fine-tune of HuggingFaceTB/SmolLM3-3B-Base, trained with SFT on HuggingFaceTB/smoltalk2_everyday_convs_think. The adapters have been merged into the base weights, so the model loads as a single standalone checkpoint and PEFT is not needed at inference time.

  • Use case: conversational, reflective, everyday reasoning
  • Method: SFT + LoRA → merged with PEFT's merge_and_unload()
  • Author: @lukmanaj

🚀 Quick start

from transformers import pipeline

question = "If you could instantly master any skill, what would it be and why?"
pipe = pipeline(
    "text-generation",
    model="lukmanaj/smollm3-sft-colab-merged",
    device_map="auto"
)

out = pipe(
    [{"role": "user", "content": question}],
    max_new_tokens=128,
    return_full_text=False,
    do_sample=True
)[0]["generated_text"]

print(out)

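Tip: On a CPU-only machine, drop device_map. To reduce memory use when loading with AutoModelForCausalLM.from_pretrained, pass torch_dtype="auto" (which keeps the checkpoint's BF16 weights) and low_cpu_mem_usage=True. With pipeline, torch_dtype="auto" can be passed directly.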
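You can also load the model directly instead of going through pipeline. The sketch below shows how the tip above maps onto from_pretrained keyword arguments. It is not the author's code. The helper names load_kwargs and generate_reply are hypothetical. It also assumes the checkpoint ships a chat template that apply_chat_template can use; the pipeline call above relies on the same assumption.

```python
MODEL_ID = "lukmanaj/smollm3-sft-colab-merged"


def load_kwargs(cpu_only: bool = False) -> dict:
    """Hypothetical helper: keyword arguments for AutoModelForCausalLM.from_pretrained."""
    kwargs = {
        "torch_dtype": "auto",      # load in the dtype stored in the checkpoint (BF16 here)
        "low_cpu_mem_usage": True,  # avoid materialising an extra full copy of the weights in RAM
    }
    if not cpu_only:
        kwargs["device_map"] = "auto"  # let Accelerate place the layers on the available GPU(s)
    return kwargs


def generate_reply(question: str, cpu_only: bool = False, max_new_tokens: int = 128) -> str:
    """Hypothetical helper: one sampled chat reply, without the prompt."""
    # Imported lazily so load_kwargs can be used without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, **load_kwargs(cpu_only))

    # Format the conversation with the chat template and append the assistant-turn prefix.
    input_ids = tok.apply_chat_template(
        [{"role": "user", "content": question}],
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)

    output = model.generate(input_ids, max_new_tokens=max_new_tokens, do_sample=True)
    # Decode only the newly generated tokens (the equivalent of return_full_text=False).
    return tok.decode(output[0, input_ids.shape[-1]:], skip_special_tokens=True)
```

Usage: print(generate_reply("If you could instantly master any skill, what would it be and why?")). Pass cpu_only=True on machines without a GPU.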

🧩 Training summary

Base model: HuggingFaceTB/SmolLM3-3B-Base

Dataset: HuggingFaceTB/smoltalk2_everyday_convs_think

Approach: Supervised Fine-Tuning (SFT) with LoRA adapters, then merged

Intended behavior: coherent, thoughtful conversational replies

Suggested hyperparameters (typical):

Optimizer: AdamW

LR: 2e-5

Scheduler: linear decay

Batch size (effective): 8

Epochs: 3

LoRA: rank 8, alpha 16, dropout 0.05
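A little arithmetic helps sanity-check the suggested values. The sketch below is illustrative plain Python, not the training script. It computes four things:

  • how an effective batch of 8 decomposes;
  • the LoRA scaling factor alpha / r that multiplies the adapter update;
  • how many trainable parameters LoRA adds to one d_in × d_out linear layer;
  • a linear-decay learning-rate schedule.

The per-device batch / gradient-accumulation split, the layer sizes and the absence of warmup are all assumptions.

```python
def effective_batch_size(per_device: int, grad_accum: int, num_devices: int = 1) -> int:
    # Number of samples that contribute to one optimizer step.
    return per_device * grad_accum * num_devices


def lora_scaling(rank: int, alpha: float) -> float:
    # LoRA adds (alpha / r) * B @ A to the frozen weight.
    return alpha / rank


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    # A has shape (rank, d_in) and B has shape (d_out, rank).
    return rank * (d_in + d_out)


def linear_decay_lr(step: int, total_steps: int, base_lr: float) -> float:
    # Decays linearly from base_lr at step 0 to 0 at total_steps (no warmup assumed).
    return base_lr * max(0.0, 1.0 - step / total_steps)
```

For example, a hypothetical 2048 × 2048 projection gains 8 × (2048 + 2048) = 32,768 trainable parameters at rank 8, and alpha 16 with rank 8 gives a scaling factor of 2. In PEFT the LoRA values correspond to LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05). The optimizer settings map onto TrainingArguments fields such as learning_rate, lr_scheduler_type="linear" and num_train_epochs, which TRL's SFTConfig inherits.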

🔧 Reproduce the merge

The merged weights were produced with the following code:

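from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base = "HuggingFaceTB/SmolLM3-3B-Base"
adapters = "lukmanaj/smollm3-sft-colab"  # LoRA adapters produced by the SFT run

# Load the base model in BF16, the tensor type of the published merged checkpoint
model = AutoModelForCausalLM.from_pretrained(
    base, torch_dtype=torch.bfloat16, device_map="auto"
)
# Attach the trained LoRA adapters, then fold them into the base weights
model = PeftModel.from_pretrained(model, adapters)
model = model.merge_and_unload()  # returns a plain transformers model without PEFT wrappers

# Save the merged weights (as safetensors) next to the base tokenizer
tok = AutoTokenizer.from_pretrained(base, use_fast=True)
model.save_pretrained("./smollm3-sft-merged", safe_serialization=True)
tok.save_pretrained("./smollm3-sft-merged")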
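Conceptually, merge_and_unload folds each LoRA update into the frozen weight it adapts: W_merged = W + (alpha / r) · B · A. The NumPy sketch below checks this on a single layer. It uses toy sizes and random weights and is not the PEFT implementation. The merged dense layer produces the same outputs as the base path plus the adapter path, up to floating-point rounding. Dropout is omitted because it only acts during training. In the real BF16 merge, the merged weights are rounded to BF16, so the merged checkpoint's outputs can differ slightly from the unmerged adapter model.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 16, 12, 8, 16  # toy layer sizes; r and alpha match the card's LoRA settings
scaling = alpha / r

W = rng.standard_normal((d_out, d_in))  # frozen base weight
A = rng.standard_normal((r, d_in))      # LoRA down-projection
B = rng.standard_normal((d_out, r))     # LoRA up-projection (zero-initialised before real training)
x = rng.standard_normal((4, d_in))      # a batch of 4 input vectors

# Unmerged forward pass: base path plus the scaled low-rank adapter path.
y_adapter = x @ W.T + scaling * (x @ A.T) @ B.T

# Merged forward pass: a single dense weight with the update folded in.
W_merged = W + scaling * (B @ A)
y_merged = x @ W_merged.T

max_abs_diff = float(np.max(np.abs(y_adapter - y_merged)))
print(f"max |adapter - merged| = {max_abs_diff:.2e}")
```

Because the merged weight has the same shape as the original, the saved checkpoint has the same size and architecture as the base model.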

🧠 Intended uses & limitations

Intended uses

  • Dialogue agents

  • Everyday reasoning / reflective Q&A

  • Creative writing prompts

Limitations

  • May hallucinate facts

  • Not suitable for safety-critical use, or for medical, legal, or financial advice

  • Output may contain biases from training data

💻 Framework versions

  • TRL: 0.23.1
  • Transformers: 4.57.0
  • PyTorch: 2.6.0+cu124
  • Datasets: 4.1.1
  • Tokenizers: 0.22.1
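To compare your environment against the versions listed above, a small check with the standard-library importlib.metadata is enough. This is a convenience sketch, not part of the original training setup. The helper name version_mismatches is hypothetical, and the distribution names are the usual PyPI package names.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions from the card's Framework versions section.
EXPECTED = {
    "trl": "0.23.1",
    "transformers": "4.57.0",
    "torch": "2.6.0+cu124",
    "datasets": "4.1.1",
    "tokenizers": "0.22.1",
}


def version_mismatches(expected, get_version=version):
    """Return {package: (expected, installed or None)} for every package that differs."""
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed = get_version(package)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches
```

Usage: print(version_mismatches(EXPECTED)). The exact string comparison is deliberately strict. A PyTorch build for a different CUDA variant reports a different local version suffix, so it will be flagged even when the base version matches.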

📚 Citations

TRL

@misc{vonwerra2022trl,
  title        = {{TRL: Transformer Reinforcement Learning}},
  author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
  year         = 2020,
  journal      = {GitHub repository},
  publisher    = {GitHub},
  howpublished = {\url{https://github.com/huggingface/trl}}
}

❤️ Acknowledgements

Thanks to Hugging Face, TRL & PEFT maintainers, and the SmolLM3 team.

Checkpoint: Safetensors, 3B parameters, BF16 tensors.