---
license: apache-2.0
datasets:
- HuggingFaceTB/smoltalk2_everyday_convs_think
language:
- en
base_model:
- HuggingFaceTB/SmolLM3-3B-Base
---

# lukmanaj/smollm3-sft-colab-merged

**smollm3-sft-colab-merged** is a merged LoRA fine-tune of **[`HuggingFaceTB/SmolLM3-3B-Base`](https://huggingface.co/HuggingFaceTB/SmolLM3-3B-Base)**, trained with SFT on **[`HuggingFaceTB/smoltalk2_everyday_convs_think`](https://huggingface.co/datasets/HuggingFaceTB/smoltalk2_everyday_convs_think)** and then merged into a single checkpoint for easy inference.

- **Use case:** conversational, reflective, everyday reasoning
- **Method:** SFT + LoRA → merged with `peft`'s `merge_and_unload`
- **Author:** [@lukmanaj](https://huggingface.co/lukmanaj)

---
## 🚀 Quick start

```python
from transformers import pipeline

question = "If you could instantly master any skill, what would it be and why?"
pipe = pipeline(
    "text-generation",
    model="lukmanaj/smollm3-sft-colab-merged",
    device_map="auto",
)

out = pipe(
    [{"role": "user", "content": question}],
    max_new_tokens=128,
    return_full_text=False,
    do_sample=True,
)[0]["generated_text"]

print(out)
```
> **Tip:** For CPU-only inference, drop `device_map`. To reduce memory, pass `torch_dtype="auto"` and `low_cpu_mem_usage=True` to `from_pretrained`, as in the sketch below.

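A minimal manual-loading sketch along those lines, assuming the saved tokenizer ships a chat template (the same assumption the pipeline call above relies on); the prompt and sampling settings are illustrative only:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "lukmanaj/smollm3-sft-colab-merged"

# torch_dtype="auto" keeps the dtype stored in the checkpoint;
# low_cpu_mem_usage=True lowers peak RAM while the weights are loaded.
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    low_cpu_mem_usage=True,
)

messages = [{"role": "user", "content": "If you could instantly master any skill, what would it be and why?"}]

# Build the chat-formatted prompt and generate a reply.
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

out = model.generate(inputs, max_new_tokens=128, do_sample=True)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```
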
## 🧩 Training summary

- **Base model:** `HuggingFaceTB/SmolLM3-3B-Base`
- **Dataset:** `HuggingFaceTB/smoltalk2_everyday_convs_think`
- **Approach:** Supervised Fine-Tuning (SFT) with LoRA adapters, then merged
- **Intended behavior:** coherent, thoughtful conversational replies

**Suggested hyperparameters (typical)**

- Optimizer: AdamW
- Learning rate: 2e-5
- Scheduler: linear decay
- Effective batch size: 8
- Epochs: 3
- LoRA: rank 8, alpha 16, dropout 0.05

A sketch of a training run using these values follows below.

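For reference, here is a minimal sketch of how an SFT + LoRA run with these hyperparameters could be set up in TRL. The output directory, `target_modules`, dataset split, and the per-device/accumulation split of the effective batch size are assumptions, not values taken from this card:

```python
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Split name is an assumption; adjust if the dataset uses a different split.
dataset = load_dataset("HuggingFaceTB/smoltalk2_everyday_convs_think", split="train")

# LoRA settings from the summary above; target_modules is an assumption.
peft_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules="all-linear",
    task_type="CAUSAL_LM",
)

# AdamW and a linear schedule are the Trainer defaults; the schedule is set explicitly for clarity.
# Effective batch size 8 = per-device 2 x gradient accumulation 4 (one possible split).
args = SFTConfig(
    output_dir="smollm3-sft-colab",  # illustrative path
    learning_rate=2e-5,
    lr_scheduler_type="linear",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    num_train_epochs=3,
)

trainer = SFTTrainer(
    model="HuggingFaceTB/SmolLM3-3B-Base",
    args=args,
    train_dataset=dataset,
    peft_config=peft_config,
)
trainer.train()
```
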
## 🔧 Reproduce the merge

The merged weights were produced with the following code:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base = "HuggingFaceTB/SmolLM3-3B-Base"
adapters = "lukmanaj/smollm3-sft-colab"

# Load the base model and attach the LoRA adapters.
model = AutoModelForCausalLM.from_pretrained(
    base, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapters)
model = model.merge_and_unload()  # bake LoRA into the base

# Save the merged model and tokenizer as a standalone checkpoint.
tok = AutoTokenizer.from_pretrained(base, use_fast=True)
model.save_pretrained("./smollm3-sft-merged", safe_serialization=True)
tok.save_pretrained("./smollm3-sft-merged")
```

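The snippet above stops at saving locally. One way the merged folder could then be published to the Hub (the upload step is not part of the original code, so treat this as a sketch):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Reload the merged folder to check it works as a standalone checkpoint.
model = AutoModelForCausalLM.from_pretrained("./smollm3-sft-merged")
tok = AutoTokenizer.from_pretrained("./smollm3-sft-merged")

# Upload to the Hub (requires `huggingface-cli login` or an HF token in the environment).
model.push_to_hub("lukmanaj/smollm3-sft-colab-merged")
tok.push_to_hub("lukmanaj/smollm3-sft-colab-merged")
```
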
## 🧠 Intended uses & limitations

**Intended uses**

- Dialogue agents
- Everyday reasoning / reflective Q&A
- Creative writing prompts

**Limitations**

- May hallucinate facts
- Not aligned for safety-critical, medical, legal, or financial advice
- Output may contain biases from training data

## 💻 Framework versions

| Library      | Version     |
|--------------|-------------|
| TRL          | 0.23.1      |
| Transformers | 4.57.0      |
| PyTorch      | 2.6.0+cu124 |
| Datasets     | 4.1.1       |
| Tokenizers   | 0.22.1      |

## 📚 Citations

**TRL**

```bibtex
@misc{vonwerra2022trl,
  title        = {{TRL: Transformer Reinforcement Learning}},
  author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
  year         = 2020,
  journal      = {GitHub repository},
  publisher    = {GitHub},
  howpublished = {\url{https://github.com/huggingface/trl}}
}
```

## ❤️ Acknowledgements

Thanks to Hugging Face, the TRL & PEFT maintainers, and the SmolLM3 team.