---
license: apache-2.0
datasets:
- HuggingFaceTB/smoltalk2_everyday_convs_think
language:
- en
base_model:
- HuggingFaceTB/SmolLM3-3B-Base
---

# lukmanaj/smollm3-sft-colab-merged

**smollm3-sft-colab-merged** is a merged LoRA fine-tune of **[`HuggingFaceTB/SmolLM3-3B-Base`](https://huggingface.co/HuggingFaceTB/SmolLM3-3B-Base)**, trained with SFT on **[`HuggingFaceTB/smoltalk2_everyday_convs_think`](https://huggingface.co/datasets/HuggingFaceTB/smoltalk2_everyday_convs_think)** and then merged into a single checkpoint for easy inference.

- **Use case:** conversational, reflective, everyday reasoning
- **Method:** SFT + LoRA → merged with `peft`'s `merge_and_unload`
- **Author:** [@lukmanaj](https://huggingface.co/lukmanaj)

---
## 🚀 Quick start

```python
from transformers import pipeline

question = "If you could instantly master any skill, what would it be and why?"
pipe = pipeline(
    "text-generation",
    model="lukmanaj/smollm3-sft-colab-merged",
    device_map="auto",
)

out = pipe(
    [{"role": "user", "content": question}],
    max_new_tokens=128,
    return_full_text=False,
    do_sample=True,
)[0]["generated_text"]

print(out)
```
> **Tip:** For CPU-only inference, drop `device_map`. To reduce memory, pass `torch_dtype="auto"` and `low_cpu_mem_usage=True` to `from_pretrained`, as in the sketch below.

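A minimal manual-loading sketch along those lines, assuming the saved tokenizer ships a chat template (the same assumption the pipeline call above relies on); the prompt and sampling settings are illustrative only:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "lukmanaj/smollm3-sft-colab-merged"

# torch_dtype="auto" keeps the dtype stored in the checkpoint;
# low_cpu_mem_usage=True lowers peak RAM while the weights are loaded.
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    low_cpu_mem_usage=True,
)

messages = [{"role": "user", "content": "If you could instantly master any skill, what would it be and why?"}]

# Build the chat-formatted prompt and generate a reply.
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

out = model.generate(inputs, max_new_tokens=128, do_sample=True)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```
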
## 🧩 Training summary

- **Base model:** `HuggingFaceTB/SmolLM3-3B-Base`
- **Dataset:** `HuggingFaceTB/smoltalk2_everyday_convs_think`
- **Approach:** Supervised Fine-Tuning (SFT) with LoRA adapters, then merged
- **Intended behavior:** coherent, thoughtful conversational replies

**Suggested hyperparameters (typical)**

- Optimizer: AdamW
- Learning rate: 2e-5
- Scheduler: linear decay
- Effective batch size: 8
- Epochs: 3
- LoRA: rank 8, alpha 16, dropout 0.05

A sketch of a training run using these values follows below.

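For reference, here is a minimal sketch of how an SFT + LoRA run with these hyperparameters could be set up in TRL. The output directory, `target_modules`, dataset split, and the per-device/accumulation split of the effective batch size are assumptions, not values taken from this card:

```python
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Split name is an assumption; adjust if the dataset uses a different split.
dataset = load_dataset("HuggingFaceTB/smoltalk2_everyday_convs_think", split="train")

# LoRA settings from the summary above; target_modules is an assumption.
peft_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules="all-linear",
    task_type="CAUSAL_LM",
)

# AdamW and a linear schedule are the Trainer defaults; the schedule is set explicitly for clarity.
# Effective batch size 8 = per-device 2 x gradient accumulation 4 (one possible split).
args = SFTConfig(
    output_dir="smollm3-sft-colab",  # illustrative path
    learning_rate=2e-5,
    lr_scheduler_type="linear",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    num_train_epochs=3,
)

trainer = SFTTrainer(
    model="HuggingFaceTB/SmolLM3-3B-Base",
    args=args,
    train_dataset=dataset,
    peft_config=peft_config,
)
trainer.train()
```
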
## 🔧 Reproduce the merge

The merged weights were produced with the following code:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base = "HuggingFaceTB/SmolLM3-3B-Base"
adapters = "lukmanaj/smollm3-sft-colab"

# Load the base model and attach the LoRA adapters.
model = AutoModelForCausalLM.from_pretrained(
    base, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapters)
model = model.merge_and_unload()  # bake LoRA into the base

# Save the merged model and tokenizer as a standalone checkpoint.
tok = AutoTokenizer.from_pretrained(base, use_fast=True)
model.save_pretrained("./smollm3-sft-merged", safe_serialization=True)
tok.save_pretrained("./smollm3-sft-merged")
```

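The snippet above stops at saving locally. One way the merged folder could then be published to the Hub (the upload step is not part of the original code, so treat this as a sketch):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Reload the merged folder to check it works as a standalone checkpoint.
model = AutoModelForCausalLM.from_pretrained("./smollm3-sft-merged")
tok = AutoTokenizer.from_pretrained("./smollm3-sft-merged")

# Upload to the Hub (requires `huggingface-cli login` or an HF token in the environment).
model.push_to_hub("lukmanaj/smollm3-sft-colab-merged")
tok.push_to_hub("lukmanaj/smollm3-sft-colab-merged")
```
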
## 🧠 Intended uses & limitations

**Intended uses**

- Dialogue agents
- Everyday reasoning / reflective Q&A
- Creative writing prompts

**Limitations**

- May hallucinate facts
- Not aligned for safety-critical, medical, legal, or financial advice
- Output may contain biases from training data

## 💻 Framework versions

| Library      | Version     |
|--------------|-------------|
| TRL          | 0.23.1      |
| Transformers | 4.57.0      |
| PyTorch      | 2.6.0+cu124 |
| Datasets     | 4.1.1       |
| Tokenizers   | 0.22.1      |

## 📚 Citations

**TRL**

```bibtex
@misc{vonwerra2022trl,
  title        = {{TRL: Transformer Reinforcement Learning}},
  author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
  year         = 2020,
  journal      = {GitHub repository},
  publisher    = {GitHub},
  howpublished = {\url{https://github.com/huggingface/trl}}
}
```

## ❤️ Acknowledgements

Thanks to Hugging Face, the TRL & PEFT maintainers, and the SmolLM3 team.