# Qwen3-4B-Text-Summarizer-finetuning
## Model Description
Qwen3-4B-Text-Summarizer-finetuning is a fine-tuned version of the Qwen/Qwen3-4B (4-billion-parameter) large language model, built specifically for abstractive dialogue summarization.
Unlike general-purpose summarizers that struggle with informal language, slang, and speaker turns, this model combines the reasoning strengths of the Qwen3 architecture with Rank-Stabilized LoRA (RSLoRA) adaptation to produce concise, factual, and coherent summaries of conversations.
## Key Features
- Advanced Base: Built on Qwen3-4B, utilizing its "Thinking" capabilities for better context understanding.
- SOTA Adaptation: Trained with rank-64 (r=64), rank-stabilized LoRA adapters targeting all linear projection layers (q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj) for high adaptation capacity; the effect of rank stabilization on adapter scaling is illustrated after this list.
- Dialogue Specialized: Fine-tuned on the SAMSum dataset, making it an expert at summarizing real-world, messy human conversations.
- Efficient: 4-bit quantized (QLoRA) for low-VRAM inference while staying close to 16-bit output quality.
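Rank stabilization changes how the low-rank update is scaled before it is added to the frozen weights: standard LoRA scales the update by alpha/r, while RSLoRA scales it by alpha/sqrt(r), which keeps the update from shrinking as the rank grows. The snippet below is purely illustrative arithmetic (not part of the training or inference code), showing the effective scales for the r=64, alpha=32 configuration listed in the specifications below.

```python
import math

r, alpha = 64, 32  # adapter rank and alpha from the specifications table

standard_scale = alpha / r           # classic LoRA scaling      -> 0.5
rslora_scale = alpha / math.sqrt(r)  # rank-stabilized scaling   -> 4.0

print(f"standard LoRA scale: {standard_scale}")
print(f"RSLoRA scale:        {rslora_scale}")
```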
## Technical Specifications
| Feature | Specification |
|---|---|
| Base Model | Qwen/Qwen3-4B-Instruct |
| Architecture | Causal Decoder-Only Transformer |
| Quantization | 4-bit (BitsAndBytes) |
| Adapter Rank | 64 (High Capacity) |
| Alpha | 32 (Rank Stabilized) |
| Target Modules | All Linear Layers |
| Training Framework | Unsloth + TRL + PEFT |
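For reference, the specifications above map roughly onto the following Unsloth/PEFT setup. This is a hedged reconstruction rather than the exact training script: the base checkpoint name, dropout, gradient checkpointing mode, and other hyperparameters not listed in the table are assumptions.

```python
from unsloth import FastLanguageModel

# Load the base model in 4-bit (BitsAndBytes NF4) for QLoRA-style training.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "Qwen/Qwen3-4B",  # assumption: swap in the Instruct variant if that matches your setup
    max_seq_length = 2048,
    load_in_4bit = True,
)

# Attach rank-64, rank-stabilized LoRA adapters to every linear projection layer.
model = FastLanguageModel.get_peft_model(
    model,
    r = 64,
    lora_alpha = 32,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    lora_dropout = 0,        # assumption: not listed in the table
    bias = "none",           # assumption
    use_rslora = True,       # rank-stabilized scaling (alpha / sqrt(r))
    use_gradient_checkpointing = "unsloth",  # assumption: typical Unsloth default
)
```

The resulting PEFT model can then be trained with TRL's `SFTTrainer` on the SAMSum dialogues, in line with the framework and dataset noted above.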
## How to Use
### 1. Fast Inference (Recommended with Unsloth)
For up to 2x faster inference, use the `unsloth` library.
```python
from unsloth import FastLanguageModel
import torch

repo_id = "riturajpandey739/Qwen3-4B-Text-Summarizer-finetuning"

# Load the 4-bit quantized model and tokenizer
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = repo_id,
    max_seq_length = 2048,
    dtype = None,          # None = auto-detect (bfloat16 on Ampere+, float16 otherwise)
    load_in_4bit = True,
)
FastLanguageModel.for_inference(model)  # enable Unsloth's fast inference mode

# Define prompt template
prompt_template = """Below is a conversation between people. Write a concise summary of the conversation.
### Dialogue:
{}
### Summary:
"""

# Inference
dialogue = """
Rituraj: The model is working perfectly on Colab.
Scientist: That is great news. Did you push it to the hub?
Rituraj: Yes, I just updated the README as well.
"""
inputs = tokenizer([prompt_template.format(dialogue)], return_tensors = "pt").to("cuda")
outputs = model.generate(
    **inputs,
    max_new_tokens = 128,
    do_sample = True,      # temperature only takes effect when sampling is enabled
    temperature = 0.1,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
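### 2. Standard Transformers (Alternative)
If you prefer not to install Unsloth, the model can also be loaded with plain `transformers` and `bitsandbytes`. This is a minimal sketch, assuming the repository hosts a standalone checkpoint; if it contains only LoRA adapter weights, load the base model first and attach the adapters with `peft.PeftModel.from_pretrained` instead.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

repo_id = "riturajpandey739/Qwen3-4B-Text-Summarizer-finetuning"

# 4-bit NF4 quantization for low-VRAM inference.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# Same prompt format as in section 1.
prompt = (
    "Below is a conversation between people. Write a concise summary of the conversation.\n"
    "### Dialogue:\n"
    "Rituraj: The model is working perfectly on Colab.\n"
    "Scientist: That is great news. Did you push it to the hub?\n"
    "### Summary:\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)  # greedy decoding
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```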