Model Card for Llama-3.2-3B-it-Medical-LoRA

This model is a fine-tuned version of unsloth/llama-3.2-3b-instruct-unsloth-bnb-4bit. It has been trained using TRL.

Training procedure

This model was trained with supervised fine-tuning (SFT).
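
The card does not describe how the training data was formatted. As an illustration only, the sketch below shows the conversational record layout (a `messages` list of role/content pairs) that TRL's SFT tooling accepts, built from the question and answer shown in the Usage section. The helper name `to_sft_record` and the short system prompt are hypothetical and are not taken from the actual training set.

```python
from typing import Dict, List


def to_sft_record(system: str, question: str, answer: str) -> Dict[str, List[Dict[str, str]]]:
    """Wrap one question/answer pair as a conversational SFT training record."""
    return {
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": question},
            {"role": "assistant", "content": answer},
        ]
    }


# Hypothetical record; the real training set is not described in this card.
record = to_sft_record(
    system="You are a helpful medical assistant.",  # assumed prompt, for illustration
    question="Tôi bị viêm loét dạ dày, tôi nên đến chuyên khoa nào để thăm khám?",
    answer="Bạn nên đến chuyên khoa tiêu hóa để thăm khám.",
)
print([message["role"] for message in record["messages"]])
```

A list of such records can be turned into a dataset and passed to an SFT trainer; the hyperparameters used for this adapter are not stated in the card.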

Usage

HuggingFace Authentication

import os
from huggingface_hub import login

# Set the Hugging Face API token (replace the placeholder with your own token)
os.environ["HUGGINGFACEHUB_API_TOKEN"] = "<your_huggingface_token>"

# Log in to the Hugging Face Hub
login(os.environ.get("HUGGINGFACEHUB_API_TOKEN"))
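
The snippet above writes the token into the script itself. A minimal sketch of an alternative, assuming the token has already been exported in the shell under the same variable name, is shown here. The helper names `read_hf_token` and `login_from_env` are hypothetical; `login` is the `huggingface_hub` function used above.

```python
import os


def read_hf_token(var_name: str = "HUGGINGFACEHUB_API_TOKEN") -> str:
    """Return the token stored in the given environment variable.

    Raises RuntimeError if the variable is unset or empty, so a missing
    token fails early instead of at model download time.
    """
    token = os.environ.get(var_name, "").strip()
    if not token:
        raise RuntimeError(f"Environment variable {var_name} is not set")
    return token


def login_from_env(var_name: str = "HUGGINGFACEHUB_API_TOKEN") -> None:
    """Log in to the Hugging Face Hub with a token taken from the environment."""
    # Imported lazily so read_hf_token() works without huggingface_hub installed.
    from huggingface_hub import login

    login(token=read_hf_token(var_name))
```

Calling `login_from_env()` once before loading the model keeps the token out of the source file, which matters if the script is committed or shared.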

Inference

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

device = "cuda" if torch.cuda.is_available() else "cpu"

# Define model and LoRA adapter paths
base_model_name = "meta-llama/Llama-3.2-3B-Instruct"
lora_adapter_name = "danhtran2mind/Llama-3.2-3B-Instruct-Vi-Medical-LoRA"

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model_name)

# Load base model with optimized settings
model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    torch_dtype=torch.float16,  # Use FP16 for efficiency
    device_map=device,
    trust_remote_code=True
)

# Apply LoRA adapter
model = PeftModel.from_pretrained(model, lora_adapter_name)

# Set model to evaluation mode
model.eval()

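# System prompt (Vietnamese). English translation:
# "You are a helpful assistant tasked with extracting passages that answer the
# user's question from a given context. Output the exact passages, word for
# word, that answer the user's question. Do not output any text other than
# passages from the context. Output the minimum amount needed to answer the
# question, for example only 2-3 words from the passage. If the answer cannot
# be found in the context, output 'The context does not provide an answer...'"
instruction = '''Bạn là một trợ lý hữu ích được giao nhiệm vụ trích xuất các đoạn văn trả lời câu hỏi của người dùng từ một ngữ cảnh cho trước. Xuất ra các đoạn văn chính xác từng từ một trả lời câu hỏi của người dùng. Không xuất ra bất kỳ văn bản nào khác ngoài các đoạn văn trong ngữ cảnh. Xuất ra lượng tối thiểu để trả lời câu hỏi, ví dụ chỉ 2-3 từ từ đoạn văn. Nếu không thể tìm thấy câu trả lời trong ngữ cảnh, xuất ra 'Ngữ cảnh không cung cấp câu trả lời...'
'''
# User question (Vietnamese). English translation:
# "I have a stomach ulcer. Which specialty should I visit for an examination?"
question = "Tôi bị viêm loét dạ dày, tôi nên đến chuyên khoa nào để thăm khám?"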

# Set random seed for reproducibility
seed = 42
torch.manual_seed(seed)
if torch.cuda.is_available():
    torch.cuda.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

# Create conversation messages
messages = [
    {"role": "system", "content": instruction},
    {"role": "user", "content": question},
]

text = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=False  # Ensure the output is a string
)

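# Tokenize the input and move to device.
# The chat template already inserts the special tokens, so do not add them again.
inputs = tokenizer(text, return_tensors="pt", add_special_tokens=False).to(device)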

# Generate a response and stream it to stdout with TextStreamer
from transformers import TextStreamer
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
_ = model.generate(
    **inputs,
    max_new_tokens=2048,
    do_sample=True,  # Sample explicitly so temperature, top_p, and top_k take effect
    temperature=0.7,
    top_p=0.95,
    top_k=64,
    streamer=streamer
)

Example output:

Bạn nên đến chuyên khoa tiêu hóa để thăm khám.
(English: You should visit the gastroenterology department for an examination.)
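
When the answer is needed as a string rather than streamed to the console, the same steps can be wrapped in a function. This is a minimal sketch, not the author's method: `build_messages` and `generate_answer` are hypothetical helper names, greedy decoding (`do_sample=False`) and the 256-token limit are assumptions chosen for repeatable, short output, and `model` and `tokenizer` are expected to be the objects loaded above.

```python
from typing import Dict, List


def build_messages(instruction: str, question: str) -> List[Dict[str, str]]:
    """Build the system + user message list expected by the chat template."""
    return [
        {"role": "system", "content": instruction},
        {"role": "user", "content": question},
    ]


def generate_answer(model, tokenizer, instruction: str, question: str,
                    max_new_tokens: int = 256) -> str:
    """Return only the newly generated text for one question."""
    # Let the chat template tokenize directly, so special tokens are added once.
    inputs = tokenizer.apply_chat_template(
        build_messages(instruction, question),
        add_generation_prompt=True,
        return_tensors="pt",
        return_dict=True,
    ).to(model.device)

    # Greedy decoding (an assumption) gives the same answer on every run.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)

    # Drop the prompt tokens and decode only the continuation.
    prompt_length = inputs["input_ids"].shape[-1]
    return tokenizer.decode(output_ids[0, prompt_length:], skip_special_tokens=True).strip()
```

With the model loaded as shown earlier, `generate_answer(model, tokenizer, instruction, question)` returns the reply as a plain string that can be logged or post-processed.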

Framework versions

PEFT: 0.14.0
TRL: 0.19.0
Transformers: 4.51.3
PyTorch: 2.7.0
Datasets: 3.6.0
Tokenizers: 0.21.1
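
To compare a local environment against the versions listed above, a small check with the standard library can help. This is a sketch under assumptions: the PyPI package names in `EXPECTED_VERSIONS` (for example `torch` for PyTorch) and the helper name `check_versions` are not part of the card, and a different version is not necessarily a problem.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict

# Versions listed in this model card, keyed by assumed PyPI package name.
EXPECTED_VERSIONS: Dict[str, str] = {
    "peft": "0.14.0",
    "trl": "0.19.0",
    "transformers": "4.51.3",
    "torch": "2.7.0",
    "datasets": "3.6.0",
    "tokenizers": "0.21.1",
}


def check_versions(expected: Dict[str, str]) -> Dict[str, str]:
    """Map each package to 'ok', 'missing', or 'mismatch (installed X)'."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            report[package] = "missing"
            continue
        # Ignore local build suffixes such as "+cu126" that some wheels carry.
        if installed.split("+")[0] == wanted:
            report[package] = "ok"
        else:
            report[package] = f"mismatch (installed {installed})"
    return report


if __name__ == "__main__":
    for package, status in check_versions(EXPECTED_VERSIONS).items():
        print(f"{package}: {status}")
```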

Citations

Cite TRL as:

@misc{vonwerra2022trl,
    title        = {{TRL: Transformer Reinforcement Learning}},
    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
    year         = 2020,
    journal      = {GitHub repository},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}
