🔷 Nexura-Gemma-2B
A Supervised Fine-Tuned + DPO-Aligned Gemma-2B Model
Nexura-Gemma-2B is a custom fine-tuned variant of Google’s Gemma-2B model.
It is trained in two stages:
- SFT (Supervised Fine-Tuning) using high-quality instruction datasets
- DPO (Direct Preference Optimization) for preference alignment
The model follows a strict XML-style instruction format, exactly matching the SFT training data:
<user>
{instruction}
</user>
<assistant>
{response}
📌 1. Base Model
- Base: google/gemma-2b
- Architecture: Decoder-only transformer LLM
- Tokenizer: Gemma tokenizer (sentencepiece)
- Training Type: QLoRA (SFT) + DPO
- Language: English
- Usage: General-purpose text generation & instruction following
📌 2. Datasets Used
🟦 A. SFT Dataset (Supervised Fine-Tuning)
Merged into:
train_sft_50k.jsonl
Includes:
- tatsu-lab/alpaca (~52k)
- databricks/dolly-15k
- Additional filtered samples:
- lamini_20k
- ign_20k
- ultrachat_20k
(mostly skipped due to filtering)
SFT Prompt Format
<user>
{instruction}
</user>
<assistant>
{response}
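As a rough illustration of the format above, one instruction/response pair could be rendered and written as a single line of a JSONL file such as train_sft_50k.jsonl. This is only a sketch: the helper names, the `"text"` field, and the blank line before `<assistant>` (taken from the prompt string in the local inference example of this card) are assumptions, since the actual preprocessing script is not part of this card.

```python
import json


def format_sft_example(instruction: str, response: str) -> str:
    """Render one pair in the XML-style SFT layout (hypothetical helper)."""
    return (
        f"<user>\n{instruction.strip()}\n</user>\n\n"
        f"<assistant>\n{response.strip()}"
    )


def to_jsonl_line(instruction: str, response: str) -> str:
    # "text" is an assumed field name, not confirmed by the model card.
    record = {"text": format_sft_example(instruction, response)}
    return json.dumps(record, ensure_ascii=False)


line = to_jsonl_line("Explain recursion.", "Recursion is when a function calls itself.")
print(line)
```

Stripping whitespace before formatting keeps the tag boundaries identical across samples drawn from different source datasets.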
🟩 B. DPO Dataset (Preference Alignment)
Merged from:
- Anthropic HH-RLHF
- Stanford SHP
- UltraFeedback
- JudgeLM
Used in chosen-vs-rejected pair format.
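A chosen-vs-rejected pair might look like the record built below. This is a sketch under assumptions: the `prompt` / `chosen` / `rejected` key names follow a common convention (used, for example, by TRL's `DPOTrainer`), and `make_preference_pair` is a hypothetical helper; the card does not show how the four source datasets were actually merged.

```python
def make_preference_pair(instruction: str, chosen: str, rejected: str) -> dict:
    """Build one preference record in the model's prompt format (hypothetical)."""
    chosen, rejected = chosen.strip(), rejected.strip()
    if chosen == rejected:
        # A pair with identical responses carries no preference signal.
        raise ValueError("chosen and rejected responses must differ")
    # Blank line before <assistant> is assumed from the inference example.
    prompt = f"<user>\n{instruction.strip()}\n</user>\n\n<assistant>\n"
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}


pair = make_preference_pair(
    "Explain recursion.",
    "Recursion is when a function solves a problem by calling itself on smaller inputs.",
    "I don't know.",
)
print(pair["prompt"] + pair["chosen"])
```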
📌 3. Training Details
🟦 SFT (Supervised Fine-Tuning)
QLoRA Configuration:
- Rank: 8
- Alpha: 16
- Dropout: 0.05
- Precision: bfloat16
- Epochs: 1
- LR: 2e-4
- Gradient Accumulation: 20
- Target Modules:
- q_proj, k_proj, v_proj, o_proj
- gate_proj, up_proj, down_proj
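The adapter settings above can be collected as keyword arguments whose names mirror `peft.LoraConfig`. This is a sketch, not the author's training script: `task_type` and `bias` are assumed values that the card does not state, and the 4-bit quantization settings used for QLoRA are not listed, so they are left out.

```python
# Values for r, lora_alpha, lora_dropout and target_modules come from the list above.
lora_kwargs = {
    "r": 8,
    "lora_alpha": 16,
    "lora_dropout": 0.05,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
        "gate_proj", "up_proj", "down_proj",     # MLP projections
    ],
    "task_type": "CAUSAL_LM",  # assumption: not stated in the card
    "bias": "none",            # assumption: not stated in the card
}

# Standard LoRA scales the low-rank update by alpha / r.
lora_scaling = lora_kwargs["lora_alpha"] / lora_kwargs["r"]
print(f"LoRA scaling factor: {lora_scaling}")
```

With the `peft` package installed, these arguments could be passed as `LoraConfig(**lora_kwargs)`.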
🟩 DPO (Direct Preference Optimization)
- Beta: 0.1
- Learning rate: 5e-5
- Grad Accumulation: 8
- Policy model = SFT-trained adapter
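To show what Beta controls, here is a minimal sketch of the standard DPO loss for a single preference pair, written in plain Python. The inputs are summed log-probabilities of each response under the policy and under a frozen reference model; using the SFT model as that reference is an assumption, and the function name and example numbers are illustrative only.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
) -> float:
    """Standard DPO loss for one (chosen, rejected) pair."""
    # How much more the policy favors each response than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    x = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(x)) computed in a numerically stable form.
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


# Policy identical to the reference: the loss is log(2).
print(dpo_loss(-12.0, -15.0, -12.0, -15.0))
# Policy prefers the chosen response more than the reference does: lower loss.
print(dpo_loss(-10.0, -20.0, -15.0, -15.0))
```

A smaller Beta lets the policy drift further from the reference before the loss saturates; a larger Beta keeps it closer.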
📌 4. Inference Instructions
Below is the exact format required to prompt the model, matching the training:
Prompt Template
<user>
{your_message}
</user>
<assistant>
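A small helper can build this prompt so the tags are never mistyped. This is a sketch: `build_prompt` is a hypothetical name, and the blank line before `<assistant>` follows the prompt string used in the local inference example of this card.

```python
def build_prompt(message: str) -> str:
    """Wrap a user message in the XML-style template (hypothetical helper)."""
    # The prompt ends right after the opening <assistant> tag so that
    # the model continues with the response text.
    return f"<user>\n{message.strip()}\n</user>\n\n<assistant>\n"


print(build_prompt("Explain recursion."))
```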
🟦 FastAPI Streaming Server (server.py)
This model was tested using a custom FastAPI server with:
- Local model loading (no HF auto-download)
- SFT-exact prompt builder
- Tag suppression to prevent invalid XML-like output
- Greedy decoding:
- do_sample=False
- repetition_penalty=1.3
- no_repeat_ngram_size=4
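server.py itself is not included in this card, so the following is only one possible way to approximate the tag suppression step: post-process the decoded text, cut it at the first sign of a new turn, and drop any leftover tags. The function name and the regex are assumptions; the real server may instead suppress these tags at the token level during generation.

```python
import re

# Matches opening or closing <user> / <assistant> tags.
_TAG_RE = re.compile(r"</?\s*(?:user|assistant)\s*>", re.IGNORECASE)


def clean_response(generated: str) -> str:
    """Trim a generated continuation to a single assistant turn (hypothetical)."""
    # Stop at the first marker that signals the end of the assistant turn.
    for stop in ("</assistant>", "<user>"):
        idx = generated.find(stop)
        if idx != -1:
            generated = generated[:idx]
    # Remove any stray tags that remain, then tidy whitespace.
    return _TAG_RE.sub("", generated).strip()


print(clean_response("Recursion is self-reference.</assistant>\n<user>\nAnd loops?"))
```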
Example: Python Local Inference
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Path to a local copy of the model weights
model_dir = "Nexura-gemma2b-sft-dpo"
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="auto")

# Prompt in the same XML-style format used for SFT
prompt = "<user>\nExplain recursion.\n</user>\n\n<assistant>\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Greedy decoding with repetition controls
output = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=False,
    repetition_penalty=1.3,
    no_repeat_ngram_size=4,
)
# Note: the decoded text includes the prompt as well as the response
print(tokenizer.decode(output[0], skip_special_tokens=True))
🟩 Curl API Example
curl -X POST http://localhost:8000/api/chat \
-H "Content-Type: application/json" \
-d '{"messages":[{"role":"user","content":"hi"}]}'
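The same request can be sent from Python with the standard library. This is a sketch under assumptions: the endpoint and payload are copied from the curl command above, but the response format is not documented here, so `stream_chat` simply assumes the server streams UTF-8 text and prints it as it arrives. Both function names are hypothetical.

```python
import json
import urllib.request


def build_chat_request(
    message: str, url: str = "http://localhost:8000/api/chat"
) -> urllib.request.Request:
    """Build the POST request without sending it."""
    payload = {"messages": [{"role": "user", "content": message}]}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def stream_chat(message: str) -> None:
    """Send the request and print the body chunk by chunk (assumed text stream)."""
    with urllib.request.urlopen(build_chat_request(message)) as resp:
        for chunk in iter(lambda: resp.read(256), b""):
            print(chunk.decode("utf-8", errors="replace"), end="", flush=True)


if __name__ == "__main__":
    # Requires the FastAPI server to be running locally.
    try:
        stream_chat("hi")
    except OSError as exc:
        print(f"Could not reach the server: {exc}")
```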
📌 5. Intended Use
✔ Recommended Uses
- Chat assistants
- Instruction following
- Educational Q/A
- Coding help
- Summaries
- Reasoning tasks
- Content rewriting
❌ Not Recommended
- Medical, legal, or financial advice
- Real-world decision making
- High-risk or safety-critical systems
- Generating harmful, biased, or toxic content
📌 6. Strengths
- Lightweight (2B parameters)
- Fast inference on consumer GPUs
- Clean behavior after SFT formatting correction
- Strong alignment after DPO training
- Deterministic, repeatable responses under greedy decoding
📌 7. Limitations
- Limited knowledge compared to larger LLMs
- May hallucinate if prompt format is not followed
- Not multilingual
- No factual updates after 2023 (Gemma limitation)
📌 8. Hardware Requirements
- GPU Recommended: 8GB+ VRAM
- Minimum CPU RAM: 6GB
- Quantized 4-bit mode: Runs on mid-range systems
- Ideal: NVIDIA RTX 3060 / 4060+
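For a rough sense of where these numbers come from, the memory needed just to hold the weights can be estimated from the parameter count and the precision. This is a back-of-the-envelope sketch: it uses the nominal 2B figure from the model name (the real count including embeddings may be somewhat higher) and ignores activations, the KV cache, and framework overhead, which all add to the total.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


NOMINAL_PARAMS = 2e9  # nominal "2B" from the model name, an approximation

for label, bits in [("bfloat16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: ~{weight_memory_gb(NOMINAL_PARAMS, bits):.1f} GB for weights")
```

The gap between the weight-only estimate and the VRAM recommendation leaves headroom for the runtime costs the estimate leaves out.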
📌 9. License
This model inherits the Gemma License, which allows:
- Research use
- Commercial use under conditions
- Attribution to Google
Full license details:
https://ai.google.dev/gemma/terms
📌 10. Citation
If you use this model:
@misc{nexura_gemma2b_2025,
  title={Nexura-Gemma-2B},
  model={Custom fine-tuned Gemma-2B},
  author={Arun Vpp},
  year={2025},
  publisher={Hugging Face}
}