How to use arunvpp05/Nexura-Gemma2B with libraries, inference servers, and local apps.

Libraries

How to use arunvpp05/Nexura-Gemma2B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="arunvpp05/Nexura-Gemma2B")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("arunvpp05/Nexura-Gemma2B")
model = AutoModelForCausalLM.from_pretrained("arunvpp05/Nexura-Gemma2B")


vLLM

How to use arunvpp05/Nexura-Gemma2B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "arunvpp05/Nexura-Gemma2B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "arunvpp05/Nexura-Gemma2B",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
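The curl examples send a free-form prompt, but this model was trained on the XML-style template described in section 4. Below is a minimal sketch that uses only the Python standard library to query the same OpenAI-compatible `/v1/completions` endpoint with a templated prompt. The helper names, the `temperature=0.0` setting (chosen to approximate the card's greedy decoding), and the `stop` strings are assumptions. They are not settings published with the model.

```python
import json
import urllib.request

MODEL_ID = "arunvpp05/Nexura-Gemma2B"


def build_completion_request(message: str, max_tokens: int = 512) -> dict:
    """Build a /v1/completions body that wraps `message` in the card's prompt template."""
    return {
        "model": MODEL_ID,
        # Same layout as the card's prompt template; the model continues after "<assistant>\n".
        "prompt": f"<user>\n{message}\n</user>\n\n<assistant>\n",
        "max_tokens": max_tokens,
        # Assumption: temperature 0 to approximate the greedy decoding used on the card.
        "temperature": 0.0,
        # Assumption: stop before the model opens a new template tag.
        "stop": ["</assistant>", "<user>"],
    }


def complete(message: str, base_url: str = "http://localhost:8000") -> str:
    """POST the request to a running OpenAI-compatible server and return the generated text."""
    req = urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(build_completion_request(message)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["text"]
```

With the vLLM server running, `complete("Explain recursion.")` returns the model's reply. For the SGLang server, pass `base_url="http://localhost:30000"`.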


SGLang

How to use arunvpp05/Nexura-Gemma2B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "arunvpp05/Nexura-Gemma2B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "arunvpp05/Nexura-Gemma2B",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "arunvpp05/Nexura-Gemma2B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "arunvpp05/Nexura-Gemma2B",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use arunvpp05/Nexura-Gemma2B with Docker Model Runner:
```
docker model run hf.co/arunvpp05/Nexura-Gemma2B
```

🔷 Nexura-Gemma-2B

A Supervised Fine-Tuned + DPO-Aligned Gemma-2B Model

Nexura-Gemma-2B is a custom fine-tuned variant of Google’s Gemma-2B model.
It is trained in two stages:

SFT (Supervised Fine-Tuning) using high-quality instruction datasets
DPO (Direct Preference Optimization) for preference alignment

The model expects a strict XML-style prompt format that exactly matches its SFT training data:

<user>
{instruction}
</user>

<assistant>
{response}

📌 1. Base Model

Base: google/gemma-2b
Architecture: Decoder-only transformer LLM
Tokenizer: Gemma tokenizer (SentencePiece)
Training Type: QLoRA (SFT) + DPO
Language: English
Usage: General-purpose text generation & instruction following

📌 2. Datasets Used

🟦 A. SFT Dataset (Supervised Fine-Tuning)

Merged into:

train_sft_50k.jsonl

Includes:

tatsu-lab/alpaca (~52k)
databricks/databricks-dolly-15k
Additional filtered samples:
- lamini_20k
- ign_20k
- ultrachat_20k
  (mostly skipped due to filtering)

SFT Prompt Format

<user>
{instruction}
</user>

<assistant>
{response}
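Here is a minimal sketch of how an instruction/response pair maps onto this template. It also shows how the pair relates to the inference prompt in section 4: the inference prompt is the training text cut off right after `<assistant>\n`. The card does not say whether an end-of-sequence token or closing tag followed each response. It also does not say how Alpaca's `input` field or Dolly's `context` field were folded into `{instruction}`, or what field names `train_sft_50k.jsonl` uses. The `text` field and all helper names below are hypothetical.

```python
import json

# Template pieces copied from the SFT prompt format above.
USER_OPEN = "<user>\n"
USER_CLOSE = "\n</user>\n\n"
ASSISTANT_OPEN = "<assistant>\n"


def build_prompt(instruction: str) -> str:
    """Inference prompt: everything up to and including "<assistant>\\n"."""
    return f"{USER_OPEN}{instruction}{USER_CLOSE}{ASSISTANT_OPEN}"


def format_sft_example(instruction: str, response: str) -> str:
    """Training text: the inference prompt followed directly by the target response."""
    return build_prompt(instruction) + response


def to_jsonl_line(instruction: str, response: str) -> str:
    """One JSONL record; the "text" key is a hypothetical schema, not the card's."""
    return json.dumps({"text": format_sft_example(instruction, response)})
```

Because every training example starts with exactly the string `build_prompt` produces, prompting with that prefix at inference time puts the model in the same state it saw during SFT.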

🟩 B. DPO Dataset (Preference Alignment)

Merged from:

Anthropic HH-RLHF
Stanford SHP
UltraFeedback
JudgeLM

Used in chosen-vs-rejected pair format.
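Below is a sketch of one preference pair in chosen-vs-rejected form. It assumes the prompt is wrapped in the same SFT template. The `prompt`/`chosen`/`rejected` keys follow the convention of TRL's `DPOTrainer`. The card does not state which library or schema was actually used, so treat the layout and the helper name as assumptions.

```python
def make_dpo_record(instruction: str, chosen: str, rejected: str) -> dict:
    """Hypothetical helper: one preference pair, with the prompt in the SFT template."""
    if chosen == rejected:
        # A pair with identical completions carries no preference signal.
        raise ValueError("chosen and rejected responses must differ")
    return {
        "prompt": f"<user>\n{instruction}\n</user>\n\n<assistant>\n",
        "chosen": chosen,      # preferred response
        "rejected": rejected,  # dispreferred response
    }
```

Datasets such as HH-RLHF store whole conversations rather than separate prompt and response fields, so some conversion into this shape would be needed. The card does not describe that step.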

📌 3. Training Details

🟦 SFT (Supervised Fine-Tuning)

QLoRA Configuration:

Rank: 8
Alpha: 16
Dropout: 0.05
Precision: bfloat16
Epochs: 1
LR: 2e-4
Gradient Accumulation: 20
Target Modules:
- q_proj, k_proj, v_proj, o_proj
- gate_proj, up_proj, down_proj
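To give a sense of scale, the sketch below counts the trainable parameters this configuration adds. Each adapted linear layer of shape (in, out) gains two low-rank matrices, A (rank × in) and B (out × rank), and the update is scaled by alpha / rank = 2. The Gemma-2B layer dimensions are an assumption taken from the published google/gemma-2b configuration. Verify them against the base model's `config.json`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraSettings:
    rank: int = 8
    alpha: int = 16
    dropout: float = 0.05

    @property
    def scaling(self) -> float:
        # LoRA multiplies the low-rank update B @ A by alpha / rank.
        return self.alpha / self.rank


# Assumed google/gemma-2b dimensions (check against the base model's config.json).
HIDDEN = 2048
INTERMEDIATE = 16384
HEAD_DIM = 256
NUM_HEADS = 8
NUM_KV_HEADS = 1  # Gemma-2B uses multi-query attention
NUM_LAYERS = 18

# (in_features, out_features) of each target module in one decoder layer.
TARGET_SHAPES = {
    "q_proj": (HIDDEN, NUM_HEADS * HEAD_DIM),
    "k_proj": (HIDDEN, NUM_KV_HEADS * HEAD_DIM),
    "v_proj": (HIDDEN, NUM_KV_HEADS * HEAD_DIM),
    "o_proj": (NUM_HEADS * HEAD_DIM, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_trainable_params(settings: LoraSettings) -> int:
    """Parameters added by LoRA: rank * (in + out) per adapted module, summed over all layers."""
    per_layer = sum(settings.rank * (d_in + d_out) for d_in, d_out in TARGET_SHAPES.values())
    return per_layer * NUM_LAYERS
```

Under these assumptions, the adapters add 9,805,824 trainable parameters (about 9.8M), well under 1% of the base model. In PEFT, the same settings would correspond to `LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05, target_modules=[...])` with the seven module names listed above.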

🟩 DPO (Direct Preference Optimization)

Beta: 0.1
Learning rate: 5e-5
Grad Accumulation: 8
Policy model = SFT-trained adapter
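The following makes the role of Beta concrete. For each pair, DPO minimizes -log sigmoid(beta * m). Here m = [log pi(chosen) - log pi_ref(chosen)] - [log pi(rejected) - log pi_ref(rejected)], and each log-probability is summed over the response tokens given the prompt. Below is a minimal per-pair sketch in plain Python. It assumes the reference model is the frozen SFT model. That is the usual DPO setup, but the card does not state it.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # Beta from the card
) -> float:
    """DPO loss for one pair, given summed response log-probs under policy and reference."""
    # How much more the policy favours "chosen" over "rejected" than the reference does.
    margin = (policy_chosen_logp - ref_chosen_logp) - (policy_rejected_logp - ref_rejected_logp)
    return -log_sigmoid(beta * margin)
```

When the policy still matches the reference, the margin is zero and the loss is log 2 (about 0.693). The loss falls as the policy shifts probability toward the chosen response. A smaller beta weakens this pull and lets the policy drift further from the reference.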

📌 4. Inference Instructions

Prompts must use exactly the same format as the SFT training data:

Prompt Template

<user>
{your_message}
</user>

<assistant>

🟦 FastAPI Streaming Server (`server.py`)

This model was tested using a custom FastAPI server with:

Local model loading (no HF auto-download)
SFT-exact prompt builder
Tag suppression to prevent invalid XML-like output
Greedy decoding:
- do_sample=False
- repetition_penalty=1.3
- no_repeat_ngram_size=4
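The card does not show how `server.py` implements tag suppression. One simple possibility, sketched below under that assumption, is to post-process the decoded text and cut it at the first template tag. Another is to block the tag tokens during generation, for example with the `bad_words_ids` argument of `generate`. The tag list and function name are hypothetical.

```python
# Hypothetical list of template tags the server should never return to the client.
TEMPLATE_TAGS = ("<user>", "</user>", "<assistant>", "</assistant>")


def clean_response(generated: str) -> str:
    """Truncate decoded model output at the first template tag and trim whitespace."""
    cut = len(generated)
    for tag in TEMPLATE_TAGS:
        idx = generated.find(tag)
        if idx != -1:
            cut = min(cut, idx)
    return generated[:cut].strip()
```

Post-processing is easy to apply to a finished reply. A streaming server would instead need to hold back any partial text that could be the start of a tag before sending it.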

Example: Python Local Inference

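from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Local directory containing the fine-tuned SFT + DPO checkpoint
model_dir = "Nexura-gemma2b-sft-dpo"

tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(
    model_dir,
    torch_dtype=torch.bfloat16,  # weights are stored in BF16
    device_map="auto",           # requires the accelerate package
)

# Prompt in the exact SFT template; generation continues after "<assistant>\n"
prompt = "<user>\nExplain recursion.\n</user>\n\n<assistant>\n"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Greedy decoding with the same anti-repetition settings as the FastAPI server
output = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=False,
    repetition_penalty=1.3,
    no_repeat_ngram_size=4,
)

# Decode only the newly generated tokens, not the echoed prompt
new_tokens = output[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))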

🟩 Curl API Example

curl -X POST http://localhost:8000/api/chat \
     -H "Content-Type: application/json" \
     -d '{"messages":[{"role":"user","content":"hi"}]}'
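The `/api/chat` endpoint belongs to the custom FastAPI server, and its source is not included here. The sketch below shows one way such a server could turn the `messages` payload into the SFT prompt. It assumes the server renders only the most recent user message, because the card does not document how multi-turn conversations are handled. The function name is hypothetical.

```python
def messages_to_prompt(messages: list[dict]) -> str:
    """Render the latest user message of a chat payload into the SFT prompt template."""
    # Assumption: earlier turns are dropped; only the most recent user message is used.
    for message in reversed(messages):
        if message.get("role") == "user":
            return f"<user>\n{message['content']}\n</user>\n\n<assistant>\n"
    raise ValueError("payload contains no user message")
```

For the curl payload above, this yields the same prompt shape as the local inference example, with `hi` as the instruction.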

📌 5. Intended Use

✔ Recommended Uses

Chat assistants
Instruction following
Educational Q/A
Coding help
Summaries
Reasoning tasks
Content rewriting

❌ Not Recommended

Medical, legal, or financial advice
Real-world decision making
High-risk or safety-critical systems
Generating harmful, biased, or toxic content

📌 6. Strengths

Lightweight (2B parameters)
Fast inference on consumer GPUs
Clean behavior after SFT formatting correction
Strong alignment after DPO training
Stable responses due to greedy decoding

📌 7. Limitations

Limited knowledge compared to larger LLMs
May hallucinate if prompt format is not followed
Not multilingual
No factual updates after 2023 (Gemma limitation)

📌 8. Hardware Requirements

GPU Recommended: 8GB+ VRAM
Minimum CPU RAM: 6GB
Quantized 4-bit mode: Runs on mid-range systems
Ideal: NVIDIA RTX 3060 / 4060+
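Here is a back-of-the-envelope check on these figures: weight memory is roughly the parameter count times the bytes per parameter. The sketch below uses the Hub's rounded 3B-parameter figure and counts weights only. Activations, the KV cache, quantization scales, and framework overhead come on top, so real usage is higher.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate weight storage in GB (10^9 bytes); excludes activations and KV cache."""
    return num_params * bits_per_param / 8 / 1e9


for label, bits in [("bf16", 16), ("4-bit", 4)]:
    print(f"{label}: ~{weight_memory_gb(3e9, bits):.1f} GB")
```

This prints about 6.0 GB for BF16 and 1.5 GB for 4-bit weights. That is consistent with the 8GB+ VRAM recommendation for BF16 and with 4-bit mode fitting on mid-range systems. In Transformers, 4-bit loading is typically requested by passing `quantization_config=BitsAndBytesConfig(load_in_4bit=True)` to `from_pretrained`, which requires the bitsandbytes package.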

📌 9. License

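This model inherits the Gemma Terms of Use from its base model. In summary, the terms:

Permit research use
Permit commercial use, subject to their conditions
Require redistributions to include the Gemma Terms of Use notice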

Full license details:
https://ai.google.dev/gemma/terms

📌 10. Citation

If you use this model, please cite it as:

@misc{nexura_gemma2b_2025,
  title={Nexura-Gemma-2B},
  author={Arun Vpp},
  year={2025},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/arunvpp05/Nexura-Gemma2B}},
  note={Custom fine-tuned Gemma-2B}
}


Hub metadata

Format: Safetensors
Model size: 3B params
Tensor type: BF16

Model tree for arunvpp05/Nexura-Gemma2B

Base model: google/gemma-2b
Relation to base model: adapter (one of 23,700 adapters listed for google/gemma-2b)
Adapters built on this model: 2 models
