Instructions to use prithivMLmods/Nenque-MoT-0.6B-Elite14 with libraries and local apps.

Libraries

How to use prithivMLmods/Nenque-MoT-0.6B-Elite14 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="prithivMLmods/Nenque-MoT-0.6B-Elite14")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("prithivMLmods/Nenque-MoT-0.6B-Elite14")
model = AutoModelForCausalLM.from_pretrained("prithivMLmods/Nenque-MoT-0.6B-Elite14")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use prithivMLmods/Nenque-MoT-0.6B-Elite14 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "prithivMLmods/Nenque-MoT-0.6B-Elite14"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "prithivMLmods/Nenque-MoT-0.6B-Elite14",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
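
The same request can also be sent from Python. The following is a minimal sketch using only the standard library, assuming the vLLM server started above is listening on `http://localhost:8000`. The helper names `build_chat_request` and `post_chat` and the `max_tokens` default are illustrative choices, not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "prithivMLmods/Nenque-MoT-0.6B-Elite14"


def build_chat_request(user_message, model=MODEL_ID, max_tokens=256):
    """Build the JSON body for POST /v1/chat/completions (hypothetical helper)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "max_tokens": max_tokens,
    }


def post_chat(payload, base_url="http://localhost:8000", timeout=60.0):
    """Send the payload to an OpenAI-compatible server and return the parsed JSON."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Example (requires the server from the commands above to be running):
# reply = post_chat(build_chat_request("What is the capital of France?"))
# print(reply["choices"][0]["message"]["content"])
```

Passing `base_url="http://localhost:30000"` should let the same helper talk to any other OpenAI-compatible server on that port.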

Use Docker

docker model run hf.co/prithivMLmods/Nenque-MoT-0.6B-Elite14

SGLang

How to use prithivMLmods/Nenque-MoT-0.6B-Elite14 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "prithivMLmods/Nenque-MoT-0.6B-Elite14" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "prithivMLmods/Nenque-MoT-0.6B-Elite14",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "prithivMLmods/Nenque-MoT-0.6B-Elite14" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "prithivMLmods/Nenque-MoT-0.6B-Elite14",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
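
A freshly started server usually needs some time to download and load the weights before it accepts requests. The sketch below polls until the server answers, assuming it exposes the `/v1/models` route that OpenAI-compatible servers typically provide. The function names, retry count, and delay are illustrative assumptions, not SGLang settings.

```python
import time
import urllib.request


def server_is_up(base_url, timeout=2.0):
    """Return True if GET {base_url}/v1/models answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/v1/models", timeout=timeout) as r:
            return r.status == 200
    except OSError:
        # Covers refused connections, timeouts, and HTTP error responses
        # (urllib.error.URLError is a subclass of OSError).
        return False


def wait_until_ready(base_url="http://localhost:30000", attempts=30, delay=2.0,
                     probe=server_is_up):
    """Call `probe` up to `attempts` times, sleeping `delay` seconds after each failure."""
    for _ in range(attempts):
        if probe(base_url):
            return True
        time.sleep(delay)
    return False


# Example (requires the container above to be starting or running):
# if wait_until_ready():
#     print("server is ready")
```

The `probe` argument exists so that the retry loop can be exercised without a running server.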

Docker Model Runner
How to use prithivMLmods/Nenque-MoT-0.6B-Elite14 with Docker Model Runner:
```
docker model run hf.co/prithivMLmods/Nenque-MoT-0.6B-Elite14
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

Nenque-MoT-0.6B-Elite14

Nenque-MoT-0.6B-Elite14 is a compact model tailored for mathematical reasoning, code generation, and structured technical inference. It is fine-tuned from Qwen3-0.6B on the MoT (Mixture of Thoughts) dataset, with a focus on math expert clusters, and targets strong symbolic performance in low-resource environments. Despite its 0.6B parameter size, it is intended to give precise results across STEM and multilingual technical domains.

GGUF: https://huggingface.co/prithivMLmods/Nenque-MoT-0.6B-Elite14-GGUF

Key Features

MoT Fine-Tuning on Math Expert Clusters: Trained on a curated Mixture of Thoughts (MoT) dataset emphasizing symbolic mathematics, code reasoning, and problem-solving, enhancing precision in structured tasks.
Elite Mathematical Reasoning: Excels in solving algebraic equations, calculus, and symbolic logic step by step, making it suitable for education, competitions, and STEM support tools.
Compact Code Assistant: Generates concise, explainable code in Python, JavaScript, and other languages, ideal for code tutoring, bug diagnosis, and fast prototyping.
Structured Output Generation: Supports generation in Markdown, JSON, LaTeX, and tabular formats, making it a valuable tool for documentation and technical data generation.
Multilingual Technical Mastery: Delivers consistent results across 20+ languages for math and code, serving global academic and development use cases.
Lightweight Inference-Ready Design: Optimized for edge devices, GPUs with limited VRAM, and offline deployments, enabling high-quality results on constrained systems.
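
When the model is asked for JSON, a small model may still wrap the object in explanatory text. The following is a minimal sketch of how a caller might recover the first valid JSON object from a reply. The helper name `extract_first_json_object` and the sample string are illustrative, and the sample is hand-written rather than real model output.

```python
import json


def extract_first_json_object(text):
    """Return the first decodable JSON object found in `text`, or None."""
    decoder = json.JSONDecoder()
    for start, char in enumerate(text):
        if char != "{":
            continue
        try:
            # raw_decode parses one JSON value starting at `start`
            # and ignores whatever follows it.
            value, _end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            continue  # not valid JSON from this brace; try the next one
        return value
    return None


# Hand-written stand-in for a model reply, used only to show the call.
sample_reply = 'Here is the result: {"x": 8, "steps": 3} Let me know if this helps.'
print(extract_first_json_object(sample_reply))
```

Returning `None` instead of raising lets the caller decide whether to retry the generation or fall back to the raw text.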

Quickstart with Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "prithivMLmods/Nenque-MoT-0.6B-Elite14"

# Load the weights in the precision stored in the checkpoint and place them
# on the available device(s). device_map="auto" requires the accelerate package.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Solve the equation: 2(x - 4) + 3 = 11. Show all steps."

messages = [
    {"role": "system", "content": "You are a step-by-step math tutor."},
    {"role": "user", "content": prompt}
]

# Render the conversation with the model's chat template and append the
# assistant prompt so that generation starts a new reply.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Drop the prompt tokens so that only the newly generated tokens are decoded.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
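
Qwen3-based models commonly emit their reasoning inside `<think>...</think>` before the final answer. Assuming this fine-tune keeps that convention and the tags survive decoding (neither is confirmed by this card), the decoded text can be split as sketched below. The helper name `split_reasoning` is illustrative, and the sample string is hand-written, not real model output.

```python
def split_reasoning(response, open_tag="<think>", close_tag="</think>"):
    """Split decoded text into (reasoning, answer).

    If no closing tag is present, the whole text is treated as the answer.
    """
    head, separator, tail = response.partition(close_tag)
    if not separator:
        return "", response.strip()
    reasoning = head.replace(open_tag, "", 1).strip()
    return reasoning, tail.strip()


# Hand-written stand-in for a decoded reply, used only to show the call.
sample = "<think>\n2(x - 4) + 3 = 11, so 2x - 5 = 11 and 2x = 16.\n</think>\n\nx = 8"
reasoning, answer = split_reasoning(sample)
print(answer)
```

For the quickstart prompt, the expected final answer is x = 8, since 2(x - 4) + 3 = 11 simplifies to 2x - 5 = 11.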

Intended Use

Step-by-step mathematical reasoning and symbolic computation
Lightweight multilingual code generation and debugging
Structured content generation (e.g., LaTeX, JSON, Markdown)
Academic tutoring and technical assistant roles
Deployment in resource-constrained or edge scenarios
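
For resource-constrained deployments, a rough estimate of the memory needed for the weights alone can help with sizing. The sketch below assumes the nominal 0.6B parameter count from the model name, so the exact count may differ, and it ignores activations, the KV cache, and framework overhead. The function name and the precision table are illustrative.

```python
def weight_memory_gib(num_params, bytes_per_param):
    """Approximate memory for the weights alone, in GiB."""
    return num_params * bytes_per_param / 2**30


NOMINAL_PARAMS = 0.6e9  # assumption: taken from the "0.6B" in the model name

# Bytes per parameter for common storage precisions.
PRECISIONS = {"fp32": 4.0, "bf16/fp16": 2.0, "int8": 1.0, "4-bit": 0.5}

for name, size in PRECISIONS.items():
    print(f"{name:>9}: {weight_memory_gib(NOMINAL_PARAMS, size):.2f} GiB")
```

Under these assumptions the weights take roughly 1.1 GiB in bf16 or fp16, and about a quarter of that with 4-bit quantization.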

Limitations

Not suitable for extended creative generation or conversational fluency
Limited context length impacts performance on long multi-step tasks
Fine-tuned on technical domains, so it may underperform on general chat or abstract logic tasks
Specialized for structured outputs; free-form generation is not its focus