Instructions to use Vaultkeeper/ouroboros-next with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use Vaultkeeper/ouroboros-next with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

# The messages below contain an image, so the image-text-to-text task is used
pipe = pipeline("image-text-to-text", model="Vaultkeeper/ouroboros-next")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("Vaultkeeper/ouroboros-next")
model = AutoModelForImageTextToText.from_pretrained("Vaultkeeper/ouroboros-next")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# Decode only the newly generated tokens, skipping the prompt
outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
- llama-cpp-python
How to use Vaultkeeper/ouroboros-next with llama-cpp-python:
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="Vaultkeeper/ouroboros-next",
    filename="Ouroboros-Next-9B-Q4_K_M.gguf",
)

llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?"
        }
    ]
)
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use Vaultkeeper/ouroboros-next with llama.cpp:
Install from brew
brew install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Vaultkeeper/ouroboros-next:Q4_K_M

# Run inference directly in the terminal:
llama-cli -hf Vaultkeeper/ouroboros-next:Q4_K_M
Install from WinGet (Windows)
winget install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Vaultkeeper/ouroboros-next:Q4_K_M

# Run inference directly in the terminal:
llama-cli -hf Vaultkeeper/ouroboros-next:Q4_K_M
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf Vaultkeeper/ouroboros-next:Q4_K_M

# Run inference directly in the terminal:
./llama-cli -hf Vaultkeeper/ouroboros-next:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf Vaultkeeper/ouroboros-next:Q4_K_M

# Run inference directly in the terminal:
./build/bin/llama-cli -hf Vaultkeeper/ouroboros-next:Q4_K_M
Use Docker
docker model run hf.co/Vaultkeeper/ouroboros-next:Q4_K_M
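Because llama-server exposes an OpenAI-compatible API, the model can also be queried from Python. The following is a minimal sketch using only the standard library. It assumes the server is listening at http://localhost:8080/v1, the same base URL used in the Pi configuration in this guide; the helper names `build_chat_request` and `chat` are illustrative and not part of llama.cpp.

```python
import json
import urllib.request

# Assumption: llama-server is running locally and reachable at this base URL.
BASE_URL = "http://localhost:8080/v1"
MODEL_ID = "Vaultkeeper/ouroboros-next:Q4_K_M"


def build_chat_request(prompt, base_url=BASE_URL, model=MODEL_ID):
    """Build (but do not send) a POST request for the chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send the request and return the assistant's reply text."""
    request = build_chat_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # OpenAI-compatible responses carry the reply in choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With the server running, `chat("What is the capital of France?")` should return the reply as a string. Keeping request construction separate from sending makes it possible to inspect the payload without a live server.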
- LM Studio
- Jan
- vLLM
How to use Vaultkeeper/ouroboros-next with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "Vaultkeeper/ouroboros-next"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Vaultkeeper/ouroboros-next",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
Use Docker
docker model run hf.co/Vaultkeeper/ouroboros-next:Q4_K_M
- SGLang
How to use Vaultkeeper/ouroboros-next with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "Vaultkeeper/ouroboros-next" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Vaultkeeper/ouroboros-next",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
Use Docker images
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "Vaultkeeper/ouroboros-next" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Vaultkeeper/ouroboros-next",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
- Ollama
How to use Vaultkeeper/ouroboros-next with Ollama:
ollama run hf.co/Vaultkeeper/ouroboros-next:Q4_K_M
- Unsloth Studio
How to use Vaultkeeper/ouroboros-next with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh

# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for Vaultkeeper/ouroboros-next to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex

# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for Vaultkeeper/ouroboros-next to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Vaultkeeper/ouroboros-next to start chatting
- Pi
How to use Vaultkeeper/ouroboros-next with Pi:
Start the llama.cpp server
# Install llama.cpp:
brew install llama.cpp

# Start a local OpenAI-compatible server:
llama-server -hf Vaultkeeper/ouroboros-next:Q4_K_M
Configure the model in Pi
# Install Pi:
npm install -g @mariozechner/pi-coding-agent

# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "Vaultkeeper/ouroboros-next:Q4_K_M"
        }
      ]
    }
  }
}
Run Pi
# Start Pi in your project directory:
pi
- Hermes Agent
How to use Vaultkeeper/ouroboros-next with Hermes Agent:
Start the llama.cpp server
# Install llama.cpp:
brew install llama.cpp

# Start a local OpenAI-compatible server:
llama-server -hf Vaultkeeper/ouroboros-next:Q4_K_M
Configure Hermes
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup

# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default Vaultkeeper/ouroboros-next:Q4_K_M
Run Hermes
hermes
- Docker Model Runner
How to use Vaultkeeper/ouroboros-next with Docker Model Runner:
docker model run hf.co/Vaultkeeper/ouroboros-next:Q4_K_M
- Lemonade
How to use Vaultkeeper/ouroboros-next with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/
lemonade pull Vaultkeeper/ouroboros-next:Q4_K_M
Run and chat with the model
lemonade run user.ouroboros-next-Q4_K_M
List all available models
lemonade list
Ouroboros-Next
by VaultAI
Deployment Status: ● ONLINE / RELEASED
[ VERSION 1.0 ] OUROBOROS-NEXT | NEURAL PIPELINE STABILIZED
✅ Intelligence, Unfiltered.
Most AI models give you the first, sanitized answer they can generate. They are built to agree, not to solve. Ouroboros-Next is built differently.
Engineered by VaultAI, Ouroboros-Next is a next-generation Linear Hybrid model. It synthesizes high-IQ "Heretic" reasoning with advanced multimodal vision capabilities. Designed for users who need expert-level execution without the corporate filler, it represents the evolution of the Ouroboros series into a fully multimodal coding agent. It doesn’t just answer your prompts; it interrogates them.
🧠 Architecture & Identity: The Shadow Triad
Ouroboros-Next is not a standard conversational assistant. It was engineered using a specialized 60/40 architectural split, designed specifically to process complex visual and textual information through a psychological framework.
Instead of defaulting to literal, surface-level descriptions, Ouroboros-Next evaluates prompts through a hardwired Jungian Shadow Triad logic system. When presented with an image or a scenario, the model is trained to look past the obvious and dissect the underlying psychological conflicts, hidden archetypes, and subconscious motivations at play.
Key Capabilities:
- Multimodal Psychoanalysis: Capable of ingesting complex visual scenes (via the mmproj vision encoder) and outputting deep, qualitative analysis of the environment's emotional and psychological weight.
- Subtextual Reasoning: Trained to bypass AI "pleasantries" and identify the inherent contradictions, shadow elements, and hidden meanings within text and code structures.
- Hardware Optimized: Fully compatible with llama.cpp, allowing this complex reasoning to run efficiently on a single consumer-grade GPU (like an NVIDIA T4) using Q4_K_M quantization.
⚡ Performance & Benchmarks
Ouroboros-Next was benchmarked on a single NVIDIA T4 GPU (16GB VRAM) using the Q4_K_M quantization.
| Metric | Speed (Tokens / Second) | Hardware | Comparison Notes |
|---|---|---|---|
| Vision Encoding & Prompt Processing | 301.75 t/s | 1x T4 (16GB) | ~2.5x faster than base Llama-3-V on equivalent hardware. |
| Text Generation & Reasoning | 33.35 t/s | 1x T4 (16GB) | Matches GPT-4o-mini throughput while running locally. |
| Model Size / VRAM | 5.24 GB | 1x T4 (16GB) | Optimized for 12GB/16GB consumer cards with high context headroom. |
Technical Notes:
- Quantization: Q4_K_M (GGUF). The optimal balance of reasoning quality and speed.
- Compatibility: Fully compatible with llama.cpp and Ollama (requires the accompanying mmproj file).
- Vision Projection: Prompt processing speed includes the mmproj encoding overhead for high-resolution images.
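To put the throughput figures in practical terms, the sketch below turns them into a rough response-time estimate and a VRAM headroom figure. It treats both speeds as constant, which is a simplification: real throughput varies with context length, image resolution, and batch settings, and the headroom ignores the KV cache and the mmproj file. The function names are illustrative.

```python
# Figures taken from the benchmark table above (1x T4, Q4_K_M).
PROMPT_TOKENS_PER_SECOND = 301.75
GENERATION_TOKENS_PER_SECOND = 33.35
MODEL_SIZE_GB = 5.24
T4_VRAM_GB = 16.0


def estimate_latency_seconds(prompt_tokens, generated_tokens):
    """Rough end-to-end time: prompt processing plus token generation."""
    prompt_time = prompt_tokens / PROMPT_TOKENS_PER_SECOND
    generation_time = generated_tokens / GENERATION_TOKENS_PER_SECOND
    return prompt_time + generation_time


def vram_headroom_gb(total_vram_gb=T4_VRAM_GB):
    """VRAM left after loading the weights (KV cache and mmproj not counted)."""
    return total_vram_gb - MODEL_SIZE_GB


if __name__ == "__main__":
    print(f"{estimate_latency_seconds(1000, 200):.1f} s")
    print(f"{vram_headroom_gb():.2f} GB")
```

Under these assumptions, a 1,000-token prompt with a 200-token reply takes roughly 9.3 seconds, and a 16 GB card keeps about 10.76 GB free for context.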
Standardized Accuracy Benchmarks (Pending)
The following benchmarks are currently queued for evaluation to test the reasoning capabilities and knowledge retention of the architecture.
| Benchmark | Focus Area | Score | Status |
|---|---|---|---|
| GSM8k | Grade School Math | TBD | ⏳ Pending Eval |
| MMLU | General Knowledge | TBD | ⏳ Pending Eval |
| HumanEval | Coding & Logic | TBD | ⏳ Pending Eval |
| ARC-C | Advanced Reasoning | TBD | ⏳ Pending Eval |
Accuracy scores are actively being evaluated and will be updated soon.
Model Details
- Type: Multimodal Causal Language Model (Linear Hybrid)
- Base Architecture: Qwen 3.5 (9B) + Phi-4 (15B Vision)
- Total Parameters: ~12-14B (Effective density via Linear Blending)
- Context Length: 128,000 tokens (Optimized for deep dev tasks)
- Merge Method: Linear Weight Blending (60/40 Split)
- Weights Blend:
- 60% — Crow-9B-Opus-4.6-Distill-Heretic: Distilled Claude 4.6 Opus logic for sharp, unfiltered coding performance.
- 40% — Phi-4-reasoning-vision-15B: Microsoft’s state-of-the-art vision-reasoning backbone for GUI grounding and spatial logic.
- Tokenizer: crownelius/Crow-9B (Qwen 3.5 Base)
- License: Apache 2.0
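The 60/40 linear blend listed above can be illustrated generically as a per-parameter weighted average. This is a toy sketch of the general technique, not the actual merge pipeline used for this model: plain Python lists stand in for tensors, and it assumes both checkpoints expose identical parameter names and shapes. The function name `linear_blend` is illustrative.

```python
def linear_blend(weights_a, weights_b, ratio_a=0.6):
    """Return ratio_a * A + (1 - ratio_a) * B for every shared parameter.

    Assumption: each dict maps a parameter name to a flat list of floats,
    and both dicts have the same names and lengths.
    """
    if weights_a.keys() != weights_b.keys():
        raise ValueError("both models must expose the same parameter names")

    ratio_b = 1.0 - ratio_a
    blended = {}
    for name, values_a in weights_a.items():
        values_b = weights_b[name]
        if len(values_a) != len(values_b):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        blended[name] = [
            ratio_a * a + ratio_b * b for a, b in zip(values_a, values_b)
        ]
    return blended


if __name__ == "__main__":
    model_a = {"layer.weight": [1.0, 2.0]}
    model_b = {"layer.weight": [0.0, 4.0]}
    print(linear_blend(model_a, model_b))
```

The explicit name and length checks matter because a linear blend is only defined when the two weight sets line up element by element.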
Why Ouroboros-Next?
- Zero Corporate Fluff: No "As an AI..." apologies. Just confident, intelligence-first execution.
- Self-Auditing: The built-in Shadow and Vision protocols mean the model checks its own blind spots before you have to.
- Built for Builders: Designed for complex logic, agentic workflows, and deep technical problem-solving.
Key Custom Features
1. The Vision-Heretic Triad (Shadow Logic)
Before Ouroboros-Next outputs a single word, it initiates a mandatory internal debate. Inside every <think> block, the model divides its cognition into three distinct personas to stress-test its own logic:
- EGO (Builder): Primary high-performance code and architectural planning. Focuses on generating expert-level solutions instantly.
- SHADOW (Heretic): Aggressive auditor. Hunts down logical flaws, identifies "safe-mode" hallucinations, security flaws, and logic traps.
- VISION (Auditor): Grounded multimodal analysis. Enforces strict mathematical logic, maps UI coordinates [x, y], and verifies visual evidence.
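For agent pipelines that want to log or inspect the three personas separately, the reasoning can be split out of the response. The model card does not specify the exact layout of the <think> block, so the sketch below assumes a hypothetical one in which each persona's section begins with its name followed by a colon (for example `SHADOW: ...`). Adjust the pattern to match the output you actually observe.

```python
import re

PERSONAS = ("EGO", "SHADOW", "VISION")

# Hypothetical layout: each section starts with "PERSONA:" inside <think>...</think>.
_THINK_PATTERN = re.compile(r"<think>(.*?)</think>", flags=re.DOTALL)
_LABEL_PATTERN = re.compile(r"\b(" + "|".join(PERSONAS) + r"):")


def split_think_block(output):
    """Return {persona: text} for the first <think> block, or {} if absent."""
    match = _THINK_PATTERN.search(output)
    if match is None:
        return {}

    # re.split with a capture group yields [prefix, label, body, label, body, ...].
    parts = _LABEL_PATTERN.split(match.group(1))
    sections = {}
    for label, body in zip(parts[1::2], parts[2::2]):
        sections[label] = body.strip()
    return sections


if __name__ == "__main__":
    sample = (
        "<think>EGO: write the loop. SHADOW: off-by-one risk. "
        "VISION: button at [0.5, 0.2].</think>Final answer."
    )
    print(split_think_block(sample))
```

Returning an empty dict when no block is found lets callers treat a missing debate as a signal rather than an exception.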
2. GUI & Multimodal Grounding
Optimized for Autonomous Computer Use. Ouroboros-Next can look at screenshots and provide precise, normalized coordinates for interactive elements, bridging the gap between "thinking" and "doing."
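To act on a normalized coordinate, an agent has to map it back to pixels on the actual screenshot. The sketch below assumes coordinates are normalized to the range [0, 1] with the origin at the top-left corner; the model card does not state the scale, so verify this against real output (some models use a 0 to 1000 range instead). The function name `to_pixels` is illustrative.

```python
def to_pixels(normalized_xy, width, height):
    """Convert a normalized (x, y) pair to integer pixel coordinates.

    Assumption: x and y lie in [0, 1], with (0, 0) at the top-left corner.
    """
    x, y = normalized_xy
    if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
        raise ValueError("normalized coordinates must lie in [0, 1]")

    # Clamp so that 1.0 maps to the last valid pixel rather than one past it.
    pixel_x = min(int(x * width), width - 1)
    pixel_y = min(int(y * height), height - 1)
    return pixel_x, pixel_y


if __name__ == "__main__":
    print(to_pixels((0.5, 0.25), 1920, 1080))
```

For a 1920x1080 screenshot, the point (0.5, 0.25) maps to pixel (960, 270).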
3. "Heretic" Reasoning
Unlike standard models, Ouroboros-Next inherits a distilled Claude 4.6 Opus personality—prioritizing efficient, direct, and un-sanitized technical solutions over corporate verbosity.
Intended Use
- Autonomous Coding Agents: Advanced repo-level analysis and auto-refactoring.
- Visual Web/GUI Navigation: Grounded multimodal reasoning for browser-based tasks.
- Deep Reasoning: Complex math and logic puzzles requiring cross-checked answers.