Instructions to use tiiuae/Falcon-H1-Tiny-90M-Instruct-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use tiiuae/Falcon-H1-Tiny-90M-Instruct-GGUF with Transformers:

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("tiiuae/Falcon-H1-Tiny-90M-Instruct-GGUF", dtype="auto")

llama-cpp-python

How to use tiiuae/Falcon-H1-Tiny-90M-Instruct-GGUF with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="tiiuae/Falcon-H1-Tiny-90M-Instruct-GGUF",
	filename="Falcon-H1-Tiny-90M-Instruct-BF16.gguf",
)

llm.create_chat_completion(
	messages = "No input example has been defined for this model task."
)

Notebooks
Google Colab
Kaggle
Local Apps Settings

llama.cpp

How to use tiiuae/Falcon-H1-Tiny-90M-Instruct-GGUF with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf tiiuae/Falcon-H1-Tiny-90M-Instruct-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf tiiuae/Falcon-H1-Tiny-90M-Instruct-GGUF:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf tiiuae/Falcon-H1-Tiny-90M-Instruct-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf tiiuae/Falcon-H1-Tiny-90M-Instruct-GGUF:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf tiiuae/Falcon-H1-Tiny-90M-Instruct-GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf tiiuae/Falcon-H1-Tiny-90M-Instruct-GGUF:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf tiiuae/Falcon-H1-Tiny-90M-Instruct-GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf tiiuae/Falcon-H1-Tiny-90M-Instruct-GGUF:Q4_K_M

Use Docker

docker model run hf.co/tiiuae/Falcon-H1-Tiny-90M-Instruct-GGUF:Q4_K_M

LM Studio
Jan
Ollama
How to use tiiuae/Falcon-H1-Tiny-90M-Instruct-GGUF with Ollama:
```
ollama run hf.co/tiiuae/Falcon-H1-Tiny-90M-Instruct-GGUF:Q4_K_M
```

Unsloth Studio

How to use tiiuae/Falcon-H1-Tiny-90M-Instruct-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for tiiuae/Falcon-H1-Tiny-90M-Instruct-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for tiiuae/Falcon-H1-Tiny-90M-Instruct-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for tiiuae/Falcon-H1-Tiny-90M-Instruct-GGUF to start chatting

How to use tiiuae/Falcon-H1-Tiny-90M-Instruct-GGUF with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf tiiuae/Falcon-H1-Tiny-90M-Instruct-GGUF:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "tiiuae/Falcon-H1-Tiny-90M-Instruct-GGUF:Q4_K_M"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent new

How to use tiiuae/Falcon-H1-Tiny-90M-Instruct-GGUF with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf tiiuae/Falcon-H1-Tiny-90M-Instruct-GGUF:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default tiiuae/Falcon-H1-Tiny-90M-Instruct-GGUF:Q4_K_M

Run Hermes

hermes

Docker Model Runner
How to use tiiuae/Falcon-H1-Tiny-90M-Instruct-GGUF with Docker Model Runner:
```
docker model run hf.co/tiiuae/Falcon-H1-Tiny-90M-Instruct-GGUF:Q4_K_M
```

Lemonade

How to use tiiuae/Falcon-H1-Tiny-90M-Instruct-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull tiiuae/Falcon-H1-Tiny-90M-Instruct-GGUF:Q4_K_M

Run and chat with the model

lemonade run user.Falcon-H1-Tiny-90M-Instruct-GGUF-Q4_K_M

List all available models

lemonade list

TL;DR
Model Details
Training Details
Usage
Evaluation
Citation

TL;DR

Model Details

Model Description

Developed by: https://www.tii.ae
Model type: Causal decoder-only
Architecture: Hybrid Transformers + Mamba architecture
Language(s) (NLP): English
Number of Parameters: 90M
License: Falcon-LLM License

Training details

For more details about the training protocol of this model, please refer to the Falcon-H1-Tiny technical blogpost.

Usage

Currently to use this model you can either rely on Hugging Face transformers, vLLM, sglang, llama.cpp, ollama or mlx library.

Inference

🤗 transformers

Refer to the snippet below to run H1 models using 🤗 transformers:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/Falcon-H1-Tiny-90M-Instruct-pre-DPO"

model = AutoModelForCausalLM.from_pretrained(
  model_id,
  torch_dtype=torch.bfloat16,
  device_map="auto"
)

# Perform text generation

transformers serve tiiuae/Falcon-H1-Tiny-90M-Instruct-pre-DPO

`llama.cpp`

You can find all GGUF files compatible with llama.cpp under our official collection - an example setup could be:

brew install llama.cpp 
pip install huggingface_hub 
hf download tiiuae/Falcon-H1-Tiny-90M-Instruct-pre-DPO Falcon-H1-Tiny-90M-Instruct-pre-DPO-Q8_0.gguf --local-dir ./ 
llama-cli ./ Falcon-H1-Tiny-90M-Instruct-pre-DPO-Q8_0.gguf -cnv

`ollama`

ollama run hf.co/tiiuae/Falcon-H1-Tiny-90M-Instruct-GGUF:Q8_0

Apple `mlx`

mlx_lm.chat --model tiiuae/Tiny-H1-SF

vLLM

For vLLM, simply start a server by executing the command below:

# pip install vllm>=0.9.0
vllm serve tiiuae/Falcon-H1-Tiny-90M-Instruct-pre-DPO --tensor-parallel-size 2 --data-parallel-size 1

sglang

python -m sglang.launch_server \
  --model ttiiuae/Falcon-H1-Tiny-90M-Instruct-pre-DPO \
  --tensor-parallel-size 1

Evaluation

For detailed evaluation of Tiny-H1 series, please refer to our technical blogpost

Useful links

View our release blogpost.
Feel free to join our discord server if you have any questions or to interact with our researchers and developers.

Citation

If the Falcon-H1-Tiny family of models were helpful to your work, feel free to give us a cite.

@misc{falcon_h1_tiny,
  title={Falcon-H1-Tiny: A series of extremely small, yet powerful language models redefining capabilities at small scale},
  author={Falcon-LLM Team},
  year={2026}, 
}

Downloads last month: 4,512

GGUF

Model size

91.1M params

Architecture

falcon-h1

Hardware compatibility

1-bit

2-bit

3-bit

4-bit

5-bit

6-bit

8-bit

16-bit

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 1 Ask for provider support

Collection including tiiuae/Falcon-H1-Tiny-90M-Instruct-GGUF

Falcon-H1-Tiny

Collection

A series of extremely small, yet powerful language models redefining capabilities at small scale • 19 items • Updated Mar 2 • 37

tiiuae
/

Falcon-H1-Tiny-90M-Instruct-GGUF

Table of Contents

TL;DR

Model Details

Model Description

Training details

Usage

Inference

🤗 transformers

`llama.cpp`

`ollama`

Apple `mlx`

vLLM

sglang

Evaluation

Useful links

Citation

Collection including tiiuae/Falcon-H1-Tiny-90M-Instruct-GGUF

Falcon-H1-Tiny

Table of Contents

TL;DR

Model Details

Model Description

Training details

Usage

Inference

🤗 transformers

llama.cpp

ollama

Apple mlx

vLLM

sglang

Evaluation

Useful links

Citation

Collection including tiiuae/Falcon-H1-Tiny-90M-Instruct-GGUF

`llama.cpp`

`ollama`

Apple `mlx`