---
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
language:
- ca
- en
- es
- eu
- gl
datasets:
- CohereLabs/aya_dataset
- projecte-aina/CoQCat
- databricks/databricks-dolly-15k
- projecte-aina/dolly3k_ca
- projecte-aina/MentorES
- projecte-aina/MentorCA
- HuggingFaceH4/no_robots
- projecte-aina/RAG_Multilingual
- Unbabel/TowerBlocks-v0.2
- OpenAssistant/oasst2
- open-r1/OpenR1-Math-220k
- HuggingFaceFW/fineweb-edu
base_model:
- BSC-LT/ALIA-40b-instruct
model_name: ALIA-40b-instruct
quantized_by: BSC-LT
---

![](./images/logo_alia_2.png)

> [!WARNING]
> **WARNING:** ALIA-40b-Instruct is an instruction-tuned model with a preliminary alignment process. It has not yet undergone a full alignment procedure to ensure safety. The model may generate biased, factually incorrect, harmful, or inappropriate content. Users should **refer to the Limitations section** of the [original model card](https://huggingface.co/BSC-LT/ALIA-40b-instruct) and apply additional filtering and alignment processes before deploying this model in production.

# ALIA-40b-instruct - GGUF
- Model creator: [BSC-LT](https://huggingface.co/BSC-LT)
- Original model: [ALIA-40b-instruct](https://huggingface.co/BSC-LT/ALIA-40b-instruct)

## Description

This repo contains GGUF-format model files for [BSC-LT/ALIA-40b-instruct](https://huggingface.co/BSC-LT/ALIA-40b-instruct).

### About GGUF

**GGUF** is the model file format introduced by the **llama.cpp** team on **August 21st, 2023**, replacing the older GGML format (now deprecated). It brings significant improvements such as enhanced tokenization, proper handling of special tokens, embedded metadata (e.g., architecture, quantization type, tokenizer), and an extensible design for future compatibility.

### Model Conversion

This model was converted from its original **Hugging Face** format to **GGUF** using the official tools provided in [**llama.cpp**](https://github.com/ggerganov/llama.cpp). The conversion process embeds all necessary tokenizer and configuration data directly into the `.gguf` file for full portability. The base model was exported in **BF16 precision** and then quantized for faster inference and smaller file size.

#### Commands Used

Below are the steps and commands used to convert and quantize the model to the **GGUF** format using [**llama.cpp**](https://github.com/ggerganov/llama.cpp).

```bash
# Go to the llama.cpp directory
cd llama.cpp

# (Optional) Create a Python virtual environment
python -m venv venv
source venv/bin/activate

# Install dependencies required for conversion
pip install -r requirements.txt
```

```bash
# Convert the Hugging Face model to GGUF (BF16 precision)
python3 convert_hf_to_gguf.py /path/to/hf_model \
  --outfile /gpfs/path/to/output/ALIA-40b-instruct_bos_bf16.gguf \
  --outtype bf16
```
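Optionally, you can sanity-check the metadata embedded by the conversion (architecture, tokenizer, chat template) before building and quantizing. The sketch below assumes the `gguf` Python package is available (it provides a `gguf-dump` command; exact flags may vary between package versions) and reuses the output path from the conversion step above.

```bash
# Optional: inspect the embedded key/value metadata of the converted file.
# Assumes the gguf Python package is installed (e.g. `pip install gguf`);
# --no-tensors skips the per-tensor listing, which is long for a 40B model.
gguf-dump --no-tensors /gpfs/path/to/output/ALIA-40b-instruct_bos_bf16.gguf
```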
> 🛠️ **Skip this build step if you already have a compiled llama.cpp build.**

```bash
# Create and enter a build directory
mkdir build && cd build

# Configure the build with CUDA support (optional)
cmake .. -DGGML_CUDA=ON -DGGML_NATIVE=OFF \
  -DCMAKE_VERBOSE_MAKEFILE=ON \
  -DCMAKE_BUILD_TYPE=Release

# Build with parallel jobs (adjust -j as needed)
cmake --build . --config Release --verbose -j 12
```

```bash
# Quantize the GGUF model
# Run this from the llama.cpp directory
QU=Q8_0  # Change to Q4_K_M, Q5_K_S, etc. as needed

./build/bin/llama-quantize \
  /gpfs/path/to/output/ALIA-40b-instruct_bos_bf16.gguf \
  /gpfs/path/to/output/ALIA-40b-instruct_bos_${QU}.gguf \
  ${QU}
```

For detailed installation steps, build options, and quantization types, see the [**llama.cpp GitHub repository**](https://github.com/ggerganov/llama.cpp).

## Prompt template:

```
{{- bos_token }}{%- if messages[0]['role'] == 'system' %}{%- set system_message = messages[0]['content'] %}{%- set loop_messages = messages[1:] %}{{ '<|im_start|>system\n' + system_message + '<|im_end|>\n' }}{%- else %}{%- set loop_messages = messages %}{%- endif %}{% for message in loop_messages %}{%- if (message['role'] != 'user') and (message['role'] != 'assistant')%}{{ raise_exception('Only user and assistant roles are suported after the initial optional system message.') }}{% endif %}{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}{{ raise_exception('After the optional system message, conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}
```
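## Example usage

As an illustrative sanity check, the quantized file can be run directly with `llama-cli` from the build above. In conversation mode, `llama-cli` applies the chat template embedded in the GGUF metadata, which corresponds to the Jinja template shown above. The paths, sampling settings, and context size below are examples only, and flag behavior may differ slightly across llama.cpp versions.

```bash
# Illustrative example: interactive chat with the quantized model.
# -cnv enables conversation mode, which uses the model's embedded chat template;
# in this mode -p is used as the system message. Adjust paths and flags to your setup.
./build/bin/llama-cli \
  -m /gpfs/path/to/output/ALIA-40b-instruct_bos_Q8_0.gguf \
  -cnv \
  -p "You are a helpful assistant." \
  --temp 0.7 \
  -c 4096
```

The same file can also be served over an HTTP API with `llama-server` (for example, `./build/bin/llama-server -m <model>.gguf --port 8080`), which exposes an OpenAI-compatible chat endpoint.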