This repository is gated: it is publicly visible, but you must agree to share your contact information and accept the access conditions on Hugging Face before you can download its files and content.

WARNING: ALIA-40b-Instruct is an instruction-tuned model that has undergone only a preliminary alignment process, not a full alignment procedure to ensure safety. The model may generate biased, factually incorrect, harmful, or inappropriate content. Users should refer to the Limitations section and apply additional filtering and alignment before deploying this model in production.

ALIA-40b-instruct - GGUF

Description

This repo contains GGUF format model files for BSC-LT/ALIA-40b-instruct.
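
To fetch the quantized file directly, a minimal sketch using huggingface-cli is shown below; the repository id and filename pattern are taken from this page and may need adjusting for your setup.

# Authenticate first, since the repository is gated
huggingface-cli login

# Download only the GGUF file(s) into ./models
huggingface-cli download BSC-LT/ALIA-40b-instruct_Q8_0 --include "*.gguf" --local-dir ./models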

About GGUF

GGUF is the model file format introduced by the llama.cpp team on August 21st, 2023, replacing the older GGML format (now deprecated). It brings significant improvements such as enhanced tokenization, proper handling of special tokens, embedded metadata (e.g., architecture, quantization type, tokenizer), and an extensible design for future compatibility.
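
As an illustration, the metadata embedded in a .gguf file can be inspected with the gguf-dump utility that ships with the gguf Python package (a sketch assuming a recent gguf release; the filename is a placeholder):

# Install the gguf package and dump the embedded metadata
# (architecture, tokenizer, quantization type, ...)
pip install gguf
gguf-dump ALIA-40b-instruct_bos_bf16.gguf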

Model Conversion

This model was converted from its original Hugging Face format to GGUF using the official tools provided in llama.cpp. The conversion process embeds all necessary tokenizer and configuration data directly into the .gguf file for full portability.

The base model was exported in BF16 precision and then quantized for faster inference and smaller file size.


Commands Used

Below are the steps and commands used to convert and quantize the model to the GGUF format using llama.cpp.

# Go to the llama.cpp directory
cd llama.cpp

# (Optional) Create a Python virtual environment
python -m venv venv
source venv/bin/activate

# Install dependencies required for conversion
pip install -r requirements.txt
# Convert Hugging Face model to GGUF (BF16 precision)
python3 convert_hf_to_gguf.py /path/to/hf_model \
  --outfile /gpfs/path/to/output/ALIA-40b-instruct_bos_bf16.gguf \
  --outtype bf16

🛠️ Build llama.cpp with CMake (skip this step if you already have a working build).

# Create and enter a build directory
mkdir build && cd build

# Configure and compile with CUDA support (optional)
cmake .. -DGGML_CUDA=ON -DGGML_NATIVE=OFF \
         -DCMAKE_VERBOSE_MAKEFILE=ON \
         -DCMAKE_BUILD_TYPE=Release

# Build with parallel jobs (adjust -j as needed)
cmake --build . --config Release --verbose -j 12
# Quantize the GGUF model
# Run this from the llama.cpp directory

QU=Q8_0  # Change to Q4_K_M, Q5_K_S, etc. as needed

./build/bin/llama-quantize \
  /gpfs/path/to/output/ALIA-40b-instruct_bos_bf16.gguf \
  /gpfs/path/to/output/ALIA-40b-instruct_bos_${QU}.gguf \
  ${QU}
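
If you are unsure which type to pick, running llama-quantize without arguments prints a usage message listing the supported quantization types (behavior in recent llama.cpp builds; check the repository if it differs). Q8_0 is close to lossless at roughly half the BF16 size, while the K-quants (e.g. Q4_K_M, Q5_K_M) trade some accuracy for smaller files.

# Print the usage message, including the list of supported quantization types
./build/bin/llama-quantize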

For detailed installation steps, build options, and quantization types, see the llama.cpp GitHub repository.
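
Once quantized, the model can be run locally with the llama-cli binary built above. A minimal sketch, with placeholder paths and settings to adjust for your hardware:

# Run the Q8_0 model in interactive chat mode using the embedded chat template.
# -ngl offloads layers to the GPU (requires a CUDA build); -c sets the context size.
./build/bin/llama-cli \
  -m /gpfs/path/to/output/ALIA-40b-instruct_bos_Q8_0.gguf \
  -ngl 99 -c 4096 -cnv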



Prompt template:

{{- bos_token }}{%- if messages[0]['role'] == 'system' %}{%- set system_message = messages[0]['content'] %}{%- set loop_messages = messages[1:] %}{{ '<|im_start|>system\n' + system_message + '<|im_end|>\n' }}{%- else %}{%- set loop_messages = messages %}{%- endif %}{% for message in loop_messages %}{%- if (message['role'] != 'user') and (message['role'] != 'assistant')%}{{ raise_exception('Only user and assistant roles are supported after the initial optional system message.') }}{% endif %}{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}{{ raise_exception('After the optional system message, conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}
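
For example, with add_generation_prompt enabled, a system message followed by one user turn renders as follows (the BOS token is prepended before the first tag; the message contents are illustrative):

<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Hello!<|im_end|>
<|im_start|>assistant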
GGUF

Model size: 40B params
Architecture: llama
Quantization: 8-bit (Q8_0)


Model tree for BSC-LT/ALIA-40b-instruct_Q8_0

Base model: BSC-LT/ALIA-40b
Quantized: this model
