WARNING: ALIA-40b-Instruct is an instruction-tuned model with a preliminary alignment process. It has not yet undergone a full alignment procedure to ensure safety. The model may generate biased, factually incorrect, harmful, or inappropriate content. Users should refer to the Limitations section and apply additional filtering and alignment processes before deploying this model in production.
ALIA-40b-instruct - GGUF
- Model creator: BSC-LT
- Original model: ALIA-40b-instruct
Description
This repo contains GGUF format model files for BSC-LT/ALIA-40b-instruct.
About GGUF
GGUF is the model file format introduced by the llama.cpp team on August 21st, 2023, replacing the older GGML format (now deprecated). It brings significant improvements such as enhanced tokenization, proper handling of special tokens, embedded metadata (e.g., architecture, quantization type, tokenizer), and an extensible design for future compatibility.
Model Conversion
This model was converted from its original Hugging Face format to GGUF using the official tools provided in llama.cpp.
The conversion process embeds all necessary tokenizer and configuration data directly into the .gguf file for full portability.
The base model was exported in BF16 precision and then quantized for faster inference and smaller file size.
Commands Used
Below are the steps and commands used to convert and quantize the model to the GGUF format using llama.cpp.
# Go to the llama.cpp directory
cd llama.cpp
# (Optional) Create a Python virtual environment
python -m venv venv
source venv/bin/activate
# Install dependencies required for conversion
pip install -r requirements.txt
# Convert Hugging Face model to GGUF (BF16 precision)
python3 convert_hf_to_gguf.py /path/to/hf_model \
--outfile /gpfs/path/to/output/ALIA-40b-instruct_bos_bf16.gguf \
--outtype bf16
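As an optional sanity check (not part of the original conversion run), the gguf-dump utility from llama.cpp's gguf Python package, which the requirements step above installs, can list the metadata embedded in the new file. The invocation below assumes the gguf-dump console script is on your PATH.
# Inspect the embedded metadata (architecture, tokenizer, chat template, ...)
gguf-dump /gpfs/path/to/output/ALIA-40b-instruct_bos_bf16.gguf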
Note: skip the next section if you already have a llama.cpp build.
# Create and enter a build directory
mkdir build && cd build
# Configure the build with CUDA support (omit -DGGML_CUDA=ON for a CPU-only build)
cmake .. -DGGML_CUDA=ON -DGGML_NATIVE=OFF \
-DCMAKE_VERBOSE_MAKEFILE=ON \
-DCMAKE_BUILD_TYPE=Release
# Build with parallel jobs (adjust -j as needed)
cmake --build . --config Release --verbose -j 12
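A quick way to confirm the build succeeded (an optional check, not part of the original workflow) is to ask one of the resulting binaries in build/bin for its version string:
# Verify the build (prints version and build info)
./build/bin/llama-cli --version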
# Quantize the GGUF model
# Run this from the llama.cpp directory
QU=Q8_0 # Change to Q4_K_M, Q5_K_S, etc. as needed
./build/bin/llama-quantize \
/gpfs/path/to/output/ALIA-40b-instruct_bos_bf16.gguf \
/gpfs/path/to/output/ALIA-40b-instruct_bos_${QU}.gguf \
${QU}
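With the quantized file in place, it can be loaded directly by the llama.cpp binaries built above. The commands below are a minimal smoke test using the same placeholder paths, not a tuned serving setup; adjust flags such as the context size (-c) or GPU offload (-ngl) to your hardware.
# Run an interactive chat session (uses the chat template embedded in the GGUF)
./build/bin/llama-cli -m /gpfs/path/to/output/ALIA-40b-instruct_bos_${QU}.gguf -cnv
# Or serve the model over an OpenAI-compatible HTTP API
./build/bin/llama-server -m /gpfs/path/to/output/ALIA-40b-instruct_bos_${QU}.gguf -c 4096 --port 8080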
For detailed installation steps, build options, and quantization types, see the llama.cpp GitHub repository.
Prompt template:
{{- bos_token }}{%- if messages[0]['role'] == 'system' %}{%- set system_message = messages[0]['content'] %}{%- set loop_messages = messages[1:] %}{{ '<|im_start|>system\n' + system_message + '<|im_end|>\n' }}{%- else %}{%- set loop_messages = messages %}{%- endif %}{% for message in loop_messages %}{%- if (message['role'] != 'user') and (message['role'] != 'assistant')%}{{ raise_exception('Only user and assistant roles are suported after the initial optional system message.') }}{% endif %}{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}{{ raise_exception('After the optional system message, conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}
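For reference, the template above produces ChatML-style prompts. With a system message, one user turn, and add_generation_prompt enabled, the rendered text looks like the following (the message contents are illustrative only, and the whole prompt is preceded by the model's BOS token):
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Hello!<|im_end|>
<|im_start|>assistant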
