Non-instruct models are missing the tokenizer.json files
Hi!
I am working on adding support for Salamandra to llama.cpp. For some reason, only the instruct models have a tokenizer.json file. Both non-instruct models lack it, so the GGUF conversion fails.
Would you be so kind as to upload those files to salamandra-7b and salamandra-2b?
Thanks!
Hi!
Since this is a SentencePiece model, and we weren’t aware that the GGUF conversion requires a different format, we didn’t upload the tokenizer.json initially. Apologies for that! We’ve now uploaded the tokenizer.json.
If you encounter any other issues, please feel free to let us know.
Thanks for the heads-up!
Hi Joan,
I wasn't aware either, but convert_hf_to_gguf_update.py tries to download all the files from the repo and fails if tokenizer.json is not present: https://github.com/ggerganov/llama.cpp/blob/c81f3bbb051f8b736e117dfc78c99d7c4e0450f6/convert_hf_to_gguf_update.py#L117
In theory, saving the tokenizer should produce all the files, but in the GitHub repo there is only the vocab.json file.
Best,
César
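For anyone hitting the same failure, here is a minimal sketch of a pre-flight check to run on a local copy of the model before conversion. It is not part of llama.cpp: the function name `missing_tokenizer_files` is hypothetical, and the file list is an assumption about what the download step linked above expects, so adjust it to your case.

```python
from pathlib import Path

# Assumed list of files the conversion tooling wants to find (not taken from llama.cpp).
REQUIRED_FILES = ("config.json", "tokenizer.json", "tokenizer_config.json")


def missing_tokenizer_files(model_dir, required=REQUIRED_FILES):
    """Return the names in `required` that are not regular files in `model_dir`."""
    root = Path(model_dir)
    # Preserve the order of `required` so the report is stable.
    return [name for name in required if not (root / name).is_file()]
```

Calling `missing_tokenizer_files("path/to/salamandra-7b")` on a checkout without tokenizer.json would list that file, which makes the cause of the failure visible before the conversion script runs.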
I believe I downloaded the tokenizer manually when I saw it was missing, but I still got the error I reported.
By the way (I realize this is now closed and César already has his quant up), that link about convert_hf_to_gguf.py lacking the slow tokenizer path is no longer valid. I just submitted a PR to llama.cpp that changes that file for LlamaForCausalLM models to ensure it captures all added_tokens (which may or may not actually be wanted, but in v3927 specifically there were warnings, which I found interesting), so I'm familiar with the current code. I can attest that the function has changed to support the "slow" method :)