Non-instruct models are missing the tokenizer.json files

by elsatch - opened Oct 10, 2024

Oct 10, 2024

Hi!

I am working to add support for Salamandra in llama.cpp. For some reason, only the instruct models have a tokenizer.json file. Both non-instruct models lack it, so GGUF conversion fails when I try it.

Would you be so kind as to upload those files to salamandra-7b and salamandra-2b?

Thanks!

joanllop

Oct 10, 2024

Hi!

Since this is a SentencePiece model, and we weren’t aware that the GGUF conversion requires a different format, we didn’t upload the tokenizer.json initially. Apologies for that! We’ve now uploaded the tokenizer.json.
If you encounter any other issues, please feel free to let us know.

Thanks for the heads-up!

joanllop changed discussion status to closed Oct 10, 2024

elsatch

Oct 10, 2024

Hi Joan,

I wasn't aware either, but convert_hf_to_gguf_update.py tries to download all the files from the repo and fails if tokenizer.json is not present: https://github.com/ggerganov/llama.cpp/blob/c81f3bbb051f8b736e117dfc78c99d7c4e0450f6/convert_hf_to_gguf_update.py#L117
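
A quick pre-flight check can catch this before running the update script. The sketch below is an illustration, not code from llama.cpp: it assumes the script fetches config.json, tokenizer.json and tokenizer_config.json, plus tokenizer.model for SentencePiece models (check the script version you actually use). The helper name `missing_tokenizer_files` and the example listing are hypothetical; a real listing could come from `huggingface_hub.list_repo_files(repo_id)`.

```python
# Assumed file set fetched by the tokenizer-update step (verify against your
# copy of convert_hf_to_gguf_update.py; this list is an assumption).
BASE_FILES = ("config.json", "tokenizer.json", "tokenizer_config.json")
SPM_EXTRA = ("tokenizer.model",)


def missing_tokenizer_files(repo_files: list[str], is_sentencepiece: bool = False) -> list[str]:
    """Return the assumed-required files that are absent from a repo's file listing."""
    required = list(BASE_FILES)
    if is_sentencepiece:
        required.extend(SPM_EXTRA)
    present = set(repo_files)
    return [name for name in required if name not in present]


if __name__ == "__main__":
    # Hypothetical listing of a base-model repo that has no tokenizer.json yet.
    listing = ["config.json", "tokenizer_config.json", "tokenizer.model", "model.safetensors"]
    print(missing_tokenizer_files(listing, is_sentencepiece=True))  # ['tokenizer.json']
```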

In theory, saving the tokenizer should produce all the files, but the GitHub repo only has the vocab.json file.
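
As a rough illustration of that point: with Hugging Face `transformers`, loading a SentencePiece-based tokenizer with `use_fast=True` converts it to a fast tokenizer, and `save_pretrained` on a fast tokenizer writes tokenizer.json next to the other tokenizer files. This is a minimal sketch under those assumptions, not the Salamandra team's export procedure; the function name is hypothetical, and the conversion typically also needs the `sentencepiece` (and sometimes `protobuf`) package.

```python
from pathlib import Path


def export_fast_tokenizer(model_id_or_path: str, out_dir: str) -> list[str]:
    """Load a tokenizer as a fast tokenizer, save it, and list the files written.

    Assumes `transformers` is installed (plus `sentencepiece` if a slow-to-fast
    conversion is required). The function name is hypothetical.
    """
    # Imported lazily so this module can be imported without transformers.
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id_or_path, use_fast=True)
    if not tokenizer.is_fast:
        # No fast conversion was available, so tokenizer.json would not be written.
        raise RuntimeError(f"{model_id_or_path} did not load as a fast tokenizer")

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    tokenizer.save_pretrained(str(out))
    # Report what ended up on disk so the caller can confirm tokenizer.json exists.
    return sorted(p.name for p in out.iterdir())
```

Usage would look like `export_fast_tokenizer("path/to/salamandra-2b", "tokenizer-out")`, where both paths are placeholders.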

Best,
César

robbiemu

Oct 10, 2024

I believe I downloaded the tokenizer manually when I saw it was missing, but I still got the error I reported.

robbiemu

Oct 18, 2024

edited Oct 18, 2024

BTW (I realize this is now closed and César has his quant up already), that link about convert_hf_to_gguf.py lacking the slow tokenizer path is no longer valid. I just submitted a PR to llama.cpp that changes that file so it captures all added_tokens for LlamaForCausalLM models. That may or may not actually be wanted, but in v3927 specifically there were warnings, which I found interesting. So I'm familiar with the current code, and I can attest that the function has changed to support the "slow" method :)
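
To see which added tokens a converter should be picking up, you can read them straight from tokenizer.json, whose top-level "added_tokens" array holds entries with "id", "content" and "special" fields. The sketch below is not the code from the PR mentioned above; the helper name `list_added_tokens` is hypothetical and it only inspects tokenizer.json, not tokenizer_config.json.

```python
import json
from pathlib import Path


def list_added_tokens(tokenizer_json: str) -> list[tuple[int, str, bool]]:
    """Return (id, content, special) for each entry in tokenizer.json's
    top-level "added_tokens" array, sorted by token id."""
    data = json.loads(Path(tokenizer_json).read_text(encoding="utf-8"))
    entries = data.get("added_tokens", [])
    return sorted((t["id"], t["content"], bool(t.get("special", False))) for t in entries)


if __name__ == "__main__":
    # Print the added tokens of a local tokenizer.json (path is a placeholder).
    for token_id, content, special in list_added_tokens("tokenizer.json"):
        print(token_id, repr(content), "special" if special else "normal")
```

Comparing this list with the tokens the GGUF conversion reports can show whether any added tokens were dropped.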
