ducklingcodehouse/Finnish-DentalQA-v2-merged
Pretty sure I queued that already... Let me do some sleuthing...
Ah, right, I did queue it a short while before I replied, that's why I couldn't find a failure, it was a success :)
In short: it should be linked from your model page by now. Cheers!
Hi there!
Thank you for creating the GGUF quantizations for my Finnish-DentalQA models. I've been testing the conversions and found an issue that needs fixing.
The special tokens [INST], [/INST], <<SYS>>, and <</SYS>> are marked as NORMAL tokens instead of CONTROL tokens in the GGUF metadata. This causes them to appear as visible text rather than being consumed as control tokens during generation.
This is a known issue with the Ahma-3B-Instruct base model - the convert_hf_to_gguf.py script fails to detect these tokens as special because tokenizer.all_special_tokens doesn't include them, even though they're defined in added_tokens_decoder.
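For what it's worth, the mismatch is visible straight from the tokenizer with Transformers. A minimal sketch (I'm assuming the tokenizer loads normally; the comments describe what I expect to see, not verified output):
from transformers import AutoTokenizer
tok = AutoTokenizer.from_pretrained("ducklingcodehouse/Finnish-DentalQA-v2-merged")
# The converter keys off this list, which lacks the four chat markers:
print(tok.all_special_tokens)
# ...yet IDs 3-6 are registered here in added_tokens_decoder:
for tid, added in sorted(tok.added_tokens_decoder.items()):
    print(tid, repr(added.content))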
MODELS TO FIX
- ducklingcodehouse/Finnish-DentalQA-v2-merged
- ducklingcodehouse/Finnish-DentalQA-merged
VALIDATION TEST
Input: Mikä on paras hoito akuutissa pulpiitissa? (What is the best treatment for acute pulpitis?)
Expected correct output:
### Tausta
Akuutti pulpiitti on kivulias tila, jossa hammasytimen tulehdus vaatii välitöntä hoitoa... (Acute pulpitis is a painful condition in which inflammation of the dental pulp requires immediate treatment...)
Current broken output:
[/INST] ### Tausta... (visible control tokens)
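For reference, one way to reproduce this check locally with llama.cpp's CLI (a sketch only: the quant filename is an example, and I'm assuming the standard Llama-2 prompt layout these models use):
./llama-cli -m Finnish-DentalQA-v2-merged.Q8_0.gguf -p "[INST] Mikä on paras hoito akuutissa pulpiitissa? [/INST]" -n 256
With the four tokens typed as CONTROL, [INST] and [/INST] should be consumed during tokenization rather than echoed back in the output.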
TECHNICAL SOLUTION
The following token IDs need to be changed from NORMAL to CONTROL type in the GGUF metadata:
- Token ID 3: [INST]
- Token ID 4: [/INST]
- Token ID 5: <<SYS>>
- Token ID 6: <</SYS>>
This can be done in one of three ways:
- During conversion, using updated llama.cpp tools that properly detect these tokens
- Post-conversion, using a GGUF editor to manually change the token types (see the sketch after this list)
- Using a custom conversion script that explicitly marks these tokens as CONTROL
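For the post-conversion route, here is a minimal sketch of what a manual fix could look like with the gguf Python package (pip install gguf). I'm going by the package's GGUFReader API and haven't tested this on these models, so treat it as an assumption:
# Flip the token-type entries for IDs 3-6 from NORMAL (1) to CONTROL (3).
# Opening in 'r+' mode memory-maps the file, so writes to the numpy views
# below persist to disk.
from gguf import GGUFReader
from gguf.constants import TokenType
reader = GGUFReader("Finnish-DentalQA-v2-merged.gguf", "r+")  # example filename
field = reader.get_field("tokenizer.ggml.token_type")
for tid in (3, 4, 5, 6):  # [INST], [/INST], <<SYS>>, <</SYS>>
    field.parts[field.data[tid]][0] = TokenType.CONTROL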
This token-type fix should resolve the primary issue. Additional template-related problems may surface once the tokenization is corrected, but this is definitely the first critical step that needs addressing.
REQUEST
Could you re-convert both models ensuring these four tokens are properly marked as CONTROL tokens? This should completely fix the visible [/INST] issue and make the models work correctly.
Thanks for your expertise!
Best regards,
Heikki
I already requested @mradermacher to update to the latest llama.cpp for some other reasons, so we can requeue it once he has updated. I assume this should convert without the issue on the latest llama.cpp? Our current version is around one week old. If there is something special I have to do during the HF to GGUF conversion, please give me instructions, e.g. which command line arguments to use. Editing the GGUF after conversion is not really an option for us.
Hi! Thanks for the quick response and willingness to help.
The issue specifically affects models based on Ahma-3B-Instruct where special tokens aren't always detected properly.
Try adding this parameter during conversion:
--special-tokens '[INST],[/INST],<<SYS>>,<</SYS>>'
The key issue is that token IDs 3, 4, 5, and 6 need to be marked as CONTROL tokens instead of NORMAL tokens.
Thanks again for your help!
Best, Heikki
And I also meant to say that I doubt merely updating llama.cpp will suffice. It might, but it would be better to make sure by using the approach above.
Best, Heikki
I think this should be better fixed on the original model side of things, since this will likely affect other programs too. That's not stopping nicoboss from supplying source GGUFs, of course; his efforts are always appreciated :)
Thanks for the reply. The underlying problem seems to be that llama.cpp looks for these special tokens in the wrong place. They're defined in the tokenizer config, which is OK practice, but llama.cpp doesn't pick them up from there and needs some manual help for this.
Best, Heikki
In that case, I retract what I said and claim the opposite, namely that this should be fixed in llama.cpp, and nicoboss's heroic efforts are still appreciated :) If transformers works correctly with the model, it's a llama.cpp bug.
Yes, indeed, Transformers works just fine. This problem with llama.cpp was already noted a while back in the community discussion for Ahma-3B-Instruct, which is the base for these models.
I will manually provide the GGUFs in this case. I assume the only change is specifying --special-tokens '[INST],[/INST],<<SYS>>,<</SYS>>' when calling convert_hf_to_gguf.py
Thank you! I think this should work, but I'm only relying on the community discussion where someone found this working, so I'm not 100% sure. If possible, I'd advise testing that the conversion works. But at least the models themselves are fine, as they work well with Transformers.
Please also note that there are two models where you earlier did the conversion: the original ducklingcodehouse/Finnish-DentalQA-merged and the improved ducklingcodehouse/Finnish-DentalQA-v2-merged.
Thank you again! 🙏
Best, Heikki
FYI @nicoboss the original Ahma conversion discussion is here: https://huggingface.co/mradermacher/Ahma-3B-Instruct-GGUF/discussions/1
Just wanted to mention that if the --special-tokens parameter doesn't work as expected during conversion, there's also a post-conversion solution that was proven to work in the original Ahma discussion.
You can use the GGUF editor at https://huggingface.co/spaces/CISCai/gguf-editor to manually change the token types after conversion:
- Load the converted GGUF file
- Find tokens with IDs 3, 4, 5, and 6 (the INST and SYS tokens)
- Change their type from NORMAL to CONTROL
- Download the corrected file
This was the solution that @osma verified actually works in the Ahma discussion thread. So if we run into any issues with the conversion parameter approach, this could be a reliable fallback.
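If it helps, here's a small read-only check one could run afterwards to confirm the edit took, again sketched with the gguf Python package (untested by me; the filename is an example):
from gguf import GGUFReader
from gguf.constants import TokenType
reader = GGUFReader("Finnish-DentalQA-v2-merged.gguf")
types = reader.get_field("tokenizer.ggml.token_type")
tokens = reader.get_field("tokenizer.ggml.tokens")
for tid in (3, 4, 5, 6):
    text = bytes(tokens.parts[tokens.data[tid]]).decode("utf-8")
    # Each of these should report TokenType.CONTROL after the edit:
    print(tid, repr(text), TokenType(int(types.parts[types.data[tid]][0])))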
Thanks again for all your help!
Best, Heikki
convert_hf_to_gguf.py unfortunately has no argument --special-tokens:
root@AI:/apool/llama.cpp# venv/bin/python convert_hf_to_gguf.py --special-tokens '[INST],[/INST],<<SYS>>,</SYS>>' --outtype=bf16 --outfile=/mradermacher/tmp/quant/Finnish-DentalQA-merged.gguf /transfer/Finnish-DentalQA-merged/
usage: convert_hf_to_gguf.py [-h] [--vocab-only] [--outfile OUTFILE] [--outtype {f32,f16,bf16,q8_0,tq1_0,tq2_0,auto}] [--bigendian] [--use-temp-file] [--no-lazy] [--model-name MODEL_NAME] [--verbose] [--split-max-tensors SPLIT_MAX_TENSORS]
[--split-max-size SPLIT_MAX_SIZE] [--dry-run] [--no-tensor-first-split] [--metadata METADATA] [--print-supported-models] [--remote] [--mmproj] [--mistral-format] [--disable-mistral-community-chat-template]
[model]
convert_hf_to_gguf.py: error: unrecognized arguments: --special-tokens /transfer/Finnish-DentalQA-merged/
I guess I really have to use the GGUF editor, which is quite some effort, but you seem to be so passionate about this model that it's something I'm willing to do.
Oh, sorry to hear that! Well, I wasn't really sure here. But many thanks for this! Cheers! 🙏💪🔥
Best, Heikki
You can use the GGUF editor at https://huggingface.co/spaces/CISCai/gguf-editor to manually change the token types after conversion:
- Load the converted GGUF file
- Find tokens with IDs 3, 4, 5, and 6 (the INST and SYS tokens)
- Change their type from NORMAL to CONTROL
- Download the corrected file
I just followed your guide. I hope I correctly changed everything as desired.
It seems promising, but did you try it out? Can I try it out? I didn't see the GGUF models on Hugging Face anymore.
I just tried the new model out (assuming you uploaded the new one). I think it still doesn't behave very well. At this point, I'd maybe just recommend dropping the GGUF models for ducklingcodehouse/Finnish-DentalQA-v2-merged and the earlier v1 (ducklingcodehouse/Finnish-DentalQA-merged), as this has perhaps gotten a bit too hard and would need more investigation. I was just hoping that, based on the earlier discussion with Ahma-3B-Instruct, there might be a fix here. Not being a GGUF conversion expert myself, I based my advice on that discussion, but perhaps it's not sufficient. Anyway, I thank you warmly for your effort.
Best, Heikki
@ducklingcodehouse Please redownload our latest quants and try again. I accidentally queued the model to marco instead of nico1 which caused the system to regenerate the source GGUF instead of taking the manually edited source GGUF.
Download page: https://hf.tst.eu/model#Finnish-DentalQA-v2-merged-GGUF
Static quants: https://huggingface.co/mradermacher/Finnish-DentalQA-v2-merged-GGUF
Weighted/imatrix quants: https://huggingface.co/mradermacher/Finnish-DentalQA-v2-merged-i1-GGUF
Thanks, but it still behaves very weirdly. I think it has inherited this from the underlying Ahma model, which already had problems with GGUF. I've now tried this in multiple different ways, but really no luck. Sometimes it's better, sometimes worse, but always at least a bit off.
I'd recommend we just content ourselves with the Transformers versions of these Finnish DentalQA models. Thanks for your effort anyway!
Best, Heikki
