ducklingcodehouse/Finnish-DentalQA-v2-merged
Pretty sure I queued that already... Let me do some sleuthing...
Ah, right, I did queue it a short while before I replied, that's why I couldn't find a failure, it was a success :)
In short: it should be linked from your model page by now. Cheers!
Hi there!
Thank you for creating the GGUF quantizations for my Finnish-DentalQA models. I've been testing the conversions and found an issue that needs fixing.
The special tokens [INST], [/INST], <<SYS>>, and <</SYS>> are marked as NORMAL tokens instead of CONTROL tokens in the GGUF metadata. This causes them to appear as visible text rather than being consumed as control tokens during generation.
This is a known issue with the Ahma-3B-Instruct base model - the convert_hf_to_gguf.py script fails to detect these tokens as special because tokenizer.all_special_tokens doesn't include them, even though they're defined in added_tokens_decoder.
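For what it's worth, the mismatch is visible straight from the tokenizer with Transformers. A minimal sketch (I'm assuming the tokenizer loads normally; the comments describe what I expect to see, not verified output):
from transformers import AutoTokenizer
tok = AutoTokenizer.from_pretrained("ducklingcodehouse/Finnish-DentalQA-v2-merged")
# The converter keys off this list, which lacks the four chat markers:
print(tok.all_special_tokens)
# ...yet IDs 3-6 are registered here in added_tokens_decoder:
for tid, added in sorted(tok.added_tokens_decoder.items()):
    print(tid, repr(added.content))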
MODELS TO FIX
- ducklingcodehouse/Finnish-DentalQA-v2-merged
- ducklingcodehouse/Finnish-DentalQA-merged
VALIDATION TEST
Input: Mikä on paras hoito akuutissa pulpiitissa? (What is the best treatment for acute pulpitis?)
Expected correct output:
### Tausta
Akuutti pulpiitti on kivulias tila, jossa hammasytimen tulehdus vaatii välitöntä hoitoa... (Acute pulpitis is a painful condition in which inflammation of the dental pulp requires immediate treatment...)
Current broken output:
[/INST] ### Tausta... (visible control tokens)
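For reference, one way to reproduce this check locally with llama.cpp's CLI (a sketch only: the quant filename is an example, and I'm assuming the standard Llama-2 prompt layout these models use):
./llama-cli -m Finnish-DentalQA-v2-merged.Q8_0.gguf -p "[INST] Mikä on paras hoito akuutissa pulpiitissa? [/INST]" -n 256
With the four tokens typed as CONTROL, [INST] and [/INST] should be consumed during tokenization rather than echoed back in the output.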
TECHNICAL SOLUTION
The following token IDs need to be changed from NORMAL to CONTROL type in the GGUF metadata:
- Token ID 3: [INST]
- Token ID 4: [/INST]
- Token ID 5: <<SYS>>
- Token ID 6: <</SYS>>
This can be done in one of three ways:
- During conversion, using updated llama.cpp tools that properly detect these tokens
- Post-conversion, using a GGUF editor to manually change the token types (see the sketch after this list)
- Using a custom conversion script that explicitly marks these tokens as CONTROL
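For the post-conversion route, here is a minimal sketch of what a manual fix could look like with the gguf Python package (pip install gguf). I'm going by the package's GGUFReader API and haven't tested this on these models, so treat it as an assumption:
# Flip the token-type entries for IDs 3-6 from NORMAL (1) to CONTROL (3).
# Opening in 'r+' mode memory-maps the file, so writes to the numpy views
# below persist to disk.
from gguf import GGUFReader
from gguf.constants import TokenType
reader = GGUFReader("Finnish-DentalQA-v2-merged.gguf", "r+")  # example filename
field = reader.get_field("tokenizer.ggml.token_type")
for tid in (3, 4, 5, 6):  # [INST], [/INST], <<SYS>>, <</SYS>>
    field.parts[field.data[tid]][0] = TokenType.CONTROL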
This token-type fix should resolve the primary issue. Additional template-related problems may surface once the tokenization is corrected, but this is definitely the first critical step that needs addressing.
REQUEST
Could you re-convert both models ensuring these four tokens are properly marked as CONTROL tokens? This should completely fix the visible [/INST] issue and make the models work correctly.
Thanks for your expertise!
Best regards,
Heikki
I already requested @mradermacher to update to the latest llama.cpp for some other reasons, so we can requeue it once he has updated. I assume this should convert without the issue on the latest llama.cpp? Our current version is around one week old. If there is something special I have to do during the HF to GGUF conversion, please give me instructions, e.g. which command line arguments to use. Editing the GGUF after conversion is not really an option for us.
Hi! Thanks for the quick response and willingness to help.
The issue specifically affects models based on Ahma-3B-Instruct where special tokens aren't always detected properly.
Try adding this parameter during conversion:
--special-tokens '[INST],[/INST],<<SYS>>,<</SYS>>'
The key issue is that token IDs 3, 4, 5, and 6 need to be marked as CONTROL tokens instead of NORMAL tokens.
Thanks again for your help!
Best, Heikki
And I also meant to say that I doubt merely updating llama.cpp will suffice. It might, but it would be better to make sure by using the approach above.
Best, Heikki
I think this should be better fixed on the original model side of things, since this will likely affect other programs too. That's not stopping nicoboss from supplying source GGUFs, of course; his efforts are always appreciated :)
Thanks for the reply. The underlying problem seems to be that llama.cpp looks for these special tokens in the wrong place. They're defined in the tokenizer config, which is OK practice, but llama.cpp doesn't pick them up from there and needs some manual help for this.
Best, Heikki
In that case, I retract what I said and claim the opposite, namely that this should be fixed in llama.cpp, and nicoboss's heroic efforts are still appreciated :) If transformers works correctly with the model, it's a llama.cpp bug.
Yes, indeed, Transformers works just fine. This problem with llama.cpp was already noted a while back in the community discussion for Ahma-3B-Instruct, which is the base for these models.
I will manually provide the GGUFs in this case. I assume the only change is specifying --special-tokens '[INST],[/INST],<<SYS>>,<</SYS>>' when calling convert_hf_to_gguf.py
Thank you! I think this should work, but I'm only relying on the community discussion where someone found this working, so I'm not 100% sure. If possible, I'd advise testing that the conversion works. But at least the models themselves are fine, as they work well with Transformers.
Please also note that there are two models where you earlier did the conversion: the original ducklingcodehouse/Finnish-DentalQA-merged and the improved ducklingcodehouse/Finnish-DentalQA-v2-merged.
Thank you again! 🙏
Best, Heikki
FYI @nicoboss the original Ahma conversion discussion is here: https://huggingface.co/mradermacher/Ahma-3B-Instruct-GGUF/discussions/1
Just wanted to mention that if the --special-tokens parameter doesn't work as expected during conversion, there's also a post-conversion solution that was proven to work in the original Ahma discussion.
You can use the GGUF editor at https://huggingface.co/spaces/CISCai/gguf-editor to manually change the token types after conversion:
- Load the converted GGUF file
- Find tokens with IDs 3, 4, 5, and 6 (the INST and SYS tokens)
- Change their type from NORMAL to CONTROL
- Download the corrected file
This was the solution that @osma verified actually works in the Ahma discussion thread. So if we run into any issues with the conversion parameter approach, this could be a reliable fallback.
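If it helps, here's a small read-only check one could run afterwards to confirm the edit took, again sketched with the gguf Python package (untested by me; the filename is an example):
from gguf import GGUFReader
from gguf.constants import TokenType
reader = GGUFReader("Finnish-DentalQA-v2-merged.gguf")
types = reader.get_field("tokenizer.ggml.token_type")
tokens = reader.get_field("tokenizer.ggml.tokens")
for tid in (3, 4, 5, 6):
    text = bytes(tokens.parts[tokens.data[tid]]).decode("utf-8")
    # Each of these should report TokenType.CONTROL after the edit:
    print(tid, repr(text), TokenType(int(types.parts[types.data[tid]][0])))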
Thanks again for all your help!
Best, Heikki
convert_hf_to_gguf.py unfortunately has no argument --special-tokens:
root@AI:/apool/llama.cpp# venv/bin/python convert_hf_to_gguf.py --special-tokens '[INST],[/INST],<<SYS>>,</SYS>>' --outtype=bf16 --outfile=/mradermacher/tmp/quant/Finnish-DentalQA-merged.gguf /transfer/Finnish-DentalQA-merged/
usage: convert_hf_to_gguf.py [-h] [--vocab-only] [--outfile OUTFILE] [--outtype {f32,f16,bf16,q8_0,tq1_0,tq2_0,auto}] [--bigendian] [--use-temp-file] [--no-lazy] [--model-name MODEL_NAME] [--verbose] [--split-max-tensors SPLIT_MAX_TENSORS]
[--split-max-size SPLIT_MAX_SIZE] [--dry-run] [--no-tensor-first-split] [--metadata METADATA] [--print-supported-models] [--remote] [--mmproj] [--mistral-format] [--disable-mistral-community-chat-template]
[model]
convert_hf_to_gguf.py: error: unrecognized arguments: --special-tokens /transfer/Finnish-DentalQA-merged/
I guess I really have to use the GGUF editor, which is quite some effort, but you seem to be so passionate about this model that it's something I'm willing to do.
Oh, sorry to hear that! Well, I wasn't really sure here. But many thanks for this! Cheers! 🙏💪🔥
Best, Heikki
You can use the GGUF editor at https://huggingface.co/spaces/CISCai/gguf-editor to manually change the token types after conversion:
- Load the converted GGUF file
- Find tokens with IDs 3, 4, 5, and 6 (the INST and SYS tokens)
- Change their type from NORMAL to CONTROL
- Download the corrected file
I just followed your guide. I hope I correctly changed everything as desired.
It seems promising, but did you try it out? Can I try it out? I didn't see the GGUF models on Hugging Face anymore.
I just tried the new model out (assuming you uploaded the new one). I think it still doesn't behave very well. At this point, I'd maybe just recommend dropping the GGUF models for ducklingcodehouse/Finnish-DentalQA-v2-merged and the earlier v1 (ducklingcodehouse/Finnish-DentalQA-merged), as this has perhaps gotten a bit too hard and would need more investigation. I was just hoping that, based on the earlier discussion with Ahma-3B-Instruct, there might be a fix here. Not being a GGUF conversion expert myself, I based my advice on that discussion, but perhaps it's not sufficient. Anyway, I thank you warmly for your effort.
Best, Heikki
@ducklingcodehouse Please redownload our latest quants and try again. I accidentally queued the model to marco instead of nico1 which caused the system to regenerate the source GGUF instead of taking the manually edited source GGUF.
Download page: https://hf.tst.eu/model#Finnish-DentalQA-v2-merged-GGUF
Static quants: https://huggingface.co/mradermacher/Finnish-DentalQA-v2-merged-GGUF
Weighted/imatrix quants: https://huggingface.co/mradermacher/Finnish-DentalQA-v2-merged-i1-GGUF
Thanks, but it still behaves very weirdly. I think it has inherited this from the underlying Ahma model, which already had problems with GGUF. I've now tried this in multiple different ways, but really no luck. Sometimes it's better, sometimes worse, but always at least a bit off.
I'd recommend we just content ourselves with the Transformers versions of these Finnish DentalQA models. Thanks for your effort anyway!
Best, Heikki
