Question about tokenizers
#5
by sometimesanotion - opened
This is a fascinating model, given its multilingual applications. How does the vocabulary size affect tokenization and training for a model of this size? Would a tokenizer like Tekken enhance the multilingual capabilities, especially by producing fewer tokens per text?
Would a tokenizer switch, a bit of TokenSurgeon, and a finetune enhance this model's multilingual scope?
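For context, a quick way to compare tokenizer efficiency is to count the tokens each one produces on the same multilingual text. Here is a minimal sketch using `transformers.AutoTokenizer`; the model IDs are just examples to swap for the actual repos you want to compare:

```python
# Rough check of tokenizer efficiency: fewer tokens per sentence
# generally means cheaper inference and a longer effective context.
from transformers import AutoTokenizer

# Example model IDs (assumptions, not this model's actual repos):
MODELS = [
    "mistralai/Mistral-Nemo-Instruct-2407",  # uses the Tekken tokenizer
    "meta-llama/Llama-2-7b-hf",              # 32k SentencePiece vocab
]

samples = {
    "en": "The quick brown fox jumps over the lazy dog.",
    "fr": "Le renard brun rapide saute par-dessus le chien paresseux.",
    "ja": "素早い茶色の狐がのろまな犬を飛び越える。",
}

for model_id in MODELS:
    tok = AutoTokenizer.from_pretrained(model_id)
    counts = {lang: len(tok.encode(text, add_special_tokens=False))
              for lang, text in samples.items()}
    print(f"{model_id}: vocab={tok.vocab_size}, tokens={counts}")
```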
Hey @sometimesanotion! A broader vocabulary would help strengthen multilingual capabilities, but it's a tradeoff: the embedding table (and the LM head, if untied) grows in proportion to vocabulary size, so those parameters come at the expense of the transformer body. For such a small model, this would be extremely costly.
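To put rough numbers on it: with tied input/output embeddings the vocabulary costs about `vocab_size × hidden_size` parameters (twice that if untied). A back-of-the-envelope sketch, with all dimensions assumed for illustration rather than taken from this model's config:

```python
# Back-of-the-envelope cost of a larger vocabulary in a small model.
# All dimensions below are assumptions for illustration only.
def embedding_params(vocab_size: int, hidden_size: int, tied: bool = True) -> int:
    """Parameters spent on the token embedding (plus LM head if untied)."""
    return vocab_size * hidden_size * (1 if tied else 2)

hidden_size = 1024            # assumed hidden dimension
non_embedding = 300_000_000   # assumed transformer-body parameter count

for vocab in (32_000, 131_072):  # e.g. a 32k vocab vs a Tekken-sized one
    emb = embedding_params(vocab, hidden_size)
    total = non_embedding + emb
    print(f"vocab={vocab:>7}: embeddings={emb/1e6:6.1f}M "
          f"({100 * emb / total:4.1f}% of {total/1e6:.0f}M total)")
```

Under these assumed dimensions, going from a 32k to a ~131k vocabulary pushes the embedding share of the model from roughly 10% to over 30% of all parameters, which is why the cost matters much more at small scale than it does for large models.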