Japanese model card
Full Japanese dev blog
Development source code
Karasu-DPO-7B
This is a Japanese version of the Qwen/Qwen2.5-7B-Instruct model, trained with DPO on synthetic Japanese conversation data.
This model outperforms the base Qwen/Qwen2.5-7B-Instruct model on the arena-hard-auto-multilingual chat benchmark:
| Qwen2.5-7B-Instruct | Karasu-DPO-7B |
|---|---|
| 50.0 | 66.2 |
We recommend this model for use as a general conversation AI.
How to use
This model can be used in the same way as any Qwen 2.5 model. We recommend using vLLM for simplicity and speed.
- vLLM
Install vLLM using `pip install vllm`.
```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="lightblue/Karasu-DPO-7B",
    max_model_len=8_000
)

sampling_params = SamplingParams(
    temperature=0.0,
    max_tokens=8_000,
)

prompts = [
    """ナイジェリアの首都はどこですか?""",
    """鉄は何度に溶けますか?""",
    """父が好きそうなプレゼントのおすすめを教えて""",
]

conversations = [
    [{"role": "user", "content": x}] for x in prompts
]

outputs = llm.chat(conversations, sampling_params=sampling_params)

for output in outputs:
    print(output.outputs[0].text)
    print("-" * 32)

# ナイジェリアの首都はアブジャ(Abuja)です。以前はラゴスが首都でしたが、1991年に新しい首都としてアブジャが建設され、1991年12月12日に首都としての地位を正式に取得しました。アブジャは政治中心地として機能していますが、経済の中心地は依然としてラゴスが占めています。
# --------------------------------
# 鉄は非常に高い温度で溶けます。鉄の融点は約1,538℃(2,800°F)です。これは、一般的な家庭用のオーブン(最大約200-300℃)では絶対に達成できません。鉄を溶かすためには、より高温の設備が必要で、例えば、電気炉やガス炉などがあります。
# --------------------------------
# もちろんです。父さんへのプレゼント選びは楽しみですね。以下に、父が喜ぶ2つのプレゼントを提案します:
# 1. **高級コーヒーメーカー**:
#    - 父さんがコーヒーを愛飲しているなら、高品質なコーヒーメーカーは大変喜ばれるプレゼントです。例えば、手動式のコーヒーメーカーなら、毎日のコーヒー作りがより楽しく、手作り感を楽しめます。また、自動式のコーヒーメーカーなら、忙しい時でも美味しいコーヒーが楽しめます。
# 2. **趣味に合わせたギフトセット**:
#    - 父さんの趣味や興味に合わせたギフトセットは、とても喜ばれます。例えば、ゴルフ好きなら、最新のゴルフクラブやバッグ、ゴルフボールセットなどが良いでしょう。また、車好きなら、高品質な車用アクセサリー(カーフィルムやカーボンシートなど)や車載用の充電器なども喜ばれます。
# これらのプレゼントは、父さんの趣味や興味に合わせて選べば、きっと喜んでもらえることでしょう。
# --------------------------------
```
How this model was made
We made this model through the following procedure:
- Sample Japanese and English prompts from the following datasets:
- lmsys/lmsys-chat-1m
- RyokoAI/ShareGPT52K
- openchat/openchat_sharegpt_v3
- OpenAssistant/oasst2
- Open-Orca/slimorca-deduped-cleaned-corrected
- HuggingFaceH4/ultrachat_200k
- Translate English prompts to Japanese using gpt-4o-mini.
- Correct translations with gpt-4o-mini.
- Get responses to all Japanese prompts (both original and translated) with gpt-4o-mini.
- Correct responses using gpt-4o-mini.
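One plausible reading of the correction step (the card does not state this explicitly) is that each uncorrected first-pass response becomes the "rejected" completion and its corrected version the "chosen" completion of a DPO preference pair. A minimal sketch, with an illustrative helper and dummy text:

```python
from typing import Dict

def build_preference_pair(prompt: str, original: str, corrected: str) -> Dict[str, str]:
    """Assemble one DPO training record: the corrected response is
    preferred ("chosen") over the uncorrected draft ("rejected")."""
    return {
        "prompt": prompt,
        "chosen": corrected,   # corrected response
        "rejected": original,  # uncorrected first-pass response
    }

# Illustrative record with dummy text (not from the actual dataset)
pair = build_preference_pair(
    prompt="日本の首都はどこですか?",
    original="日本の首都は京都です。",   # flawed draft
    corrected="日本の首都は東京です。",  # corrected version
)
print(pair["chosen"])  # → 日本の首都は東京です。
```

This `prompt`/`chosen`/`rejected` layout is the standard record format consumed by DPO trainers.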
We QLoRA DPO trained a Qwen/Qwen2.5-7B-Instruct model on this data to create Karasu-DPO-7B.
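The card does not include the training script; as a rough illustration, QLoRA DPO training with Hugging Face TRL might look like the sketch below. The dataset path, LoRA ranks, and quantization settings are assumptions; the hyperparameters match those listed under Training Procedure later in this card.

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import DPOConfig, DPOTrainer

model_name = "Qwen/Qwen2.5-7B-Instruct"

# 4-bit quantization: the "Q" in QLoRA
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
model = AutoModelForCausalLM.from_pretrained(model_name, quantization_config=bnb_config)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# LoRA adapter config (r and alpha are illustrative, not from the card)
peft_config = LoraConfig(task_type="CAUSAL_LM", r=16, lora_alpha=32)

training_args = DPOConfig(
    output_dir="karasu-dpo-7b",
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=4,
    lr_scheduler_type="cosine",
)

# Dataset with "prompt"/"chosen"/"rejected" columns (placeholder path)
dataset = load_dataset("json", data_files="preference_pairs.jsonl")["train"]

trainer = DPOTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    processing_class=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```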
Model Details
- Model size: 7B
- Context length: 1024
- Language: Japanese
Training Procedure
- learning_rate: 5e-6
- train_batch_size: 4
- eval_batch_size: 2
- gradient_accumulation_steps: 4
- lr_scheduler_type: cosine
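With gradient accumulation, each optimizer step sees an effective per-device batch of train_batch_size × gradient_accumulation_steps:

```python
train_batch_size = 4
gradient_accumulation_steps = 4

# Gradients accumulate over 4 micro-batches of 4 before each
# optimizer step, so each update sees 16 examples per device.
effective_batch_size = train_batch_size * gradient_accumulation_steps
print(effective_batch_size)  # → 16
```

(Multiply further by the number of devices if training was multi-GPU; the card does not say.)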
Training Results
| Step | Training Loss | Validation Loss |
|---|---|---|
| 10 | 0.678400 | 0.665870 |
| 20 | 0.608500 | 0.638361 |
| 30 | 0.577300 | 0.607468 |
| 40 | 0.526700 | 0.559432 |
| 50 | 0.489200 | 0.523419 |
| 60 | 0.502800 | 0.511645 |
| 70 | 0.462300 | 0.506989 |
| 80 | 0.419600 | 0.509142 |
| 90 | 0.445200 | 0.510396 |
| 100 | 0.424400 | 0.511653 |

License
We share this model under an Apache 2.0 license.
Developed by
This model was trained by Jun Sashihara (junsashihara) and supervised by Peter Devine (ptrdvn) for Lightblue.