Japanese model card

Full Japanese dev blog

Development source code

Karasu-DPO-7B

This is a Japanese-tuned version of the Qwen/Qwen2.5-7B-Instruct model, trained with DPO on synthetic Japanese conversation data.

This model outperforms the base Qwen/Qwen2.5-7B-Instruct model on the arena-hard-auto-multilingual chat benchmark:

Model                  Score
Qwen2.5-7B-Instruct    50.0
Karasu-DPO-7B          66.2

We recommend this model for use as a general conversation AI.
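A score of 66.2 on this benchmark means the model's responses were preferred over the baseline's in roughly two thirds of judged pairwise comparisons. As a simplified illustration of how such a pairwise score aggregates (the real arena-hard-auto pipeline uses an LLM judge and more sophisticated aggregation, so this is only a sketch):

```python
# Simplified, hypothetical scoring sketch -- NOT the actual arena-hard-auto
# implementation (which uses an LLM judge and Bradley-Terry-style aggregation).
def win_rate(judgments: list[str]) -> float:
    """Percentage score: a win counts 1, a tie counts 0.5, a loss 0."""
    wins = judgments.count("win")
    ties = judgments.count("tie")
    return 100 * (wins + 0.5 * ties) / len(judgments)

# 6 wins, 1 tie, 3 losses over 10 judged prompts:
print(win_rate(["win"] * 6 + ["tie"] + ["loss"] * 3))  # -> 65.0
```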

How to use

This model can be used in the same way as any Qwen 2.5 model. We recommend using vLLM for simplicity and speed.

  • vLLM

    Install vLLM using pip install vllm.

    from vllm import LLM, SamplingParams
    
    llm = LLM(
        model="lightblue/Karasu-DPO-7B",
        max_model_len=8_000
    )
    
    sampling_params = SamplingParams(
        temperature=0.0, 
        max_tokens=8_000,
    )
    
    prompts = [
        """ใƒŠใ‚คใ‚ธใ‚งใƒชใ‚ขใฎ้ฆ–้ƒฝใฏใฉใ“ใงใ™ใ‹๏ผŸ""",
        """้‰„ใฏไฝ•ๅบฆใซๆบถใ‘ใพใ™ใ‹๏ผŸ""",
        """็ˆถใŒๅฅฝใใใ†ใชใƒ—ใƒฌใ‚ผใƒณใƒˆใฎใŠใ™ใ™ใ‚ใ‚’ๆ•™ใˆใฆ""",
    ]
    
    conversations = [
        [{"role": "user", "content": x}] for x in prompts
    ]
    
    outputs = llm.chat(conversations, sampling_params=sampling_params)
    
    for output in outputs:
        print(output.outputs[0].text)
        print("-"*32)
    
    # ใƒŠใ‚คใ‚ธใ‚งใƒชใ‚ขใฎ้ฆ–้ƒฝใฏใ‚ขใƒ–ใ‚ธใƒฃ๏ผˆAbuja๏ผ‰ใงใ™ใ€‚ไปฅๅ‰ใฏใƒฉใ‚ดใ‚นใŒ้ฆ–้ƒฝใงใ—ใŸใŒใ€1991ๅนดใซๆ–ฐใ—ใ„้ฆ–้ƒฝใจใ—ใฆใ‚ขใƒ–ใ‚ธใƒฃใŒๅปบ่จญใ•ใ‚Œใ€1991ๅนด12ๆœˆ12ๆ—ฅใซ้ฆ–้ƒฝใจใ—ใฆใฎๅœฐไฝใ‚’ๆญฃๅผใซๅ–ๅพ—ใ—ใพใ—ใŸใ€‚ใ‚ขใƒ–ใ‚ธใƒฃใฏๆ”ฟๆฒปไธญๅฟƒๅœฐใจใ—ใฆๆฉŸ่ƒฝใ—ใฆใ„ใพใ™ใŒใ€็ตŒๆธˆใฎไธญๅฟƒๅœฐใฏไพ็„ถใจใ—ใฆใƒฉใ‚ดใ‚นใŒๅ ใ‚ใฆใ„ใพใ™ใ€‚
    # --------------------------------
    # ้‰„ใฏ้žๅธธใซ้ซ˜ใ„ๆธฉๅบฆใงๆบถใ‘ใพใ™ใ€‚้‰„ใฎ่ž็‚นใฏ็ด„1,538โ„ƒ๏ผˆ2,800ยฐF๏ผ‰ใงใ™ใ€‚ใ“ใ‚Œใฏใ€ไธ€่ˆฌ็š„ใชๅฎถๅบญ็”จใฎใ‚ชใƒผใƒ–ใƒณ๏ผˆๆœ€ๅคง็ด„200-300โ„ƒ๏ผ‰ใงใฏ็ตถๅฏพใซ้”ๆˆใงใใพใ›ใ‚“ใ€‚้‰„ใ‚’ๆบถใ‹ใ™ใŸใ‚ใซใฏใ€ใ‚ˆใ‚Š้ซ˜ๆธฉใฎ่จญๅ‚™ใŒๅฟ…่ฆใงใ€ไพ‹ใˆใฐใ€้›ปๆฐ—็‚‰ใ‚„ใ‚ฌใ‚น็‚‰ใชใฉใŒใ‚ใ‚Šใพใ™ใ€‚
    # --------------------------------
    # ใ‚‚ใกใ‚ใ‚“ใงใ™ใ€‚็ˆถใ•ใ‚“ใธใฎใƒ—ใƒฌใ‚ผใƒณใƒˆ้ธใณใฏๆฅฝใ—ใฟใงใ™ใญใ€‚ไปฅไธ‹ใซใ€็ˆถใŒๅ–œใถ2ใคใฎใƒ—ใƒฌใ‚ผใƒณใƒˆใ‚’ๆๆกˆใ—ใพใ™๏ผš
    
    # 1. **้ซ˜็ดšใ‚ณใƒผใƒ’ใƒผใƒกใƒผใ‚ซใƒผ**๏ผš
    #    - ็ˆถใ•ใ‚“ใŒใ‚ณใƒผใƒ’ใƒผใ‚’ๆ„›้ฃฒใ—ใฆใ„ใ‚‹ใชใ‚‰ใ€้ซ˜ๅ“่ณชใชใ‚ณใƒผใƒ’ใƒผใƒกใƒผใ‚ซใƒผใฏๅคงๅค‰ๅ–œใฐใ‚Œใ‚‹ใƒ—ใƒฌใ‚ผใƒณใƒˆใงใ™ใ€‚ไพ‹ใˆใฐใ€ๆ‰‹ๅ‹•ๅผใฎใ‚ณใƒผใƒ’ใƒผใƒกใƒผใ‚ซใƒผใชใ‚‰ใ€ๆฏŽๆ—ฅใฎใ‚ณใƒผใƒ’ใƒผไฝœใ‚ŠใŒใ‚ˆใ‚Šๆฅฝใ—ใใ€ๆ‰‹ไฝœใ‚Šๆ„Ÿใ‚‚ๆฅฝใ—ใ‚ใพใ™ใ€‚ใพใŸใ€่‡ชๅ‹•ๅผใฎใ‚ณใƒผใƒ’ใƒผใƒกใƒผใ‚ซใƒผใชใ‚‰ใ€ๅฟ™ใ—ใ„ๆœใงใ‚‚็พŽๅ‘ณใ—ใ„ใ‚ณใƒผใƒ’ใƒผใŒๆฅฝใ—ใ‚ใพใ™ใ€‚
    
    # 2. **่ถฃๅ‘ณใซๅˆใ‚ใ›ใŸใ‚ฎใƒ•ใƒˆใ‚ปใƒƒใƒˆ**๏ผš
    #    - ็ˆถใ•ใ‚“ใฎ่ถฃๅ‘ณใ‚„่ˆˆๅ‘ณใซๅˆใ‚ใ›ใŸใ‚ฎใƒ•ใƒˆใ‚ปใƒƒใƒˆใฏใ€ใจใฆใ‚‚ๅ–œใฐใ‚Œใพใ™ใ€‚ไพ‹ใˆใฐใ€ใ‚ดใƒซใƒ•ๅฅฝใใชใ‚‰ใ€ๆœ€ๆ–ฐใฎใ‚ดใƒซใƒ•ใ‚ฏใƒฉใƒ–ใ‚„ใ‚ดใƒซใƒ•ใƒใƒƒใ‚ฐใ€ใ‚ดใƒซใƒ•ใƒœใƒผใƒซใ‚ปใƒƒใƒˆใชใฉใŒ่‰ฏใ„ใงใ—ใ‚‡ใ†ใ€‚ใพใŸใ€่ปŠๅฅฝใใชใ‚‰ใ€้ซ˜ๅ“่ณชใช่ปŠ็”จใ‚ขใ‚ฏใ‚ปใ‚ตใƒชใƒผ๏ผˆใ‚ซใƒผใƒ•ใ‚ฃใƒซใƒ ใ€ใ‚ซใƒผใƒœใƒณใ‚ทใƒผใƒˆใชใฉ๏ผ‰ใ‚„่ปŠ่ผ‰็”จใฎๅ……้›ปๅ™จใชใฉใŒๅ–œใฐใ‚Œใพใ™ใ€‚
    
    # ใ“ใ‚Œใ‚‰ใฎใƒ—ใƒฌใ‚ผใƒณใƒˆใฏใ€็ˆถใ•ใ‚“ใฎ่ถฃๅ‘ณใ‚„่ˆˆๅ‘ณใซๅˆใ‚ใ›ใฆ้ธในใฐใ€ใใฃใจๅ–œใ‚“ใงใ‚‚ใ‚‰ใˆใ‚‹ใ“ใจใงใ—ใ‚‡ใ†ใ€‚
    # --------------------------------
    

    How this model was made

    We made this model through the following procedure:

    1. Sample Japanese and English prompts from the following datasets:
      • lmsys/lmsys-chat-1m
      • RyokoAI/ShareGPT52K
      • openchat/openchat_sharegpt_v3
      • OpenAssistant/oasst2
      • Open-Orca/slimorca-deduped-cleaned-corrected
      • HuggingFaceH4/ultrachat_200k
    2. Translate English prompts to Japanese using gpt-4o-mini.
    3. Correct translations with gpt-4o-mini.
    4. Get responses to all Japanese prompts (both original and translated) with gpt-4o-mini.
    5. Correct responses using gpt-4o-mini.
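Steps 2–3 above could be implemented against the OpenAI chat API along these lines. This is an illustrative sketch only: the system prompt wording and the `build_translation_request` helper are assumptions, not the pipeline actually used.

```python
# Hypothetical sketch of the translation step (step 2); the system prompt
# below is an assumption, not the one used to build the actual dataset.
def build_translation_request(english_prompt: str) -> list[dict]:
    return [
        {"role": "system",
         "content": "Translate the user's text into natural Japanese. "
                    "Output only the translation."},
        {"role": "user", "content": english_prompt},
    ]

# With the openai package installed, the request would be sent roughly as:
#   from openai import OpenAI
#   client = OpenAI()
#   resp = client.chat.completions.create(
#       model="gpt-4o-mini",
#       messages=build_translation_request("What temperature does iron melt at?"),
#   )
#   japanese_prompt = resp.choices[0].message.content
```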

    We QLoRA DPO trained a Qwen/Qwen2.5-7B-Instruct model on this data to create Karasu-DPO-7B.
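DPO training pairs a preferred ("chosen") response with a rejected one for each prompt; here the corrected gpt-4o-mini response would presumably serve as chosen and the uncorrected one as rejected. A minimal sketch of that record structure (the field names follow the TRL convention; the actual dataset schema is an assumption):

```python
def make_dpo_record(prompt: str, corrected: str, uncorrected: str) -> dict:
    """One preference pair: the corrected response is the preferred one."""
    return {
        "prompt": prompt,
        "chosen": corrected,      # response after the correction pass
        "rejected": uncorrected,  # raw first-pass response
    }

record = make_dpo_record(
    "ใƒŠใ‚คใ‚ธใ‚งใƒชใ‚ขใฎ้ฆ–้ƒฝใฏใฉใ“ใงใ™ใ‹๏ผŸ",
    "ใƒŠใ‚คใ‚ธใ‚งใƒชใ‚ขใฎ้ฆ–้ƒฝใฏใ‚ขใƒ–ใ‚ธใƒฃใงใ™ใ€‚",
    "ใƒŠใ‚คใ‚ธใ‚งใƒชใ‚ขใฎ้ฆ–้ƒฝใฏใƒฉใ‚ดใ‚นใงใ™ใ€‚",
)
print(sorted(record))  # -> ['chosen', 'prompt', 'rejected']
```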


      Model Details

      • Model size: 7B
      • Context length: 1024
      • Language: Japanese

      Training Procedure

      • learning_rate: 5e-6
      • train_batch_size: 4
      • eval_batch_size: 2
      • gradient_accumulation_steps: 4
      • lr_scheduler_type: cosine
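With gradient accumulation, the effective optimizer-step batch size is train_batch_size × gradient_accumulation_steps (times the number of devices, which is not stated here). A quick check of the settings above, assuming a single device:

```python
# Hyperparameters as listed in this card.
config = {
    "learning_rate": 5e-6,
    "train_batch_size": 4,
    "eval_batch_size": 2,
    "gradient_accumulation_steps": 4,
    "lr_scheduler_type": "cosine",
}

# Effective (optimizer-step) batch size on one device:
effective_batch = config["train_batch_size"] * config["gradient_accumulation_steps"]
print(effective_batch)  # -> 16
```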

      Training Results

      Step   Training Loss   Validation Loss
      10     0.678400        0.665870
      20     0.608500        0.638361
      30     0.577300        0.607468
      40     0.526700        0.559432
      50     0.489200        0.523419
      60     0.502800        0.511645
      70     0.462300        0.506989
      80     0.419600        0.509142
      90     0.445200        0.510396
      100    0.424400        0.511653

      License

      We share this model under an Apache 2.0 license.

      Developed by


      This model was trained by Jun Sashihara (junsashihara) and supervised by Peter Devine (ptrdvn) for Lightblue.
