This is a GGUF quantized version of shisa-ai/shisa-v2.1-llama3.2-3b.

Features

In short, this is a strong quantized model built from many small improvements.

Features of this GGUF

Patches for Llama 3.2 configuration issues previously found by the community have been applied to reduce the rate of malfunctions.
Quantization follows Unsloth's Dynamic 2.0 GGUF method, keeping a high compression ratio while limiting performance degradation.
The imatrix was created from data with a larger share of Japanese text, to limit degradation of Japanese performance.

How to Run

Download the llama.cpp package built for your hardware and set it up.
Other tools that support GGUF files, such as Ollama and LM Studio, can run the model in the same way.

Example command on Linux:

./llama-cli -hf dahara1/shisa-v2.1-llama3.2-3b-UD-japanese-imatrix:Llama-3.2-3B-Instruct-UD-Q4_K_XL.gguf --ctx-size 8192 --temp 0.6 --top-p 0.9

The recommended model is Llama-3.2-3B-Instruct-UD-Q4_K_XL.gguf, but choose a file size that fits the amount of memory in your computer.
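As a rough guide when picking a file, the size of the quantized weights can be approximated as parameter count times average bits per weight. The sketch below is only that arithmetic. The function name is hypothetical, and any bits-per-weight value you pass in is your own estimate, not a measured figure for the files in this repository. Actual memory use is higher once the context (KV cache) is added.

```python
def estimate_gguf_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (10^9 bytes).

    n_params: number of model parameters (about 3e9 for a 3B model).
    bits_per_weight: assumed average bits per weight for the chosen quantization.
    """
    if n_params <= 0 or bits_per_weight <= 0:
        raise ValueError("n_params and bits_per_weight must be positive")
    total_bits = n_params * bits_per_weight
    return total_bits / 8 / 1e9  # bits -> bytes -> GB
```

For example, `estimate_gguf_size_gb(3e9, 4.0)` gives 1.5, so a 3B model at an assumed 4 bits per weight needs about 1.5 GB for the weights alone.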

Sample script

To access the model from a script in a client/server setup, refer to the following.

Example llama-server command

./llama-server -hf dahara1/shisa-v2.1-llama3.2-3b-UD-japanese-imatrix:Llama-3.2-3B-Instruct-UD-Q4_K_XL.gguf --host 0.0.0.0 --port 8080 --ctx-size 8192 --temp 0.6 --top-p 0.9

In your browser, open the address and port of the server running the model, for example http://127.0.0.1:8080/
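Loading the model can take a moment, so a script may want to wait until the server is ready before sending requests. The following is a minimal sketch that polls llama-server's `/health` endpoint using only the standard library. The function name, the timeout values, and the injectable `opener` parameter (added so the function can be exercised without a running server) are illustrative assumptions.

```python
import time
import urllib.error
import urllib.request


def wait_until_ready(base_url="http://127.0.0.1:8080", timeout_s=60.0,
                     poll_s=1.0, opener=urllib.request.urlopen):
    """Poll GET {base_url}/health until it answers HTTP 200 or timeout_s elapses.

    Returns True if the server reported ready, False on timeout.
    """
    url = base_url.rstrip("/") + "/health"
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with opener(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, ConnectionError):
            # Server not reachable yet, or it answered with an error status
            # (urllib raises HTTPError, a URLError subclass, for those).
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

Calling `wait_until_ready()` before creating the client keeps the first request from failing while the model is still loading.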

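Client script

from openai import OpenAI

# Point the OpenAI client at the local llama-server.
client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="dummy",  # placeholder; the client requires a non-empty value
)

response = client.chat.completions.create(
    model="shisa-v2.1-llama3.2-3b-UD-japanese-imatrix",
    messages=[
        # System prompt: "You are a kind assistant. Please role-play as an elf princess in a fantasy setting."
        {"role": "system", "content": "あなたは親切なアシスタントです。ファンタジー設定でエルフの王女としてロールプレイをしてください"},
        # User message: "Hello!"
        {"role": "user", "content": "こんにちは！"},
    ],
    stream=True,  # receive the reply incrementally
)
# Print each streamed text fragment as it arrives.
for chunk in response:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="", flush=True)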

Example output

*優雅に微笑みながら、銀色の髪を軽く振りながら*
「こんにちは。私はセリエナ、エルフの王女です。森の囁きに耳を傾け、星の光を愛する者ですよ。今日はどんなお話をお聞きになりたいですか？ それとも、森の奥地で迷子になった冒険者様にお手伝いしましょうか？」

*杖（実は実体はなく、光る木の杖）を優しく握りながら、周囲の風に耳を傾けています。*

「無理に話す必要はありません。ただ、心の奥底に眠る物語を紡いでみませんか？ それとも、エルフの伝説や、古代の魔法について知りたいですか？」

（どんな質問でも大丈夫です！ 例えば…「森の危機について教えて」や「魔法の杖について知りたい」など、好きなようにお答えします）
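The same OpenAI-compatible endpoint can also be called without the openai package. The sketch below builds a non-streaming POST request to `/v1/chat/completions` with the standard library and pulls the reply text out of the JSON response. The helper names are hypothetical, and the `model` value simply mirrors the client script.

```python
import json
import urllib.request


def build_chat_request(user_message, system_prompt=None,
                       base_url="http://localhost:8080/v1",
                       model="shisa-v2.1-llama3.2-3b-UD-japanese-imatrix"):
    """Build a POST request for the OpenAI-compatible chat completions endpoint."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_message})
    payload = {"model": model, "messages": messages, "stream": False}
    return urllib.request.Request(
        base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_body: bytes) -> str:
    """Return the assistant text from a non-streaming chat completion response."""
    return json.loads(response_body)["choices"][0]["message"]["content"]


def ask(user_message, **kwargs):
    """Send one message to the running server and return the reply text."""
    with urllib.request.urlopen(build_chat_request(user_message, **kwargs)) as resp:
        return extract_reply(resp.read())
```

With the server running, `print(ask("こんにちは！"))` sends a single message and prints the whole reply at once, without streaming.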

Benchmark results

This compares the model in this repository with the quantized model created by mradermacher, who is well known for quantization work.

1. M-IFEval (JA) (Instruction Following)

Metric	UD-Q4_K_XL (dahara1)	Q4_K_M (mradermacher)
Prompt Level (Strict)	34.88%	33.72%
Instruction Level (Strict)	39.38%	39.38%
Prompt Level (Loose)	39.53%	35.47%
Instruction Level (Loose)	43.36%	41.59%

2. LiveBench (General Capabilities)

Category	UD-Q4_K_XL	Q4_K_M
Global Average	21.1	20.0
Reasoning	14.8	19.6
Language	12.3	3.7
Math	15.5	12.8
Data Analysis	19.9	21.1
Instruction Following	42.9	42.9
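To make the comparison easier to read, the per-category differences can be computed directly from the LiveBench rows above. The sketch also checks an observation from the numbers: the Global Average row matches the unweighted mean of the five category rows after rounding. This is inferred from the table, not a statement of how LiveBench defines its score, and the helper names are hypothetical.

```python
from statistics import mean

# LiveBench category scores copied from the table: (UD-Q4_K_XL, Q4_K_M)
LIVEBENCH = {
    "Reasoning": (14.8, 19.6),
    "Language": (12.3, 3.7),
    "Math": (15.5, 12.8),
    "Data Analysis": (19.9, 21.1),
    "Instruction Following": (42.9, 42.9),
}


def category_deltas(scores):
    """UD-Q4_K_XL minus Q4_K_M for each category, rounded to one decimal."""
    return {name: round(ud - q4, 1) for name, (ud, q4) in scores.items()}


def column_means(scores):
    """Unweighted mean of the category scores for each model, rounded to one decimal."""
    ud_mean = mean(pair[0] for pair in scores.values())
    q4_mean = mean(pair[1] for pair in scores.values())
    return round(ud_mean, 1), round(q4_mean, 1)
```

Here `category_deltas(LIVEBENCH)` shows a positive difference for Language (8.6) and Math (2.7) and a negative one for Reasoning (-4.8) and Data Analysis (-1.2), while `column_means(LIVEBENCH)` returns `(21.1, 20.0)`, matching the Global Average row.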

Summary

The UD-Q4_K_XL model scores higher overall, most clearly in prompt-level instruction following on M-IFEval (JA) (39.53% vs. 35.47% loose, 34.88% vs. 33.72% strict) and in Language tasks on LiveBench (12.3 vs. 3.7).
The Q4_K_M model scores higher in pure Reasoning tasks (19.6 vs. 14.8) and slightly higher in Data Analysis (21.1 vs. 19.9), but shows significant degradation in Language scores.

Acknowledgments

Meta
Shisa-ai
Unsloth
mradermacher
llama.cpp
Thank you to all AI researchers and practitioners


Model tree for dahara1/shisa-v2.1-llama3.2-3b-UD-japanese-imatrix

Base model: meta-llama/Llama-3.2-3B-Instruct
Finetuned: shisa-ai/shisa-v2.1-llama3.2-3b
Quantized: this model