TAO71-AI Quants: Qwen3
| Quant | Size | Description |
|---|---|---|
| Q2_K_XXS | 595.71 MB | Not recommended for most people. Extremely low quality. |
| Q2_K_XS | 595.71 MB | Not recommended for most people. Very low quality. |
| Q2_K | 595.71 MB | Not recommended for most people. Very low quality. |
| Q2_K_L | 813.63 MB | Not recommended for most people. Uses Q8_0 for output and embedding, and Q2_K for everything else. Very low quality. |
| Q2_K_XL | 1.07 GB | Not recommended for most people. Uses F16 for output and embedding, and Q2_K for everything else. Very low quality. |
| Q3_K_XXS | 711.16 MB | Not recommended for most people. Prefer any bigger Q3_K quantization. Very low quality. |
| Q3_K_XS | 711.16 MB | Not recommended for most people. Prefer any bigger Q3_K quantization. Very low quality. |
| Q3_K_S | 711.16 MB | Not recommended for most people. Prefer any bigger Q3_K quantization. Low quality. |
| Q3_K_M | 780.1 MB | Not recommended for most people. Low quality. |
| Q3_K_L | 841.1 MB | Not recommended for most people. Low quality. |
| Q3_K_XL | 1.0 GB | Not recommended for most people. Uses Q8_0 for output and embedding, and Q3_K_L for everything else. Low quality. |
| Q3_K_XXL | 1.28 GB | Not recommended for most people. Uses F16 for output and embedding, and Q3_K_L for everything else. Low quality. |
| Q4_K_XS | 934.57 MB | Lower quality than Q4_K_S. |
| Q4_K_S | 934.57 MB | Recommended. Slightly low quality. |
| Q4_K_M | 979.6 MB | Recommended. Decent quality for most use cases. |
| Q4_K_L | 1.1 GB | Recommended. Uses Q8_0 for output and embedding, and Q4_K_M for everything else. Decent quality. |
| Q4_K_XL | 1.37 GB | Recommended. Uses F16 for output and embedding, and Q4_K_M for everything else. Decent quality. |
| Q5_K_XXS | 1.11 GB | Lower quality than Q5_K_S. |
| Q5_K_XS | 1.11 GB | Lower quality than Q5_K_S. |
| Q5_K_S | 1.11 GB | Recommended. High quality. |
| Q5_K_M | 1.13 GB | Recommended. High quality. |
| Q5_K_L | 1.24 GB | Recommended. Uses Q8_0 for output and embedding, and Q5_K_M for everything else. High quality. |
| Q5_K_XL | 1.51 GB | Recommended. Uses F16 for output and embedding, and Q5_K_M for everything else. High quality. |
| Q6_K_S | 1.32 GB | Lower quality than Q6_K. |
| Q6_K | 1.32 GB | Recommended. Very high quality. |
| Q6_K_L | 1.39 GB | Recommended. Uses Q8_0 for output and embedding, and Q6_K for everything else. Very high quality. |
| Q6_K_XL | 1.66 GB | Recommended. Uses F16 for output and embedding, and Q6_K for everything else. Very high quality. |
| Q8_K_XS | 1.71 GB | Lower quality than Q8_0. |
| Q8_K_S | 1.71 GB | Lower quality than Q8_0. |
| Q8_0 | 1.71 GB | Recommended. Quality almost like F16. |
| Q8_K_XL | 1.98 GB | Recommended. Uses F16 for output and embedding, and Q8_0 for everything else. Quality almost like F16. |
| F16 | 3.21 GB | Not recommended. Overkill. Prefer Q8_0. |
| ORIGINAL (BF16) | 3.21 GB | Not recommended. Overkill. Prefer Q8_0. |
Quantized using TAO71-AI AutoQuantizer. You can check out the original model card here.
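
As a rough aid to reading the table, the sketch below picks a "Recommended" quant for a given memory budget. It is an illustration only, not part of the AutoQuantizer: the function name `pick_quant`, the `headroom` parameter and its 20% default are hypothetical, sizes are copied from the table with 1 GB taken as 1024 MB, and the table's row order is assumed to reflect increasing quality. File size is also only a proxy for runtime memory, since context and runtime buffers need extra space.

```python
# Recommended quants, in the same order as the table (assumed to be
# lowest to highest quality). Sizes are in MB; GB values from the table
# are converted assuming 1 GB = 1024 MB.
RECOMMENDED_QUANTS = [
    ("Q4_K_S", 934.57),
    ("Q4_K_M", 979.6),
    ("Q4_K_L", 1.1 * 1024),
    ("Q4_K_XL", 1.37 * 1024),
    ("Q5_K_S", 1.11 * 1024),
    ("Q5_K_M", 1.13 * 1024),
    ("Q5_K_L", 1.24 * 1024),
    ("Q5_K_XL", 1.51 * 1024),
    ("Q6_K", 1.32 * 1024),
    ("Q6_K_L", 1.39 * 1024),
    ("Q6_K_XL", 1.66 * 1024),
    ("Q8_0", 1.71 * 1024),
    ("Q8_K_XL", 1.98 * 1024),
]


def pick_quant(budget_mb, headroom=0.2):
    """Return the last recommended quant (in table order) whose file fits.

    budget_mb: memory available for the model, in MB.
    headroom:  hypothetical fraction of the budget kept free for context
               and runtime buffers (0.2 is an arbitrary default).
    Returns the quant name, or None if no recommended quant fits.
    """
    usable_mb = budget_mb * (1 - headroom)
    choice = None
    for name, size_mb in RECOMMENDED_QUANTS:
        if size_mb <= usable_mb:
            choice = name  # later rows override earlier ones
    return choice


if __name__ == "__main__":
    for budget in (1024, 1536, 2048, 4096):
        print(budget, "MB ->", pick_quant(budget))
```

Because the selection follows table order rather than file size, a larger but lower-tier file such as Q4_K_XL is not preferred over Q5_K_M when both fit. Adjust `headroom` to match the context length you plan to use.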