Update README.md
README.md CHANGED

@@ -139,7 +139,6 @@ pip install torch>=2.1.0 transformers>=4.40.0 accelerate compressed-tensors
 | **Base Model** | [microsoft/NextCoder-14B](https://huggingface.co/microsoft/NextCoder-14B) |
 | **Quantization Method** | FP8 E4M3 weight-only |
 | **Framework** | llm-compressor + compressed_tensors |
-| **Calibration Samples** | 2048 (8x industry standard) |
 | **Storage Size** | ~14GB (sharded safetensors) |
 | **VRAM (vLLM)** | ~14GB |
 | **VRAM (Transformers)** | ~28GB+ (decompressed to BF16) |
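For context on the quantization rows above: a minimal sketch of how a checkpoint like this is typically produced with llm-compressor. The actual recipe is not published in this diff; `FP8_DYNAMIC` below is the library's stock FP8 preset and stands in for the weight-only FP8 E4M3 recipe the table describes, and the output path is a placeholder.

```python
# Hypothetical reconstruction -- the actual recipe for this checkpoint is not
# shown in this README. FP8_DYNAMIC is llm-compressor's stock FP8 preset; the
# table above describes a weight-only FP8 E4M3 variant.
from transformers import AutoModelForCausalLM
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/NextCoder-14B", torch_dtype="auto", device_map="auto"
)

recipe = QuantizationModifier(
    targets="Linear",      # quantize every Linear layer...
    scheme="FP8_DYNAMIC",  # ...to FP8 (E4M3 weights)
    ignore=["lm_head"],    # keep the output head unquantized
)

# One-shot quantization; writes compressed_tensors-format shards to output_dir.
oneshot(model=model, recipe=recipe, output_dir="NextCoder-14B-FP8")
```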
@@ -188,12 +187,7 @@ The 14B model offers significant improvements over 7B:
 
 **With vLLM**, the 14B model fits comfortably on a single RTX 4090 (24GB) or RTX 5000 Ada (32GB).
 
-## 🔬 Quality Assurance
-- **High-quality calibration:** 2048 diverse code samples (8x industry standard of 256)
-- **Validation:** Tested on code generation benchmarks
-- **Format:** Standard compressed_tensors for broad compatibility
-- **Optimization:** Fine-tuned calibration for code-specific patterns
 
 
 ## 📚 Original Model
 
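A minimal sketch of the vLLM path the README recommends. The repo id below is a placeholder, since this diff does not name where the quantized checkpoint is hosted.

```python
# Minimal vLLM usage sketch. "your-org/NextCoder-14B-FP8" is a placeholder
# repo id; substitute the actual path of this quantized checkpoint.
from vllm import LLM, SamplingParams

llm = LLM(model="your-org/NextCoder-14B-FP8")  # weights stay FP8: ~14GB VRAM
params = SamplingParams(temperature=0.2, max_tokens=256)

outputs = llm.generate(["Write a Python function that merges two sorted lists."], params)
print(outputs[0].outputs[0].text)
```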
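And a sketch of the Transformers fallback, which decompresses the weights to BF16 in memory and so needs the ~28GB+ listed in the table; again the repo id is a placeholder.

```python
# Transformers fallback sketch. compressed-tensors checkpoints are decompressed
# to BF16 on load, so expect ~28GB+ of memory for the 14B model.
# "your-org/NextCoder-14B-FP8" is a placeholder repo id.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/NextCoder-14B-FP8"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

inputs = tokenizer("def quicksort(arr):", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```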