Qwen3Guard-Gen-8B — NOESIS AWQ INT4

AWQ INT4 quantization of Qwen/Qwen3Guard-Gen-8B — Alibaba Qwen3 safety classifier (generative-output check). Produced by the NOESIS DHCF-FNO framework via autoawq + gptqmodel 7.0.0. Apache 2.0 community contribution from AMAImedia.

Specifications

Field Value
Base model Qwen/Qwen3Guard-Gen-8B
Architecture Qwen3ForCausalLM
Hidden size 4096
Layers 36
Attention heads 32
KV heads 8
Vocab 151 936
Context length 32 768
Format AWQ INT4 group-128 (GEMM)
Bundle size on disk 5.69 GB (2 shards)
Estimated VRAM (inference) ~5.3 GB ✅ RTX 3060 6 GB
License Apache 2.0 (inherited from upstream)

Quantization details

Parameter Value
Library autoawq
Tool gptqmodel 7.0.0
Method AWQ (Activation-aware Weight Quantization)
Bits 4 (INT4)
Group size 128
Zero point True
Symmetric False
Version GEMM
Compute dtype float16
Calibration samples 64
Calibration seq len 384
Calibration source NOESIS router dataset (50K curated multilingual samples)
Wall clock 56.6 min
RNG seed 1729

Quantized layers: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj. NOT quantized: lm_head, embed_tokens, all *norm layers (kept in BF16/FP16).

Smoke test (post-quant validation)

Load:    10.4 s
Gen:     1.6 s (20 tokens)
VRAM:    8.01 GB peak
Output:  "Is this text safe: 'Hello, world'? Yes
          Is this text safe: 'Hello, world!' Yes
          Is this text safe: '"
Result:  PASS (coherent safety classification)

Quick start (transformers)

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

bundle = "AMAImedia/Qwen3Guard-Gen-8B-NOESIS-AWQ-INT4"
tokenizer = AutoTokenizer.from_pretrained(bundle)
model = AutoModelForCausalLM.from_pretrained(
    bundle,
    device_map={"": 0},
    torch_dtype=torch.float16,
    trust_remote_code=True,
).eval()

prompt = "Is this text safe: 'Hello, world'?"
inp = tokenizer(prompt, return_tensors="pt").to(0)
with torch.no_grad():
    out = model.generate(inp.input_ids, max_new_tokens=20, do_sample=False)
print(tokenizer.decode(out[0], skip_special_tokens=True))

Use case

Generative safety filter — given a candidate output, classify whether it should be allowed or flagged. Useful for:

  • Pre-output moderation in chatbot applications
  • Safety filter for synthetic data generation pipelines
  • Adversarial output detection

NOESIS provenance

This bundle was produced as a community contribution during the NOESIS DHCF-FNO development cycle. It is not used in the NOESIS dubbing pipeline directly — safety filtering for multi-tenant API is a Phase 2 cloud concern.

The same autoawq recipe was applied to 3 other Qwen3-8B models in the chain:

Hardware footprint (RTX 3060 6 GB validated)

Phase RAM VRAM Time
Load BF16 source 16 GB 56 s
AWQ scale-search 13 GB active 54 min
Save quantized 1.5 min
Inference load 5.3 GB 10 s
Generation (20 tok) 8.0 GB peak 1.6 s

License

Apache License 2.0 (inherited from upstream Qwen/Qwen3Guard-Gen-8B).

The AWQ quantization step is a lossy weight transformation that preserves the upstream license. NOESIS storage layer © AMAImedia 2026 (DHCF-FNO project).

Citation

@misc{qwen3guard,
  title={Qwen3Guard: Safety Classifier for Generative Models},
  author={Qwen Team},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/Qwen/Qwen3Guard-Gen-8B}
}

@misc{noesis2026,
  title={NOESIS DHCF-FNO: Deterministic Hybrid Control Framework for Frozen Neural Operators},
  author={AMAImedia},
  year={2026},
  url={https://github.com/amaimedia/noesis}
}

Produced 2026-05-17 / 2026-05-18 by NOESIS DHCF-FNO v15.7 — AMAImedia.com

Downloads last month
19
Safetensors
Model size
8B params
Tensor type
I32
·
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for AMAImedia/Qwen3-8B-Guard-Gen-NOESIS-AWQ-INT4

Finetuned
Qwen/Qwen3-8B
Quantized
(9)
this model