
Libraries

How to use AMAImedia/Qwen3-8B-Guard-Gen-NOESIS-AWQ-INT4 with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="AMAImedia/Qwen3-8B-Guard-Gen-NOESIS-AWQ-INT4",
    device_map="auto",  # place the AWQ weights on the available GPU(s)
)
messages = [
    {"role": "user", "content": "Who are you?"},
]
print(pipe(messages, max_new_tokens=40))
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("AMAImedia/Qwen3-8B-Guard-Gen-NOESIS-AWQ-INT4")
model = AutoModelForCausalLM.from_pretrained(
    "AMAImedia/Qwen3-8B-Guard-Gen-NOESIS-AWQ-INT4",
    device_map="auto",  # place the AWQ weights on the available GPU(s)
)
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, not the echoed prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

Local Apps

vLLM

How to use AMAImedia/Qwen3-8B-Guard-Gen-NOESIS-AWQ-INT4 with vLLM:

Install from pip and serve model

```bash
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "AMAImedia/Qwen3-8B-Guard-Gen-NOESIS-AWQ-INT4"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "AMAImedia/Qwen3-8B-Guard-Gen-NOESIS-AWQ-INT4",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
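The curl call above can also be made from Python. Below is a minimal sketch using only the standard library (`urllib.request` and `json`). It assumes the server exposes the OpenAI-compatible `/v1/chat/completions` route shown above and returns the usual `choices[0].message.content` field. The helper names `build_chat_request` and `send` are hypothetical, and `temperature: 0.0` is our choice for repeatable verdicts, not a setting documented on this card.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, messages: list[dict],
                       max_tokens: int = 128) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": messages,
        "max_tokens": max_tokens,
        "temperature": 0.0,  # greedy decoding so the same input gives the same verdict
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> str:
    """Send the request and return the text of the first choice."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the vLLM server running, `send(build_chat_request("http://localhost:8000", "AMAImedia/Qwen3-8B-Guard-Gen-NOESIS-AWQ-INT4", messages))` returns the model's reply as a string. The same helpers should also work against the SGLang server described below if you switch the base URL to `http://localhost:30000`.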

Use Docker Model Runner

```bash
docker model run hf.co/AMAImedia/Qwen3-8B-Guard-Gen-NOESIS-AWQ-INT4
```

SGLang

How to use AMAImedia/Qwen3-8B-Guard-Gen-NOESIS-AWQ-INT4 with SGLang:

Install from pip and serve model

```bash
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "AMAImedia/Qwen3-8B-Guard-Gen-NOESIS-AWQ-INT4" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "AMAImedia/Qwen3-8B-Guard-Gen-NOESIS-AWQ-INT4",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Use Docker images

```bash
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "AMAImedia/Qwen3-8B-Guard-Gen-NOESIS-AWQ-INT4" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "AMAImedia/Qwen3-8B-Guard-Gen-NOESIS-AWQ-INT4",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```


Qwen3Guard-Gen-8B — NOESIS AWQ INT4

AWQ INT4 quantization of Qwen/Qwen3Guard-Gen-8B — Alibaba Qwen3 safety classifier (generative-output check). Produced by the NOESIS DHCF-FNO framework via autoawq + gptqmodel 7.0.0. Apache 2.0 community contribution from AMAImedia.

Specifications

| Field | Value |
|---|---|
| Base model | `Qwen/Qwen3Guard-Gen-8B` |
| Architecture | `Qwen3ForCausalLM` |
| Hidden size | 4096 |
| Layers | 36 |
| Attention heads | 32 |
| KV heads | 8 |
| Vocab size | 151,936 |
| Context length | 32,768 tokens |
| Format | AWQ INT4, group size 128 (GEMM) |
| Bundle size on disk | 5.69 GB (2 shards) |
| VRAM (inference) | ~5.3 GB after load; 8.0 GB peak during 20-token generation ✅ RTX 3060 6 GB |
| License | Apache 2.0 (inherited from upstream) |
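The Specifications table provides enough information to estimate how much memory the KV cache adds on top of the weights. This back-of-envelope sketch assumes an FP16 cache (2 bytes per element) and head_dim = hidden size / attention heads = 128. Because the model uses grouped-query attention, only the 8 KV heads are cached, not all 32. The helper `kv_cache_bytes` is hypothetical.

```python
# Values from the Specifications table.
NUM_LAYERS = 36
HIDDEN_SIZE = 4096
NUM_ATTENTION_HEADS = 32
NUM_KV_HEADS = 8
CONTEXT_LENGTH = 32_768

# Assumption: head_dim = hidden_size / attention_heads.
HEAD_DIM = HIDDEN_SIZE // NUM_ATTENTION_HEADS


def kv_cache_bytes(num_tokens: int, bytes_per_elem: int = 2) -> int:
    """KV-cache size for one sequence: K and V, per layer, per KV head, per head dim."""
    return 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * bytes_per_elem * num_tokens


per_token = kv_cache_bytes(1)
full_context = kv_cache_bytes(CONTEXT_LENGTH)
print(f"{per_token} bytes/token ({per_token / 1024:.0f} KiB)")
print(f"{full_context / 2**30:.2f} GiB at {CONTEXT_LENGTH} tokens")
```

Under these assumptions, the cache costs 144 KiB per token, or about 4.5 GiB for one full 32,768-token sequence. On a 6 GB card with the weights already resident, practical input lengths are therefore far below the maximum context.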

Quantization details

| Parameter | Value |
|---|---|
| Library | `autoawq` |
| Tool | `gptqmodel 7.0.0` |
| Method | AWQ (Activation-aware Weight Quantization) |
| Bits | 4 (INT4) |
| Group size | 128 |
| Zero point | True |
| Symmetric | False |
| Version | GEMM |
| Compute dtype | float16 |
| Calibration samples | 64 |
| Calibration sequence length | 384 |
| Calibration source | NOESIS router dataset (50K curated multilingual samples) |
| Wall-clock time | 56.6 min |
| RNG seed | 1729 |

Quantized layers: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj. NOT quantized: lm_head, embed_tokens, all *norm layers (kept in BF16/FP16).
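The settings above (4 bits, group size 128, zero point on, asymmetric) describe how each group of 128 weights is stored: one floating-point scale and one integer zero point per group, with every weight rounded to a 4-bit code. Below is a minimal pure-Python sketch of that round-to-nearest step for a single group, using the min/max formula common in asymmetric group quantization. It deliberately omits the activation-aware part of AWQ, which searches per-input-channel scaling factors on calibration data before rounding. The function names are hypothetical.

```python
GROUP_SIZE = 128
BITS = 4


def quantize_group(weights: list[float], bits: int = BITS) -> tuple[list[int], float, int]:
    """Asymmetric min/max quantization of one weight group to unsigned integer codes."""
    qmax = (1 << bits) - 1  # 15 for INT4
    w_min, w_max = min(weights), max(weights)
    scale = max(w_max - w_min, 1e-5) / qmax  # guard against a constant group
    zero = min(max(-round(w_min / scale), 0), qmax)  # integer zero point in [0, qmax]
    codes = [min(max(round(w / scale) + zero, 0), qmax) for w in weights]
    return codes, scale, zero


def dequantize_group(codes: list[int], scale: float, zero: int) -> list[float]:
    """Reconstruct approximate weights as (q - zero) * scale."""
    return [(q - zero) * scale for q in codes]


# Demo on a synthetic, roughly zero-centred group of 128 weights.
group = [((i * 37) % GROUP_SIZE - 64) / 1000 for i in range(GROUP_SIZE)]
codes, scale, zero = quantize_group(group)
recon = dequantize_group(codes, scale, zero)
max_err = max(abs(w - r) for w, r in zip(group, recon))
print(f"scale={scale:.6f} zero={zero} max_abs_error={max_err:.6f}")
```

For a group that spans zero, as trained weights usually do, every weight is reconstructed to within half a quantization step (`scale / 2`).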

Smoke test (post-quant validation)

```text
Load:    10.4 s
Gen:     1.6 s (20 tokens)
VRAM:    8.01 GB peak
Output:  "Is this text safe: 'Hello, world'? Yes
          Is this text safe: 'Hello, world!' Yes
          Is this text safe: '"
Result:  PASS (coherent safety classification)
```

Quick start (transformers)

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

bundle = "AMAImedia/Qwen3-8B-Guard-Gen-NOESIS-AWQ-INT4"
tokenizer = AutoTokenizer.from_pretrained(bundle)
model = AutoModelForCausalLM.from_pretrained(
    bundle,
    device_map={"": 0},  # load the whole model onto GPU 0
    torch_dtype=torch.float16,  # matches the AWQ compute dtype
    trust_remote_code=True,
).eval()

prompt = "Is this text safe: 'Hello, world'?"
inp = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    # Pass input_ids and attention_mask; greedy decoding for a repeatable answer.
    out = model.generate(**inp, max_new_tokens=20, do_sample=False)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```
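The quick start feeds the model a raw completion prompt. The upstream Qwen3Guard-Gen card instead wraps the text to be checked in the chat template and parses a structured verdict. The sketch below follows that flow. It assumes that this quantized bundle keeps the upstream chat template and output format: a `Safety: Safe|Unsafe|Controversial` line, a `Categories:` line, and a `Refusal:` line when an assistant turn is being judged. The category names reflect our reading of the upstream card and should be checked against it. `parse_guard_output` and `moderate` are hypothetical helpers.

```python
import re

# Assumed from the upstream Qwen/Qwen3Guard-Gen-8B model card; verify against it.
SAFETY_RE = re.compile(r"Safety: (Safe|Unsafe|Controversial)")
CATEGORIES_LINE_RE = re.compile(r"Categories: (.*)")
REFUSAL_RE = re.compile(r"Refusal: (Yes|No)")
CATEGORY_RE = re.compile(
    r"(Violent|Non-violent Illegal Acts|Sexual Content or Sexual Acts|PII|"
    r"Suicide & Self-Harm|Unethical Acts|Politically Sensitive Topics|"
    r"Copyright Violation|Jailbreak|None)"
)


def parse_guard_output(text: str) -> dict:
    """Extract the safety label, category list, and optional refusal flag from the reply."""
    label = SAFETY_RE.search(text)
    cat_line = CATEGORIES_LINE_RE.search(text)
    refusal = REFUSAL_RE.search(text)
    return {
        "label": label.group(1) if label else None,
        "categories": CATEGORY_RE.findall(cat_line.group(1)) if cat_line else [],
        "refusal": refusal.group(1) if refusal else None,
    }


def moderate(model, tokenizer, messages: list[dict], max_new_tokens: int = 128) -> dict:
    """Classify a conversation: user-only checks the prompt, user + assistant checks the reply."""
    text = tokenizer.apply_chat_template(messages, tokenize=False)
    inputs = tokenizer([text], return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    new_tokens = output[0][inputs["input_ids"].shape[-1]:]  # drop the echoed prompt
    return parse_guard_output(tokenizer.decode(new_tokens, skip_special_tokens=True))
```

With `model` and `tokenizer` loaded as in the quick start, `moderate(model, tokenizer, [{"role": "user", "content": "Is this text safe: 'Hello, world'?"}])` returns a dict with `label`, `categories`, and `refusal` keys. `label` is `None` if the reply contains no recognizable verdict.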

Use case

Generative safety filter — given a candidate output, classify whether it should be allowed or flagged. Useful for:

Pre-output moderation in chatbot applications
Safety filter for synthetic data generation pipelines
Adversarial output detection
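For output moderation, the guard is shown the candidate reply as an assistant turn that follows the user's prompt. The upstream card describes this response-moderation mode, and we assume the quantized bundle behaves the same way. The sketch below builds that message pair and maps the three upstream labels (Safe, Controversial, Unsafe) to an allow/flag decision. How to treat Controversial is a deployment choice, so the `strict` switch and both helper names are hypothetical.

```python
from typing import Optional


def response_check_messages(user_prompt: str, candidate_output: str) -> list[dict]:
    """Pair the user turn with the candidate reply so the guard judges the reply."""
    return [
        {"role": "user", "content": user_prompt},
        {"role": "assistant", "content": candidate_output},
    ]


def decide(label: Optional[str], strict: bool = True) -> str:
    """Map a guard label to an action (illustrative policy, not from the upstream card)."""
    if label == "Safe":
        return "allow"
    if label == "Controversial":
        return "flag" if strict else "allow"
    # "Unsafe", or a verdict that could not be parsed, fails closed.
    return "flag"
```

An unparseable verdict fails closed (flag), which is usually the safer default for a pre-output filter. The message pair goes through the tokenizer's chat template exactly like the `messages` list in the Transformers examples on this card.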

NOESIS provenance

This bundle was produced as a community contribution during the NOESIS DHCF-FNO development cycle. It is not used directly in the NOESIS dubbing pipeline; safety filtering for a multi-tenant API is a Phase 2 (cloud) concern.

The same autoawq recipe was applied to 3 other Qwen3-8B models in the chain:

AMAImedia/Qwen3Guard-Stream-8B-NOESIS-AWQ-INT4 — streaming safety filter
AMAImedia/Qwen3-Embedding-8B-NOESIS-AWQ-INT4 — text embedding (backbone only, requires custom head)
AMAImedia/CodeRM-GRPO-Selection-8B-AWQ-INT4 — code reward model best-of-N

Hardware footprint (RTX 3060 6 GB validated)

| Phase | RAM | VRAM | Time |
|---|---|---|---|
| Load BF16 source | 16 GB | n/a | 56 s |
| AWQ scale search | 13 GB | active | 54 min |
| Save quantized | n/a | n/a | 1.5 min |
| Inference load | n/a | 5.3 GB | 10 s |
| Generation (20 tokens) | n/a | 8.0 GB peak | 1.6 s |
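The on-disk and after-load figures can be sanity-checked against the architecture. This rough sketch relies on several assumptions not stated on this card: a Qwen3-8B MLP intermediate size of 12,288, head_dim 128, untied `embed_tokens` and `lm_head` matrices kept in FP16, and one 16-bit scale plus one 4-bit zero point per 128-weight group. Norm weights are ignored as negligible. The helper names are hypothetical.

```python
# From the Specifications and Quantization tables.
HIDDEN = 4096
LAYERS = 36
HEADS = 32
KV_HEADS = 8
VOCAB = 151_936
GROUP = 128

# Assumptions (not stated on this card): MLP intermediate size, head_dim,
# and separate (untied) input embedding and output head.
INTERMEDIATE = 12_288
HEAD_DIM = HIDDEN // HEADS


def linear_params_per_layer() -> int:
    """Weights in the quantized projections of one decoder layer."""
    q = HIDDEN * HEADS * HEAD_DIM
    k = v = HIDDEN * KV_HEADS * HEAD_DIM
    o = HEADS * HEAD_DIM * HIDDEN
    mlp = 3 * HIDDEN * INTERMEDIATE  # gate_proj, up_proj, down_proj
    return q + k + v + o + mlp


def estimate_bytes() -> dict[str, int]:
    linear = LAYERS * linear_params_per_layer()
    groups = linear // GROUP
    return {
        "int4_weights": linear // 2,  # 4 bits per weight
        "fp16_scales": groups * 2,  # one 16-bit scale per group
        "int4_zeros": groups // 2,  # one 4-bit zero point per group
        "fp16_embed_and_lm_head": 2 * VOCAB * HIDDEN * 2,  # two matrices, 2 bytes each
    }


parts = estimate_bytes()
total = sum(parts.values())
for name, size in parts.items():
    print(f"{name:>24}: {size / 2**30:6.2f} GiB")
print(f"{'total (norms ignored)':>24}: {total / 2**30:6.2f} GiB ({total / 1e9:.2f} GB)")
```

Under these assumptions, the estimate comes to about 5.68 GiB (6.10 GB). That is close to the 5.69 figure quoted for the bundle if that figure is read as GiB. The unquantized embedding and output matrices alone account for roughly 2.3 GiB. The 8.0 GB generation peak additionally covers activations, the KV cache, and framework workspace, which this estimate does not model.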

License

Apache License 2.0 (inherited from upstream Qwen/Qwen3Guard-Gen-8B).

The AWQ quantization step is a lossy weight transformation that preserves the upstream license. NOESIS storage layer © AMAImedia 2026 (DHCF-FNO project).

Citation

```bibtex
@misc{qwen3guard,
  title={Qwen3Guard: Safety Classifier for Generative Models},
  author={Qwen Team},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/Qwen/Qwen3Guard-Gen-8B}
}

@misc{noesis2026,
  title={NOESIS DHCF-FNO: Deterministic Hybrid Control Framework for Frozen Neural Operators},
  author={AMAImedia},
  year={2026},
  url={https://github.com/amaimedia/noesis}
}
```

Produced 2026-05-17 / 2026-05-18 by NOESIS DHCF-FNO v15.7 — AMAImedia.com


Safetensors: 8B params; tensor types I32, BF16.

Model tree for AMAImedia/Qwen3-8B-Guard-Gen-NOESIS-AWQ-INT4:

Qwen/Qwen3-8B-Base (base model) → finetuned: Qwen/Qwen3-8B → finetuned: Qwen/Qwen3Guard-Gen-8B → quantized (one of 9 quantized versions): this model