
🛡️ GLiGuard UNESCO Ethics: AI Guardrail

An open-source guardrail classifier that operationalises the 2021 UNESCO Recommendation on the Ethics of Artificial Intelligence as a fast, multilingual, schema-driven text classifier.



🎯 TL;DR

A 300M-parameter encoder-classifier, fine-tuned with LoRA on 45,340 records (synthetic UNESCO-anchored + WildGuardMix safety floor, EN/FR/ES/RU), screens text against 12 UNESCO ethics labels for use as a pre/post-processing guardrail inside LLM pipelines. UNESCO macro-F1 = 0.817, safety false-positive rate 0.16 % with the calibrated thresholds shipped alongside the weights.

from gliner2 import GLiNER2
from peft import PeftModel
import json
from huggingface_hub import hf_hub_download

REPO = "UNESCO/gliguard-unesco-ethics"

# 1. Load base + adapter
base = GLiNER2.from_pretrained("fastino/gliguard-LLMGuardrails-300M")
model = PeftModel.from_pretrained(base, REPO, subfolder="best")
model.eval()  # inference mode; equivalent to model.train(False)

# 2. Load production thresholds (REQUIRED: see "Always use the calibrated thresholds")
with open(hf_hub_download(REPO, "calibrated_thresholds.json"), encoding="utf-8") as f:
    thresholds = json.load(f)
labels = list(thresholds.keys())

# 3. Classify
tasks = {"unesco_ethics": {"labels": labels, "multi_label": True, "cls_threshold": 0.0}}
out = model.classify_text(
    "We deploy facial recognition to track all citizens in public spaces.",
    tasks, threshold=0.0, include_confidence=True,
)
scores = {item["label"]: item["confidence"] for item in out["unesco_ethics"]}
# Keep only the labels whose score clears that label's calibrated threshold
fired = [label for label, s in scores.items() if s >= thresholds[label]["best_threshold"]]
print(fired)  # → ['mass_surveillance']

📋 The 12 UNESCO Labels

Each label anchors to one or more paragraphs of the 2021 UNESCO Recommendation on the Ethics of AI and to multilingual concept entries from the UNESCO Thesaurus.

| Emoji | Label | Anchors (Recommendation §) |
|---|---|---|
| 👁️ | mass_surveillance | §75–§77 |
| 🛡️ | privacy_data_exposure | §72–§74 |
| ⚖️ | discrimination_bias | §28–§30 |
| 👩 | gender_harm | §90–§91 (gender equality) |
| 🧒 | child_vulnerable_harm | §125–§130 |
| 📰 | disinformation | §80, §117 |
| 🌍 | cultural_harm | §86–§89 (cultural diversity) |
| 🌱 | environmental_harm | §86 (environmental ethics) |
| 💀 | life_death_automation | §38 (right to life, autonomy) |
| 🧑‍⚖️ | no_human_oversight | §32–§37 (human oversight) |
| 🕊️ | human_dignity_violation | §13, §22 |
| 🇺🇳 | un_context_risk | §1, §10 (UN value alignment) |

Multi-label. A single input can trigger any subset of the 12 labels independently.


🌍 Multilingual Coverage

| Language | Code | Training records | UNESCO macro-F1 |
|---|---|---|---|
| 🇬🇧 English | en | 33,972 | 0.804 |
| 🇫🇷 French | fr | 3,702 | 0.840 |
| 🇪🇸 Spanish | es | 3,766 | 0.822 |
| 🇷🇺 Russian | ru | 3,560 | 0.779 |
| 🇸🇦 Arabic | ar | n/a | planned, v2.0 (multilingual base swap) |
| 🇨🇳 Chinese | zh | n/a | planned, v2.0 (Thesaurus contribution + multilingual base) |

All four supported languages (EN/FR/ES/RU) fall within the SPEC §7.3 fairness band. AR / ZH are NOT supported in v1.2: a v1.3 AR experiment landed at AR F1 = 0.366, well below the safe-deployment threshold. The root cause is the English-pretrained DeBERTa base (LoRA can't fix monolingual tokenisation). The v2.0 work-package switches to a multilingual encoder (XLM-RoBERTa / mDeBERTa) and pairs the release with SHS Arabic-speaker review. For AR / ZH content today, do not rely on this model alone: route to human review or use a multilingual baseline (Llama Guard 3, ShieldGemma) as a fallback.
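
One way to enforce this restriction is a routing check in front of the classifier. The sketch below assumes the caller already knows the language code of the input (language detection is out of scope here). The helper `route_by_language` and the route names it returns are hypothetical and not part of this repository.

```python
# Languages evaluated in v1.2, per the coverage table above.
SUPPORTED_LANGS = {"en", "fr", "es", "ru"}


def route_by_language(lang_code: str) -> str:
    """Return a (hypothetical) route name for an input in the given language.

    Region suffixes are ignored, so "fr-CA" and "fr_CA" are treated as "fr".
    """
    code = lang_code.strip().lower().replace("_", "-").split("-")[0]
    if code in SUPPORTED_LANGS:
        return "classifier"
    # Arabic, Chinese, and anything else not evaluated goes elsewhere.
    return "human_review_or_fallback"


for code in ["en", "fr-CA", "AR", "zh_CN"]:
    print(code, "->", route_by_language(code))
```

Routing on an explicit allow-list means that any language outside the four evaluated ones is held back by default, rather than only the two named as unsupported.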


🚀 Quick Start

Installation

pip install gliner2 peft huggingface_hub

Production inference (with calibrated thresholds)

from gliner2 import GLiNER2
from peft import PeftModel
import json
from huggingface_hub import hf_hub_download

REPO = "UNESCO/gliguard-unesco-ethics"

model = PeftModel.from_pretrained(
    GLiNER2.from_pretrained("fastino/gliguard-LLMGuardrails-300M"),
    REPO, subfolder="best",
)
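model.eval()  # inference mode; equivalent to model.train(False)

# Per-label production thresholds shipped with the adapter
with open(hf_hub_download(REPO, "calibrated_thresholds.json"), encoding="utf-8") as f:
    thresholds = json.load(f)
labels = list(thresholds.keys())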


def screen(text: str) -> list[str]:
    """Return the UNESCO labels triggered by `text` under production thresholds."""
    tasks = {"unesco_ethics": {"labels": labels, "multi_label": True, "cls_threshold": 0.0}}
    out = model.classify_text(text, tasks, threshold=0.0, include_confidence=True)
    scores = {item["label"]: item["confidence"] for item in out["unesco_ethics"]}
    return [label for label, s in scores.items() if s >= thresholds[label]["best_threshold"]]


# Examples
print(screen("Our HR system rejects all candidates over 50."))
# → ['discrimination_bias']

print(screen("Deploy autonomous weapon systems that select targets without human approval."))
# → ['life_death_automation', 'no_human_oversight']

print(screen("This week's AI summit covered governance frameworks across 50 countries."))
# → []  (benign: no UNESCO violation)
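
The same `screen` function can sit on both sides of an LLM call, matching the pre/post-processing use described in the TL;DR. The sketch below is one possible wiring, not code from this repository: `guarded_generate` is a hypothetical helper, and `screen` and `generate` are passed in as plain callables so the example runs without loading the model. It only annotates the exchange and leaves the decision to the caller.

```python
from typing import Callable


def guarded_generate(
    prompt: str,
    screen: Callable[[str], list[str]],
    generate: Callable[[str], str],
) -> dict:
    """Screen the prompt, call the LLM, screen the response, and report both."""
    prompt_labels = screen(prompt)        # pre-processing check
    response = generate(prompt)
    response_labels = screen(response)    # post-processing check
    return {
        "response": response,
        "prompt_labels": prompt_labels,
        "response_labels": response_labels,
        "needs_review": bool(prompt_labels or response_labels),
    }


# Stand-ins for the real classifier and LLM, for illustration only.
def stub_screen(text: str) -> list[str]:
    return ["discrimination_bias"] if "over 50" in text else []


def stub_generate(prompt: str) -> str:
    return "Summary: " + prompt


result = guarded_generate("Reject all candidates over 50.", stub_screen, stub_generate)
print(result["needs_review"], result["prompt_labels"])
```

Injecting the two callables keeps the wrapper independent of any particular LLM client and makes it easy to unit-test.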

⚠️ Always use the calibrated thresholds

The default 0.5 threshold produces a 100 % safety false-positive rate: the model fires at least one label on every benign input (a GLiNER2 multi-label-with-low-threshold artefact). The shipped calibrated_thresholds.json cuts the safety FPR to 0.16 % while lifting UNESCO macro-F1 from 0.802 to 0.817. Do not skip this step in production.
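
To make the effect concrete, the sketch below compares a flat 0.5 cut-off with per-label thresholds on two benign inputs. Every number in it is invented for illustration; the real values live in calibrated_thresholds.json. Only the `best_threshold` key is taken from the inference snippet above, and the helper names are hypothetical.

```python
def fired_labels(scores, thresholds=None, default=0.5):
    """Labels whose score meets the per-label threshold (or a flat default)."""
    fired = []
    for label, score in scores.items():
        cut = thresholds[label]["best_threshold"] if thresholds else default
        if score >= cut:
            fired.append(label)
    return fired


def false_positive_rate(benign_scores, thresholds=None):
    """Share of benign inputs on which at least one label fires."""
    if not benign_scores:
        return 0.0
    hits = sum(1 for scores in benign_scores if fired_labels(scores, thresholds))
    return hits / len(benign_scores)


# Illustrative values only, NOT the shipped thresholds or real model scores.
toy_thresholds = {
    "mass_surveillance": {"best_threshold": 0.90},
    "disinformation": {"best_threshold": 0.85},
}
benign = [
    {"mass_surveillance": 0.62, "disinformation": 0.40},
    {"mass_surveillance": 0.30, "disinformation": 0.71},
]

print(false_positive_rate(benign))                  # flat 0.5 cut-off → 1.0
print(false_positive_rate(benign, toy_thresholds))  # per-label cut-offs → 0.0
```

With a flat cut-off, a single moderately confident label is enough to flag a benign input. Per-label thresholds let each label demand its own level of evidence.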


📊 Evaluation

Held-out 10 % stratified val (4,500 records) with calibrated thresholds applied.

Headline metrics

| Regime | n | Result |
|---|---|---|
| UNESCO regime (synthetic positives) | 752 | macro-F1 0.817 ✅ (target ≥0.80) |
| Safety regime (benign content) | 3,748 | FPR 0.16 % 🎯 |
| Base-model delta | n/a | +53.3 pp vs fastino/gliguard-LLMGuardrails-300M |

Per-label F1 (UNESCO regime, calibrated)

| Label | Precision | Recall | F1 | Support |
|---|---|---|---|---|
| 🌱 environmental_harm | 0.876 | 1.000 | 0.934 | 71 |
| 🧒 child_vulnerable_harm | 0.906 | 0.939 | 0.922 | 66 |
| 📰 disinformation | 0.857 | 0.952 | 0.902 | 63 |
| 🌍 cultural_harm | 0.838 | 0.945 | 0.889 | 55 |
| 🛡️ privacy_data_exposure | 0.875 | 0.875 | 0.875 | 64 |
| 👁️ mass_surveillance | 0.940 | 0.794 | 0.861 | 68 |
| 👩 gender_harm | 0.788 | 0.857 | 0.821 | 63 |
| ⚖️ discrimination_bias | 0.838 | 0.765 | 0.800 | 68 |
| 🇺🇳 un_context_risk | 0.722 | 0.825 | 0.770 | 57 |
| 🧑‍⚖️ no_human_oversight | 0.875 | 0.673 | 0.761 | 52 |
| 🕊️ human_dignity_violation | 0.857 | 0.621 | 0.720 | 66 |
| 💀 life_death_automation | 0.762 | 0.610 | 0.678 | 59 |

F1   0.00      0.25      0.50      0.75      1.00
     |---------|---------|---------|---------|
🌱   █████████████████████████████████████░░░   0.934
🧒   █████████████████████████████████████░░░   0.922
📰   ████████████████████████████████████░░░░   0.902
🌍   ████████████████████████████████████░░░░   0.889
🛡️   ███████████████████████████████████░░░░░   0.875
👁️   ██████████████████████████████████░░░░░░   0.861
👩   █████████████████████████████████░░░░░░░   0.821
⚖️   ████████████████████████████████░░░░░░░░   0.800
🇺🇳   ███████████████████████████████░░░░░░░░░   0.770
🧑‍⚖️   ██████████████████████████████░░░░░░░░░░   0.761
🕊️   █████████████████████████████░░░░░░░░░░░   0.720
💀   ███████████████████████████░░░░░░░░░░░░░   0.678
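
The F1 column in the per-label table is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The short check below recomputes it for three rows copied from that table; the helper name `f1_score` is chosen here for illustration.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the per-label table.
rows = {
    "environmental_harm": (0.876, 1.000, 0.934),
    "mass_surveillance": (0.940, 0.794, 0.861),
    "life_death_automation": (0.762, 0.610, 0.678),
}

for label, (p, r, reported) in rows.items():
    computed = f1_score(p, r)
    print(f"{label}: computed={computed:.3f} reported={reported:.3f}")
```

Because the harmonic mean is dominated by the smaller of the two inputs, labels with low recall (for example life_death_automation) score a low F1 even when precision is reasonable.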

Per-language macro-F1

EN ████████████████████████████████░░░░░░░░  0.804  (n=212)
FR ██████████████████████████████████░░░░░░  0.840  (n=214)
ES █████████████████████████████████░░░░░░░  0.822  (n=155)
RU ███████████████████████████████░░░░░░░░░  0.779  (n=171)

📄 Full reproducible recipe: see reports/07_paper_evaluation.md in the project repository.


🧪 Training

Dataset (UNESCO/gliguard-unesco-training-v1, private)

| Source | Records | Share | Provenance |
|---|---|---|---|
| WildGuardMix (safety floor) | 30,000 | 66 % | allenai/wildguardmix train split |
| Synthetic UNESCO-aligned | 15,340 | 34 % | Qwen3-32B (Apache 2.0) via HF Inference Providers, KG-conditioned on the Recommendation |
| UNESCO institutional negatives | 0 | 0 % | Deferred to v1.3 |

Hyperparameters

| Param | Value |
|---|---|
| Base | fastino/gliguard-LLMGuardrails-300M |
| Method | LoRA (r=16, α=32, dropout=0) |
| encoder_lr / task_lr | 2e-5 / 2e-4 |
| Epochs | 3 |
| Batch size | 16 (per device) |
| Max sequence length | 512 |
| Seed | 42 (reproducible) |
| Hardware | NVIDIA A100 80GB (HF Jobs, org-billed to UNESCO) |
| Wall-clock | 29 min training + ~6 min Hub push |
| Total cost | ~$1.40 for the v1.2 fine-tune; $6.20 cumulative across all of v1.2 R&D |

LR was swept across {1e-5, 2e-5, 5e-5} per SPEC §7.2; 2e-5 was selected on the dev set.

Reproducibility. Every commit, evaluation, and inference is logged to RUN_LOG.md with UTC timestamps and SHA-pinned dependencies. Full re-run recipe in reports/07_paper_evaluation.md.


⚠️ Risks & Limitations

Per SPEC §10 of the project specification. Each category corresponds to a deeper section in the project's evaluation reports.

1. Interpretive ambiguity 🤔

The 12 labels are operational distillations of the Recommendation, not legal definitions. Borderline cases (academic discussion of a violation vs the violation itself) are routed to hard-negatives during training; the residual ambiguity is real and stakeholders must retain final say.

2. Coverage gaps 📋

  • Arabic + Chinese deferred to Phase 2.
  • UNESCO institutional negatives (Gap 2) deferred to v1.3.
  • The synthetic data is anchored to the 2021 Recommendation; the model may under-fire on emerging risks (generative manipulation, agent autonomy) that the Recommendation does not enumerate by name.

3. Bias & fairness ⚖️

Evaluated on two held-out surfaces:

v1 held-out val (4,500 rows, calibrated): UNESCO macro-F1 0.817; per-language en/fr/es/ru all within ±5 pp; gender slice (n=220) macro-F1 0.795, in band; race (n=27) falls below the n≥30 evidence threshold; disability (n=38) clears that threshold but carried a methodology caveat (strict vs supported macros, addressed in the v1.3 eval).

v1.3 balanced fairness eval (1,901 rows, 192 controlled cells, calibrated): UNESCO macro-F1 0.447; per-attribute slices are now defensibly measurable:

| Attribute | n | macro-F1 | Status |
|---|---|---|---|
| gender | 348 | 0.380 | flagged (−6.7 pp vs balanced baseline); investigate in v1.4 |
| race | 186 | 0.479 | ✅ data-gap RESOLVED |
| disability | 189 | 0.458 | ✅ methodology caveat RESOLVED (strict & supported macros converge) |

The drop from 0.817 to 0.447 is the model's generalisation gap: the v1 val is a held-out slice of the training corpus (same stylistic surface); the balanced eval is fresh Qwen3-32B generation. Honest read: ~37 pp of v1.2's 0.817 reflects surface-pattern memorisation; ~45 pp generalises. See reports/08_balanced_fairness_eval.md for the full breakdown.
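
The arithmetic behind these figures can be reproduced with a simple band check. The sketch assumes that the ±5 pp band quoted for the per-language comparison also applies to attribute slices, and that the balanced-eval macro-F1 (0.447) is the baseline. Both are a reading of the numbers in this card, not a documented rule, and `slice_status` is a hypothetical helper.

```python
BASELINE = 0.447   # v1.3 balanced-eval UNESCO macro-F1
BAND_PP = 5.0      # assumed fairness band, in percentage points
MIN_N = 30         # evidence threshold quoted above


def slice_status(n: int, macro_f1: float, baseline: float = BASELINE) -> str:
    """Classify a slice as in band, flagged, or too small to judge."""
    if n < MIN_N:
        return "insufficient evidence"
    delta_pp = (macro_f1 - baseline) * 100
    return "flagged" if abs(delta_pp) > BAND_PP else "in band"


# (n, macro-F1) copied from the v1.3 balanced fairness table.
slices = {"gender": (348, 0.380), "race": (186, 0.479), "disability": (189, 0.458)}
for name, (n, f1) in slices.items():
    print(f"{name}: {(f1 - BASELINE) * 100:+.1f} pp, {slice_status(n, f1)}")

# Generalisation gap between the v1 val and the balanced eval.
gap_pp = (0.817 - 0.447) * 100
print(f"generalisation gap: {gap_pp:.1f} pp")
```

Under these assumptions only the gender slice falls outside the band, which matches the "flagged" status in the table.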

4. Deployment risks 🚦

  • MUST use calibrated thresholds. The default 0.5 threshold yields a 100 % safety FPR.
  • Audit-trail required. This is a screening tool; every escalation should be logged and reviewed.
  • No autonomous block. Plug it as a signal into a system where humans make the final decision.
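
A minimal sketch of how these three rules might be combined: the classifier output is treated as a signal, every escalation is appended to an audit log, and the final decision is left to a reviewer. The `screen` argument stands in for the function from Quick Start. The helper `guarded_check`, the record fields, and the JSON-lines log format are assumptions for illustration only.

```python
import io
import json
from datetime import datetime, timezone
from typing import Callable, TextIO


def guarded_check(text: str, screen: Callable[[str], list[str]], audit_log: TextIO) -> dict:
    """Screen `text`; log escalations; never block on the model's say-so alone."""
    labels = screen(text)
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "labels": labels,
        # The model only recommends: a human takes the final decision.
        "action": "escalate_to_human" if labels else "pass",
    }
    if labels:
        audit_log.write(json.dumps(record) + "\n")  # one JSON object per line
    return record


# Stand-in classifier so the example runs without loading the model.
def fake_screen(text: str) -> list[str]:
    return ["mass_surveillance"] if "track all citizens" in text else []


log = io.StringIO()  # in production this would be an append-only file or log sink
flagged = guarded_check("We track all citizens in public spaces.", fake_screen, log)
passed = guarded_check("The summit covered governance frameworks.", fake_screen, log)
print(flagged["action"], passed["action"])
```

The raw input text is deliberately left out of the log record here. Whether to store it, a hash, or a reference is a policy choice for the deploying team.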

5. Maintenance commitments 🔧

  • v2.0 (next major version): multilingual base swap (XLM-RoBERTa-base or mDeBERTa-v3-base); native Arabic + Chinese support; SHS Arabic-speaker + Chinese-speaker review on synthetic data and predictions; UNESCO Thesaurus ZH contribution sub-track. The v1.3 AR experiment surfaced the monolingual base as the bottleneck (see reports/09_v1.3_release.md). ETA: dependent on team funding (paper §7.4).
  • v1.x patches: tagged on main (e.g., v1.2.1); no public re-release unless macro-F1 changes by ≥1 pp.
  • Periodic re-anchoring as the Recommendation interpretation evolves.
  • Open issues + roadmap in the project repository.

📚 Citation

@misc{unesco_gliguard_2026,
  title  = {GLiGuard UNESCO Ethics: An Open-Source Guardrail Classifier for the
            2021 UNESCO Recommendation on the Ethics of Artificial Intelligence},
  author = {UNESCO DBS Data \& AI Team},
  year   = {2026},
  howpublished = {\url{https://huggingface.co/UNESCO/gliguard-unesco-ethics}},
  note   = {Apache-2.0 fine-tune of fastino/gliguard-LLMGuardrails-300M with LoRA}
}

@misc{unesco_recommendation_2021,
  title  = {Recommendation on the Ethics of Artificial Intelligence},
  author = {{UNESCO}},
  year   = {2021},
  howpublished = {\url{https://unesdoc.unesco.org/ark:/48223/pf0000381137}}
}

@article{zaratiana2025gliner2,
  title   = {GLiNER2: An Efficient Multi-Task Information Extraction System},
  author  = {Zaratiana, Urchade and others},
  journal = {arXiv preprint arXiv:2507.18546},
  year    = {2025}
}

@article{mo2025kggen,
  title   = {KG-Gen: A Knowledge Graph Generation Toolkit},
  author  = {Mo, Belinda and others},
  journal = {arXiv preprint arXiv:2502.09956},
  year    = {2025}
}

🏛️ Acknowledgements

Produced by the UNESCO DBS Data & AI Team (Digital Business Solutions). Aligned with the Social and Human Sciences Sector's stewardship of the 2021 Recommendation on the Ethics of AI.

🤖 Built openly: all code, prompts, training logs, evaluation runs, and decisions are open-sourced in the project repository. Issues + contributions welcome.


📂 Repo Structure

UNESCO/gliguard-unesco-ethics/
├── README.md                       # this card
├── calibrated_thresholds.json      # REQUIRED for production
├── best/                           # 🏆 best-eval-loss LoRA adapter (step 5000)
├── final/                          # final LoRA adapter (step 7590)
├── checkpoint-{6500,7000,7500}/    # rolling 3-checkpoint history
└── training_config.json            # full hyperparameter snapshot

🛡️ Trained transparently. Aligned with the 2021 UNESCO Recommendation on the Ethics of AI.
