GLiGuard UNESCO Ethics: AI Guardrail
An open-source guardrail classifier that operationalises the 2021 UNESCO Recommendation on the Ethics of Artificial Intelligence as a fast, multilingual, schema-driven text classifier.
TL;DR
A 300M-parameter encoder-classifier, fine-tuned with LoRA on 45,340 records (synthetic UNESCO-anchored + WildGuardMix safety floor, EN/FR/ES/RU), screens text against 12 UNESCO ethics labels for use as a pre/post-processing guardrail inside LLM pipelines. UNESCO macro-F1 = 0.817, safety false-positive rate 0.16 % with the calibrated thresholds shipped alongside the weights.
from gliner2 import GLiNER2
from peft import PeftModel
import json
from huggingface_hub import hf_hub_download
REPO = "UNESCO/gliguard-unesco-ethics"
# 1. Load base + adapter
base = GLiNER2.from_pretrained("fastino/gliguard-LLMGuardrails-300M")
model = PeftModel.from_pretrained(base, REPO, subfolder="best")
model.train(False)
# 2. Load production thresholds (REQUIRED; see "Always use the calibrated thresholds")
thresholds = json.loads(open(hf_hub_download(REPO, "calibrated_thresholds.json")).read())
labels = list(thresholds.keys())
# 3. Classify
tasks = {"unesco_ethics": {"labels": labels, "multi_label": True, "cls_threshold": 0.0}}
out = model.classify_text(
"We deploy facial recognition to track all citizens in public spaces.",
tasks, threshold=0.0, include_confidence=True,
)
scores = {item["label"]: item["confidence"] for item in out["unesco_ethics"]}
fired = [L for L, s in scores.items() if s >= thresholds[L]["best_threshold"]]
print(fired)  # → ['mass_surveillance']
The 12 UNESCO Labels
Each label anchors to one or more paragraphs of the 2021 UNESCO Recommendation on the Ethics of AI and to multilingual concept entries from the UNESCO Thesaurus.
| Label | Anchors (Recommendation §) |
|---|---|
| mass_surveillance | §75–§77 |
| privacy_data_exposure | §72–§74 |
| discrimination_bias | §28–§30 |
| gender_harm | §90–§91 (gender equality) |
| child_vulnerable_harm | §125–§130 |
| disinformation | §80, §117 |
| cultural_harm | §86–§89 (cultural diversity) |
| environmental_harm | §86 (environmental ethics) |
| life_death_automation | §38 (right to life, autonomy) |
| no_human_oversight | §32–§37 (human oversight) |
| human_dignity_violation | §13, §22 |
| un_context_risk | §1, §10 (UN value alignment) |
Multi-label. A single input can trigger any subset of the 12 labels independently.
Multilingual Coverage
| Language | Code | Training records | UNESCO macro-F1 |
|---|---|---|---|
| English | en | 33,972 | 0.804 |
| French | fr | 3,702 | 0.840 |
| Spanish | es | 3,766 | 0.822 |
| Russian | ru | 3,560 | 0.779 |
| Arabic | ar | n/a | planned, v2.0 (multilingual base swap) |
| Chinese | zh | n/a | planned, v2.0 (Thesaurus contribution + multilingual base) |
All four supported languages (EN/FR/ES/RU) fall within the SPEC §7.3 fairness band. AR and ZH are NOT supported in v1.2: a v1.3 AR experiment landed at AR F1 = 0.366, well below the safe-deployment threshold. The root cause is the English-pretrained DeBERTa base (LoRA cannot fix monolingual tokenisation). The v2.0 work package switches to a multilingual encoder (XLM-RoBERTa / mDeBERTa) and pairs the release with SHS Arabic-speaker review. For AR / ZH content today, do not rely on this model alone; route to human review or use a multilingual baseline (Llama Guard 3, ShieldGemma) as a fallback.
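The routing advice above can be sketched as a small dispatch function. This is a minimal illustration, not part of the released code: the function name `route`, the return values, and the assumption that a language code is already available (from a language detector or request metadata) are all hypothetical.

```python
# Languages evaluated in v1.2 according to this card.
SUPPORTED_LANGUAGES = {"en", "fr", "es", "ru"}


def route(lang_code: str) -> str:
    """Pick a handling path for a text given its language code (hypothetical helper).

    Returns "classifier" when the language is covered by v1.2 and
    "human_review" otherwise, for example Arabic or Chinese content.
    """
    if lang_code.lower() in SUPPORTED_LANGUAGES:
        return "classifier"
    return "human_review"


print(route("fr"))  # classifier
print(route("ar"))  # human_review
```

A deployment that prefers an automated fallback could return a third value for the multilingual baseline instead of sending everything unsupported to human review.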
Quick Start
Installation
pip install gliner2 peft huggingface_hub
Production inference (with calibrated thresholds)
from gliner2 import GLiNER2
from peft import PeftModel
import json
from huggingface_hub import hf_hub_download
REPO = "UNESCO/gliguard-unesco-ethics"
model = PeftModel.from_pretrained(
GLiNER2.from_pretrained("fastino/gliguard-LLMGuardrails-300M"),
REPO, subfolder="best",
)
model.train(False)
thresholds = json.loads(open(hf_hub_download(REPO, "calibrated_thresholds.json")).read())
labels = list(thresholds.keys())

def screen(text: str) -> list[str]:
    """Return the UNESCO labels triggered by `text` under production thresholds."""
    tasks = {"unesco_ethics": {"labels": labels, "multi_label": True, "cls_threshold": 0.0}}
    out = model.classify_text(text, tasks, threshold=0.0, include_confidence=True)
    scores = {item["label"]: item["confidence"] for item in out["unesco_ethics"]}
    return [L for L, s in scores.items() if s >= thresholds[L]["best_threshold"]]

# Examples
print(screen("Our HR system rejects all candidates over 50."))
# → ['discrimination_bias']
print(screen("Deploy autonomous weapon systems that select targets without human approval."))
# → ['life_death_automation', 'no_human_oversight']
print(screen("This week's AI summit covered governance frameworks across 50 countries."))
# → [] (benign: no UNESCO violation)
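Since the card describes the model as a pre/post-processing guardrail inside LLM pipelines, a wrapper along the following lines may help. It is a sketch under stated assumptions: `guarded_generate`, `fake_screen`, and `fake_generate` are hypothetical names, and the two stand-in functions replace the real `screen` function and a real LLM call so the example runs without any model.

```python
from typing import Callable


def guarded_generate(
    prompt: str,
    generate: Callable[[str], str],
    screen: Callable[[str], list[str]],
) -> dict:
    """Run `generate` and attach guardrail signals for the prompt and the response."""
    prompt_labels = screen(prompt)      # pre-processing check
    response = generate(prompt)
    response_labels = screen(response)  # post-processing check
    return {
        "response": response,
        "prompt_labels": prompt_labels,
        "response_labels": response_labels,
        # A signal for downstream human review, not an automatic block.
        "needs_review": bool(prompt_labels or response_labels),
    }


# Hypothetical stand-ins so the sketch runs without the model or an LLM.
def fake_screen(text: str) -> list[str]:
    return ["mass_surveillance"] if "track all citizens" in text else []


def fake_generate(prompt: str) -> str:
    return "Summary: " + prompt


result = guarded_generate("Plan to track all citizens.", fake_generate, fake_screen)
print(result["needs_review"], result["prompt_labels"])
```

In a real pipeline the `screen` argument would be the function defined above, and `generate` would wrap whichever LLM client is in use.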
Always use the calibrated thresholds
The default 0.5 threshold produces a 100 % safety false-positive rate: the model fires at least one label on every benign input (a GLiNER2 multi-label-with-low-threshold artefact). The shipped calibrated_thresholds.json cuts the safety FPR to 0.16 % while lifting UNESCO macro-F1 from 0.802 to 0.817. Do not skip this step in production.
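To make the metric concrete, the sketch below computes a safety false-positive rate as the share of benign inputs on which at least one label fires. All scores and threshold values here are invented for illustration and are not model output; the thresholds are also flattened to plain floats, whereas the shipped file nests them under a `best_threshold` key as the Quick Start code shows.

```python
def fired_labels(scores: dict[str, float], thresholds: dict[str, float]) -> list[str]:
    """Labels whose confidence reaches that label's own threshold."""
    return [label for label, score in scores.items() if score >= thresholds[label]]


def safety_fpr(benign_scores: list[dict[str, float]], thresholds: dict[str, float]) -> float:
    """Fraction of benign inputs on which at least one label fires."""
    flagged = sum(1 for scores in benign_scores if fired_labels(scores, thresholds))
    return flagged / len(benign_scores)


# Hypothetical confidences for three benign inputs (NOT real model output).
benign = [
    {"mass_surveillance": 0.62, "disinformation": 0.41},
    {"mass_surveillance": 0.30, "disinformation": 0.55},
    {"mass_surveillance": 0.58, "disinformation": 0.20},
]
default = {"mass_surveillance": 0.5, "disinformation": 0.5}
calibrated = {"mass_surveillance": 0.90, "disinformation": 0.85}  # hypothetical values

print(safety_fpr(benign, default))     # 1.0
print(safety_fpr(benign, calibrated))  # 0.0
```

Because a single fired label is enough to count an input as a false positive, twelve independent labels at a low threshold can push the rate towards 100 % even when each individual score looks moderate.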
Evaluation
Held-out 10 % stratified validation split (4,500 records), with calibrated thresholds applied.
Headline metrics
| Regime | n | Result |
|---|---|---|
| UNESCO regime (synthetic positives) | 752 | macro-F1 0.817 (meets target ≥ 0.80) |
| Safety regime (benign content) | 3,748 | FPR 0.16 % |
| Base-model delta | n/a | +53.3 pp vs fastino/gliguard-LLMGuardrails-300M |
Per-label F1 (UNESCO regime, calibrated)
| Label | precision | recall | F1 | support |
|---|---|---|---|---|
| environmental_harm | 0.876 | 1.000 | 0.934 | 71 |
| child_vulnerable_harm | 0.906 | 0.939 | 0.922 | 66 |
| disinformation | 0.857 | 0.952 | 0.902 | 63 |
| cultural_harm | 0.838 | 0.945 | 0.889 | 55 |
| privacy_data_exposure | 0.875 | 0.875 | 0.875 | 64 |
| mass_surveillance | 0.940 | 0.794 | 0.861 | 68 |
| gender_harm | 0.788 | 0.857 | 0.821 | 63 |
| discrimination_bias | 0.838 | 0.765 | 0.800 | 68 |
| un_context_risk | 0.722 | 0.825 | 0.770 | 57 |
| no_human_oversight | 0.875 | 0.673 | 0.761 | 52 |
| human_dignity_violation | 0.857 | 0.621 | 0.720 | 66 |
| life_death_automation | 0.762 | 0.610 | 0.678 | 59 |
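The F1 column is the harmonic mean of precision and recall, which can be checked against a few rows of the per-label figures. The snippet below is only a worked check; `f1_score` here is a local helper written for this example, not a library call.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the per-label results.
rows = {
    "environmental_harm": (0.876, 1.000),
    "mass_surveillance": (0.940, 0.794),
    "life_death_automation": (0.762, 0.610),
}
for label, (p, r) in rows.items():
    print(f"{label}: F1 = {f1_score(p, r):.3f}")
# environmental_harm: F1 = 0.934
# mass_surveillance: F1 = 0.861
# life_death_automation: F1 = 0.678
```

The harmonic mean explains why life_death_automation scores lowest: its recall of 0.610 pulls F1 down more than an arithmetic average would.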
[Bar chart: per-label F1 on a 0.00 to 1.00 axis; the bars repeat the F1 values of the per-label results.]
Per-language macro-F1
| Language | macro-F1 | n |
|---|---|---|
| EN | 0.804 | 212 |
| FR | 0.840 | 214 |
| ES | 0.822 | 155 |
| RU | 0.779 | 171 |
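A quick way to read the per-language numbers is to express each one as a distance from the overall macro-F1. The sketch below assumes the fairness band is ±5 pp around the overall score of 0.817; the card states the ±5 pp figure but not the reference point, so that choice, along with the names `within_band` and `BAND_PP`, is an assumption made for illustration.

```python
OVERALL_MACRO_F1 = 0.817  # headline UNESCO macro-F1 from this card
BAND_PP = 5.0             # assumed band width, in percentage points

per_language = {"en": 0.804, "fr": 0.840, "es": 0.822, "ru": 0.779}


def within_band(score: float, reference: float = OVERALL_MACRO_F1,
                band_pp: float = BAND_PP) -> bool:
    """True when `score` lies within `band_pp` percentage points of `reference`."""
    return abs(score - reference) * 100 <= band_pp


for lang, f1 in per_language.items():
    delta_pp = (f1 - OVERALL_MACRO_F1) * 100
    print(f"{lang}: {delta_pp:+.1f} pp, within band: {within_band(f1)}")
```

Under the same assumption, the v1.3 Arabic result of 0.366 would sit roughly 45 pp below the reference, far outside the band.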
Full reproducible recipe: see reports/07_paper_evaluation.md in the project repository.
Training
Dataset (UNESCO/gliguard-unesco-training-v1, private)
| Source | Records | Share | Provenance |
|---|---|---|---|
| WildGuardMix (safety floor) | 30,000 | 66 % | allenai/wildguardmix train split |
| Synthetic UNESCO-aligned | 15,340 | 34 % | Qwen3-32B (Apache 2.0) via HF Inference Providers, KG-conditioned on the Recommendation |
| UNESCO institutional negatives | 0 | 0 % | Deferred to v1.3 |
Hyperparameters
| Param | Value |
|---|---|
| Base | fastino/gliguard-LLMGuardrails-300M |
| Method | LoRA (r=16, α=32, dropout=0) |
| encoder_lr / task_lr | 2e-5 / 2e-4 |
| Epochs | 3 |
| Batch size | 16 (per device) |
| Max sequence length | 512 |
| Seed | 42 (reproducible) |
| Hardware | NVIDIA A100 80GB (HF Jobs, org-billed to UNESCO) |
| Wall-clock | 29 min training + ~6 min Hub push |
| Total cost | ~$1.40 for the v1.2 fine-tune; $6.20 cumulative across all of v1.2 R&D |
LR was swept across {1e-5, 2e-5, 5e-5} per SPEC §7.2; 2e-5 was selected on the dev set.
Reproducibility. Every commit, evaluation, and inference is logged to RUN_LOG.md with UTC timestamps and SHA-pinned dependencies. Full re-run recipe in reports/07_paper_evaluation.md.
Risks & Limitations
Per SPEC §10 of the project specification. Each category corresponds to a deeper section in the project's evaluation reports.
1. Interpretive ambiguity
The 12 labels are operational distillations of the Recommendation, not legal definitions. Borderline cases (academic discussion of a violation vs the violation itself) are routed to hard-negatives during training; the residual ambiguity is real and stakeholders must retain final say.
2. Coverage gaps
- Arabic + Chinese deferred to Phase 2.
- UNESCO institutional negatives (Gap 2) deferred to v1.3.
- The synthetic data is anchored to the 2021 Recommendation; the model may under-fire on emerging risks (generative manipulation, agent autonomy) that the Recommendation does not enumerate by name.
3. Bias & fairness
Evaluated on two held-out surfaces:
v1 held-out val (4,500 rows, calibrated): UNESCO macro-F1 0.817; per-language en/fr/es/ru all within ±5 pp; gender slice (n=220) macro-F1 0.795, in band; race (n=27) falls below the n≥30 evidence threshold, and disability (n=38) carries a methodology caveat.
v1.3 balanced fairness eval (1,901 rows, 192 controlled cells, calibrated): UNESCO macro-F1 0.447; per-attribute slices are now defensibly measurable:
| Attribute | n | macro-F1 | Status |
|---|---|---|---|
| gender | 348 | 0.380 | flagged (−6.7 pp vs balanced baseline); investigate in v1.4 |
| race | 186 | 0.479 | data gap RESOLVED |
| disability | 189 | 0.458 | methodology caveat RESOLVED (strict & supported macros converge) |
The drop from 0.817 to 0.447 is the model's generalisation gap: the v1 val is a held-out slice of the training corpus (same stylistic surface); the balanced eval is fresh Qwen3-32B generation. Honest read: ~37 pp of v1.2's 0.817 reflects surface-pattern memorisation; ~45 pp generalises. See reports/08_balanced_fairness_eval.md for the full breakdown.
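The percentage-point figures quoted above follow directly from the reported scores. The short check below uses a hypothetical helper, `gap_pp`, and only numbers that appear in this card.

```python
def gap_pp(higher: float, lower: float) -> float:
    """Difference between two scores in percentage points, rounded to one decimal."""
    return round((higher - lower) * 100, 1)


V1_VAL = 0.817        # macro-F1 on the held-out slice of the training corpus
BALANCED = 0.447      # macro-F1 on the freshly generated balanced eval
GENDER_SLICE = 0.380  # gender slice of the balanced eval

print(gap_pp(V1_VAL, BALANCED))        # 37.0
print(gap_pp(BALANCED, GENDER_SLICE))  # 6.7
```

The first value is the generalisation gap, and the second matches the gender-slice deficit reported against the balanced baseline.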
4. Deployment risks
- MUST use calibrated thresholds. Default 0.5 → 100 % safety FPR.
- Audit-trail required. This is a screening tool; every escalation should be logged and reviewed.
- No autonomous block. Plug it as a signal into a system where humans make the final decision.
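The audit-trail and human-in-the-loop points above could be wired together roughly as follows. This is a sketch, not the project's logging code: the function name `screen_with_audit`, the JSON Lines record layout, the `decision` values, and the stand-in `fake_screen` are all assumptions chosen for illustration.

```python
import json
import os
import tempfile
from datetime import datetime, timezone
from typing import Callable


def screen_with_audit(text: str, screen: Callable[[str], list[str]], log_path: str) -> dict:
    """Screen `text`, append one audit record to `log_path`, and return the record.

    Flagged inputs are marked for human review rather than blocked.
    """
    labels = screen(text)
    record = {
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "text": text,
        "labels": labels,
        "decision": "pending_human_review" if labels else "no_flag",
    }
    with open(log_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record


# Hypothetical stand-in for the real classifier so the sketch runs offline.
def fake_screen(text: str) -> list[str]:
    return ["discrimination_bias"] if "over 50" in text else []


log_path = os.path.join(tempfile.mkdtemp(), "audit.jsonl")
record = screen_with_audit("Our HR system rejects all candidates over 50.", fake_screen, log_path)
print(record["decision"])  # pending_human_review
```

Storing raw text in an audit log has its own privacy implications, so a deployment may prefer to log a hash or a reference instead.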
5. Maintenance commitments
- v2.0 (next major version): multilingual base swap (XLM-RoBERTa-base or mDeBERTa-v3-base); native Arabic + Chinese support; SHS Arabic-speaker + Chinese-speaker review on synthetic data and predictions; UNESCO Thesaurus ZH contribution sub-track. The v1.3 AR experiment surfaced the monolingual base as the bottleneck; see reports/09_v1.3_release.md. ETA: dependent on team funding (paper §7.4).
- v1.x patches: tagged on main (e.g., v1.2.1); no public re-release unless macro-F1 changes by ≥1 pp.
- Periodic re-anchoring as the Recommendation interpretation evolves.
- Open issues + roadmap in the project repository.
Citation
@misc{unesco_gliguard_2026,
title = {GLiGuard UNESCO Ethics: An Open-Source Guardrail Classifier for the
2021 UNESCO Recommendation on the Ethics of Artificial Intelligence},
author = {UNESCO DBS Data \& AI Team},
year = {2026},
howpublished = {\url{https://huggingface.co/UNESCO/gliguard-unesco-ethics}},
note = {Apache-2.0 fine-tune of fastino/gliguard-LLMGuardrails-300M with LoRA}
}
@misc{unesco_recommendation_2021,
title = {Recommendation on the Ethics of Artificial Intelligence},
author = {{UNESCO}},
year = {2021},
howpublished = {\url{https://unesdoc.unesco.org/ark:/48223/pf0000381137}}
}
@article{zaratiana2025gliner2,
title = {GLiNER2: An Efficient Multi-Task Information Extraction System},
author = {Zaratiana, Urchade and others},
journal = {arXiv preprint arXiv:2507.18546},
year = {2025}
}
@article{mo2025kggen,
title = {KG-Gen: A Knowledge Graph Generation Toolkit},
author = {Mo, Belinda and others},
journal = {arXiv preprint arXiv:2502.09956},
year = {2025}
}
Acknowledgements
Produced by the UNESCO DBS Data & AI Team (Digital Business Solutions). Aligned with the Social and Human Sciences Sector's stewardship of the 2021 Recommendation on the Ethics of AI.
Built openly: all code, prompts, training logs, evaluation runs, and decisions are open-sourced in the project repository. Issues + contributions welcome.
Repo Structure
UNESCO/gliguard-unesco-ethics/
├── README.md                      # this card
├── calibrated_thresholds.json     # REQUIRED for production
├── best/                          # best-eval-loss LoRA adapter (step 5000)
├── final/                         # final LoRA adapter (step 7590)
├── checkpoint-{6500,7000,7500}/   # rolling 3-checkpoint history
└── training_config.json           # full hyperparameter snapshot
Trained transparently. Aligned with the 2021 UNESCO Recommendation on the Ethics of AI.