Model Card for gulupgulup/distilbert_nli

This is a Natural Language Inference (NLI) model built by fine-tuning DistilBERT-base-uncased on the GPT-3 NLI dataset. The model performs textual entailment classification: given two pieces of text (a premise and a hypothesis), it determines the logical relationship between them.

Model Details

Model Description

What it does:

  • Takes two text inputs: a premise (text_a) and a hypothesis (text_b)

  • Classifies their relationship into one of three categories:

      • Entailment: the hypothesis logically follows from the premise

      • Neutral: the hypothesis is neither supported nor contradicted by the premise

      • Contradiction: the hypothesis contradicts the premise

Use Cases:

  • Reading comprehension tasks

  • Logical reasoning applications

  • Question-answering systems

  • Text coherence analysis

  • Information verification tasks

Architecture: DistilBERT-based sequence classification model with 3 output classes (~67M parameters, F32 weights), optimized for efficiency while maintaining strong performance on natural language understanding tasks.

This type of model is fundamental for applications requiring understanding of logical relationships between text passages, such as fact-checking, automated reasoning, and reading comprehension systems.

How to Get Started with the Model

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the model and tokenizer
model_name = "gulupgulup/distilbert_nli"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

Usage Example

Continuing from the loading snippet above:

# Example premise and hypothesis
premise = "A person is riding a bicycle in the park."
hypothesis = "Someone is exercising outdoors."

# Tokenize the input
inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True, padding=True)

# Make prediction
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_class = torch.argmax(predictions, dim=-1)

# Get the predicted label
id2label = {0: "entailment", 1: "neutral", 2: "contradiction"}
predicted_label = id2label[predicted_class.item()]

print(f"Premise: {premise}")
print(f"Hypothesis: {hypothesis}")
print(f"Predicted relationship: {predicted_label}")
print(f"Confidence scores: {predictions.squeeze().tolist()}")

Training Details

Training Data

Dataset: pietrolesci/gpt3_nli (https://huggingface.co/datasets/pietrolesci/gpt3_nli) - a natural language inference dataset containing premise-hypothesis pairs with three-class labels (entailment, neutral, contradiction). The dataset consists of text pairs (text_a and text_b) where the model learns to determine the logical relationship between the premise and hypothesis.

Training Procedure

Base Model: DistilBERT-base-uncased fine-tuned for sequence classification with 3 output labels for natural language inference.

Training Framework: Hugging Face Transformers Trainer with Weights & Biases (wandb) integration for experiment tracking.

Data Split: The original training set was split into train (81%), validation (9%), and test (10%) sets using stratified sampling to maintain label distribution balance across splits.
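
A sketch of such a split using the datasets library (the seed, the label column name, and the ClassLabel order are assumptions; stratify_by_column requires a ClassLabel column):

from datasets import ClassLabel, load_dataset

ds = load_dataset("pietrolesci/gpt3_nli", split="train")

# Cast to ClassLabel so stratified splitting works; the class order here
# is an assumption chosen to match the id2label mapping shown earlier.
ds = ds.cast_column("label", ClassLabel(names=["entailment", "neutral", "contradiction"]))

# 90/10 test split, then 90/10 again on the remainder -> 81/9/10 overall.
split = ds.train_test_split(test_size=0.10, stratify_by_column="label", seed=42)
trainval = split["train"].train_test_split(test_size=0.10, stratify_by_column="label", seed=42)
train_ds, val_ds, test_ds = trainval["train"], trainval["test"], split["test"]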

Preprocessing

Text pairs are tokenized using DistilBERT's tokenizer with truncation and padding applied. The label column is cast to ClassLabel format with three categories: entailment, neutral, and contradiction.

Data Handling: Uses DataCollatorWithPadding for dynamic padding during training and tokenizes premise-hypothesis pairs jointly.
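
A minimal sketch of this preprocessing, continuing from the split above (the column names text_a and text_b follow the dataset description):

from transformers import AutoTokenizer, DataCollatorWithPadding

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def tokenize(batch):
    # Encode premise/hypothesis pairs jointly; padding is deferred to the
    # collator so each batch is padded only to its own longest sequence.
    return tokenizer(batch["text_a"], batch["text_b"], truncation=True)

train_ds = train_ds.map(tokenize, batched=True)
val_ds = val_ds.map(tokenize, batched=True)
test_ds = test_ds.map(tokenize, batched=True)

data_collator = DataCollatorWithPadding(tokenizer=tokenizer)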

Training Hyperparameters

Learning Rate: 1e-5

Batch Size: 64 (both training and evaluation)

Number of Epochs: 5

Weight Decay: 0.01

Max Gradient Norm: 1.0

Optimizer: AdamW (default)

Evaluation Strategy: Every epoch

Save Strategy: Every epoch

Logging Steps: 100

Best Model Selection: Based on validation accuracy (higher is better)
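
Taken together, these settings map onto a Trainer setup roughly as follows. This is a sketch, not the original training script; output_dir is an assumption, and older transformers versions spell eval_strategy as evaluation_strategy:

from transformers import AutoModelForSequenceClassification, Trainer, TrainingArguments

model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=3)

args = TrainingArguments(
    output_dir="distilbert_nli",          # assumed
    learning_rate=1e-5,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    num_train_epochs=5,
    weight_decay=0.01,
    max_grad_norm=1.0,
    eval_strategy="epoch",                # evaluation_strategy on older versions
    save_strategy="epoch",
    logging_steps=100,
    load_best_model_at_end=True,
    metric_for_best_model="accuracy",
    greater_is_better=True,
    report_to="wandb",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    eval_dataset=val_ds,
    data_collator=data_collator,
    compute_metrics=compute_metrics,      # defined under Evaluation below
)
trainer.train()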

Evaluation

Metrics

Accuracy: Primary evaluation metric measuring the percentage of correctly classified premise-hypothesis pairs across all three NLI categories.

Precision (Macro-averaged): Secondary metric calculating the average precision across all three classes (entailment, neutral, contradiction), giving equal weight to each class regardless of support. This makes it useful for judging performance on each NLI relationship type individually, particularly when class distributions are imbalanced.

Both metrics are computed using the evaluate library and rounded to 3 decimal places for reporting.
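
A compute_metrics sketch consistent with that description (the metric names and 3-decimal rounding follow the card; the function itself is an assumption):

import numpy as np
import evaluate

accuracy = evaluate.load("accuracy")
precision = evaluate.load("precision")

def compute_metrics(eval_pred):
    # Predictions are the argmax over the three NLI class logits.
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    acc = accuracy.compute(predictions=preds, references=labels)["accuracy"]
    prec = precision.compute(predictions=preds, references=labels,
                             average="macro")["precision"]
    return {"accuracy": round(acc, 3), "precision": round(prec, 3)}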
