How to use SebasLopez-ai/distilbert-amazon-reviews-sentiment with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-classification", model="SebasLopez-ai/distilbert-amazon-reviews-sentiment")
# Load model directly
from transformers import AutoTokenizer, AutoModelForSequenceClassification
tokenizer = AutoTokenizer.from_pretrained("SebasLopez-ai/distilbert-amazon-reviews-sentiment")
model = AutoModelForSequenceClassification.from_pretrained("SebasLopez-ai/distilbert-amazon-reviews-sentiment")
DistilBERT — Sentiment Classification
Fine-tuned DistilBERT for Amazon product review sentiment analysis. Classifies reviews into Negative (1–2★), Neutral (3★), and Positive (4–5★).
Architecture: distilbert-base-uncased fine-tuned with a classification head (3 classes)
Parameters: 66M (6 transformer layers, 768 hidden dim, 12 attention heads)
Tokenizer: BERT WordPiece uncased (vocab_size=30,522)
Dataset: Amazon Reviews 2023, 149,761 train / 32,092 test reviews (stratified 70/15/15 train/validation/test split)
Weights: model.safetensors (SafeTensors format, 255 MB)
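A stratified split keeps the class proportions identical in every partition. The sketch below shows one way to produce a 70/15/15 split with the standard library only. It is an illustration, not the script used for this model: the function name `stratified_split`, the seed, and the toy labels are all assumptions.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.70, 0.15, 0.15), seed=42):
    """Return (train, val, test) index lists with per-class proportions preserved."""
    # Group example indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train, val, test = [], [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_train = round(len(indices) * fractions[0])
        n_val = round(len(indices) * fractions[1])
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test split
    return train, val, test


# Toy data: 100 labels per class (hypothetical, far smaller than the real dataset).
labels = ["Negative", "Neutral", "Positive"] * 100
train_idx, val_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(val_idx), len(test_idx))
```

Because the split is done class by class, a dataset that is balanced overall yields balanced train, validation, and test partitions.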
How this model was trained
This is a supervised fine-tuning of DistilBERT for 3-class sentiment classification:
149,761 reviews (train)
              │
              ▼
DistilBERT Tokenizer
(max_length=256, truncation)
              │
              ▼
distilbert-base-uncased
(66M params, pretrained on BooksCorpus + English Wikipedia)
              │
              ▼
Sequence Classification Head
(dropout=0.3, 3 output logits → Negative / Neutral / Positive)
              │
              ▼
Training: 5 epochs
lr=2e-5, batch_size=32
weight_decay=0.01, warmup_steps=2,340
optimizer: AdamW
              │
          ┌───┴───────────────────────┐
          ▼                           ▼
┌───────────────────┐  ┌──────────────────────────────┐
│ model.safetensors │  │ Evaluation on                │
│ (255 MB)          │  │ 32,092 test reviews          │
│                   │  │ → metrics_distilbert.json    │
│ SERVED WEIGHTS    │  │ → predictions_distilbert.csv │
└───────────────────┘  │                              │
                       │ PRODUCTION ARTIFACTS         │
                       └──────────────────────────────┘
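The warmup figure can be related to the rest of the training setup with a little arithmetic. The sketch below derives the number of optimizer steps from the stated sample count, batch size, and epochs, and shows that 2,340 warmup steps is about 10% of the run. The linear warmup followed by linear decay is an assumption (the scheduler type is not stated here), as is keeping the last partial batch of each epoch.

```python
import math

TRAIN_SAMPLES = 149_761
BATCH_SIZE = 32
EPOCHS = 5
BASE_LR = 2e-5
WARMUP_STEPS = 2_340

# Assumption: the last partial batch is kept, so steps per epoch round up.
steps_per_epoch = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS


def lr_at(step: int) -> float:
    """Learning rate at a given step: linear warmup, then linear decay (assumed)."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    remaining = total_steps - step
    return BASE_LR * max(0.0, remaining / (total_steps - WARMUP_STEPS))


print(steps_per_epoch, total_steps)
print(f"warmup share of training: {WARMUP_STEPS / total_steps:.1%}")
print(lr_at(0), lr_at(WARMUP_STEPS), lr_at(total_steps))
```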
Training stability
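The loss converged smoothly over 5 epochs. Training loss started at 1.097 (step 50) and steadily decreased to ~0.50 by epoch 5. Accuracy and F1 improved every epoch, while eval loss bottomed out at epoch 3 (0.546) and then stayed within 0.005 of that minimum, so there is no sign of significant overfitting: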
| Epoch | Eval Loss | Accuracy | F1 |
|---|---|---|---|
| 1 | 0.601 | 75.04% | 74.66% |
| 2 | 0.551 | 76.74% | 76.68% |
| 3 | 0.546 | 77.10% | 77.05% |
| 4 | 0.551 | 77.32% | 77.14% |
| 5 | 0.550 | 77.51% | 77.48% |
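Eval loss plateaus after epoch 3 while F1 keeps inching up (from 77.05% to 77.48%), which suggests training had largely converged by epoch 5 and that further epochs would likely add little. These per-epoch figures are slightly higher than the final test metrics reported below (77.26% accuracy), presumably because they were measured on the validation portion of the 70/15/15 split.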
Full training history with per-step loss tracking: metrics_distilbert.json.
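To make the plateau concrete, the per-epoch figures from the table above can be compared programmatically. This is a small illustrative sketch: the `history` list is typed in by hand from that table, and the structure of `metrics_distilbert.json` itself is not assumed.

```python
# Per-epoch evaluation figures copied from the training-stability table.
history = [
    {"epoch": 1, "eval_loss": 0.601, "accuracy": 75.04, "f1": 74.66},
    {"epoch": 2, "eval_loss": 0.551, "accuracy": 76.74, "f1": 76.68},
    {"epoch": 3, "eval_loss": 0.546, "accuracy": 77.10, "f1": 77.05},
    {"epoch": 4, "eval_loss": 0.551, "accuracy": 77.32, "f1": 77.14},
    {"epoch": 5, "eval_loss": 0.550, "accuracy": 77.51, "f1": 77.48},
]

best_by_loss = min(history, key=lambda row: row["eval_loss"])
best_by_f1 = max(history, key=lambda row: row["f1"])

# How far eval loss drifts from its minimum over the last three epochs.
late_losses = [row["eval_loss"] for row in history[2:]]
loss_drift = round(max(late_losses) - best_by_loss["eval_loss"], 3)

print("lowest eval loss at epoch", best_by_loss["epoch"])
print("highest F1 at epoch", best_by_f1["epoch"])
print("eval-loss drift after the minimum:", loss_drift)
```

The two selection criteria disagree (epoch 3 by loss, epoch 5 by F1), but the loss drift is small, which is why either checkpoint would be a defensible choice.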
How to use model.safetensors
from transformers import AutoTokenizer, DistilBertForSequenceClassification
import torch
# Load from local files
model_path = "data/models/distilbert"
model = DistilBertForSequenceClassification.from_pretrained(model_path)
# AutoTokenizer loads the fast tokenizer from tokenizer.json
tokenizer = AutoTokenizer.from_pretrained(model_path)
model.eval()  # inference mode: disables dropout
To classify a new review:
def classify(text: str) -> dict:
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256, padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = torch.softmax(logits, dim=-1).squeeze()
    pred = torch.argmax(probs).item()
    labels = {0: "Negative", 1: "Neutral", 2: "Positive"}
    return {
        "label": labels[pred],
        "confidence": round(probs[pred].item(), 4),
        "probabilities": {
            "Negative": round(probs[0].item(), 4),
            "Neutral": round(probs[1].item(), 4),
            "Positive": round(probs[2].item(), 4),
        }
    }
The model expects raw review text. The tokenizer lowercases the input (uncased vocabulary) and, with the arguments shown above, pads and truncates it to 256 tokens, so anything beyond that limit is ignored. No HTML cleaning or special preprocessing is required to run the model, although stray markup is tokenized like any other text and its effect on accuracy is not reported here.
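The post-processing step in `classify` (softmax over three logits, then argmax) can be illustrated without loading the model. The sketch below reimplements it in plain Python; the helper names and the example logits are hypothetical and are not actual model output.

```python
import math

LABELS = ["Negative", "Neutral", "Positive"]


def softmax(logits):
    """Convert raw logits to probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def postprocess(logits):
    """Mirror the dictionary returned by classify() for one review."""
    probs = softmax(logits)
    pred = max(range(len(probs)), key=probs.__getitem__)  # argmax
    return {
        "label": LABELS[pred],
        "confidence": round(probs[pred], 4),
        "probabilities": {name: round(p, 4) for name, p in zip(LABELS, probs)},
    }


example_logits = [-1.2, 0.3, 2.5]  # hypothetical values for illustration
result = postprocess(example_logits)
print(result)
```

The reported confidence is simply the largest of the three probabilities, so it can never fall below one third.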
What the predictions mean
The model assigns each review to one of three sentiment classes. The test set is balanced across the three classes (about 10,697 reviews each) because the stratified split preserves the class proportions of the full dataset:
| Class | Test samples | Predicted | Precision | Recall | F1 |
|---|---|---|---|---|---|
| Negative (1–2★) | 10,697 | 11,073 | 75.42% | 78.07% | 76.72% |
| Neutral (3★) | 10,697 | 10,261 | 68.52% | 65.73% | 67.10% |
| Positive (4–5★) | 10,698 | 10,758 | 87.49% | 87.98% | 87.73% |
Confusion matrix
Where the model gets it right — and where it doesn't:
| True class | Predicted Negative | Predicted Neutral | Predicted Positive |
|---|---|---|---|
| True Negative | 8,351 (78.1%) | 2,140 (20.0%) | 206 (1.9%) |
| True Neutral | 2,526 (23.6%) | 7,031 (65.7%) | 1,140 (10.7%) |
| True Positive | 196 (1.8%) | 1,090 (10.2%) | 9,412 (88.0%) |
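The per-class precision, recall, and F1 values, as well as the overall accuracy, follow directly from the confusion matrix. The sketch below recomputes them from the counts in the table; only the variable and function names are made up for the illustration.

```python
LABELS = ["Negative", "Neutral", "Positive"]

# Rows are true classes, columns are predicted classes (counts from the table).
CONFUSION = [
    [8351, 2140, 206],
    [2526, 7031, 1140],
    [196, 1090, 9412],
]


def per_class_metrics(matrix):
    """Compute support, predicted count, precision, recall and F1 per class."""
    metrics = {}
    for i, label in enumerate(LABELS):
        true_positives = matrix[i][i]
        support = sum(matrix[i])                 # row sum: true examples
        predicted = sum(row[i] for row in matrix)  # column sum: predictions
        precision = true_positives / predicted
        recall = true_positives / support
        f1 = 2 * precision * recall / (precision + recall)
        metrics[label] = {
            "support": support,
            "predicted": predicted,
            "precision": precision,
            "recall": recall,
            "f1": f1,
        }
    return metrics


metrics = per_class_metrics(CONFUSION)
accuracy = sum(CONFUSION[i][i] for i in range(3)) / sum(map(sum, CONFUSION))

for label, m in metrics.items():
    print(f"{label}: P={m['precision']:.2%} R={m['recall']:.2%} F1={m['f1']:.2%}")
print(f"accuracy={accuracy:.2%}")
```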
Key observations
- Positive → Positive is the safest path (88.0% correct). When the model says "Positive", you can trust it: precision is 87.5%. Only 1.8% of truly positive reviews are mistaken for Negative.
- Neutral is the leaky class (65.7% correct). True Neutral reviews spill 23.6% into Negative but only 10.7% into Positive. The model has a negative bias on uncertainty — when it can't tell if a review is Neutral, it defaults to Negative rather than Positive. This makes sense: lukewarm reviews ("it's okay but the handle broke") share more vocabulary with complaints than with praise.
- Cross-polarity errors are rare: only 1.9% of Negative reviews are called Positive, and only 1.8% of Positive reviews are called Negative. The model rarely makes the catastrophic mistake of flipping sentiment polarity.
- Negative detection is solid (78.1% correct). The 20.0% that spill into Neutral are typically mildly negative reviews ("not great", "could be better", "disappointed but usable") that the model hedges on.
- Overall accuracy: 77.26% — competitive for a 3-class sentiment task on e-commerce reviews. The class-balanced test set means accuracy is not inflated by majority-class bias.
Model strengths & limitations
| Strength | Limitation |
|---|---|
| Excellent at polarity detection — rarely confuses Negative with Positive (only ~2% cross-polarity errors) | Struggles with Neutral reviews — 34% of true Neutrals are misclassified, mostly as Negative |
| Positive reviews: 88% correct, 87.5% precision. Can reliably filter 4–5★ reviews for recommendation systems | Negative bias on uncertainty — the model over-predicts Negative by 376 reviews. A Neutral review is more likely to be called Negative than Positive |
| Fast inference — 66M params, ~20ms per review on CPU. No GPU required for production use at moderate scale | Sarcasm and irony are not handled — the model reads words at face value ("Great, another broken product" → Positive) |
| No preprocessing needed — raw review text goes straight into the tokenizer. Handles HTML artifacts, emojis, and typos gracefully | Non-English reviews — the tokenizer is English-only (uncased). Reviews in other languages will produce garbage predictions |
| Confidence scores: outputs softmax probabilities, not just labels, so you can set custom confidence thresholds. Calibration of these probabilities is not reported here | Domain-specific: trained on Amazon product reviews. Performance on restaurant reviews, movie reviews, or tweets will likely be lower |
Per-review predictions with confidence scores: predictions_distilbert.csv.
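A confidence threshold can be applied on top of the dictionary returned by `classify`. The sketch below is one possible approach: the function name `apply_threshold`, the 0.70 cutoff, and the "Uncertain" fallback label are illustrative assumptions, and a suitable cutoff would need to be tuned on held-out data.

```python
def apply_threshold(result: dict, threshold: float = 0.70, fallback: str = "Uncertain") -> str:
    """Return the predicted label, or `fallback` when confidence is below `threshold`."""
    if result["confidence"] >= threshold:
        return result["label"]
    return fallback


# Hypothetical results in the same shape classify() returns.
confident = {"label": "Positive", "confidence": 0.9847}
borderline = {"label": "Neutral", "confidence": 0.52}

print(apply_threshold(confident))
print(apply_threshold(borderline))
print(apply_threshold(borderline, threshold=0.5))
```

Routing low-confidence reviews to a fallback bucket is most useful for the Neutral class, where the model makes most of its errors.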
Metrics
From metrics_distilbert.json:
| Metric | Value |
|---|---|
| Accuracy | 77.26% |
| Weighted Precision | 77.14% |
| Weighted Recall | 77.26% |
| Weighted F1 | 77.18% |
| Negative F1 | 76.72% |
| Neutral F1 | 67.10% |
| Positive F1 | 87.73% |
F1 interpretation: The weighted F1 of 77.18% is competitive for a 3-class sentiment task on e-commerce reviews. The 20-point gap between Positive (87.7%) and Neutral (67.1%) reflects a well-known challenge in sentiment analysis: extreme sentiments (very positive, very negative) are linguistically easier to separate than moderate/mixed ones. The Neutral class often absorbs sarcasm, factual-but-unemotional reviews, and genuinely mixed opinions — all hard to classify.
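The weighted metrics are support-weighted averages of the per-class values. As a quick check, the sketch below recomputes them from the per-class figures reported earlier; because the classes are almost perfectly balanced, the result is nearly identical to a plain mean. The dictionary layout is an assumption made for the example.

```python
# Per-class test results as reported in this card (percentages).
per_class = {
    "Negative": {"support": 10_697, "precision": 75.42, "recall": 78.07, "f1": 76.72},
    "Neutral": {"support": 10_697, "precision": 68.52, "recall": 65.73, "f1": 67.10},
    "Positive": {"support": 10_698, "precision": 87.49, "recall": 87.98, "f1": 87.73},
}

total_support = sum(c["support"] for c in per_class.values())


def weighted(metric: str) -> float:
    """Average a per-class metric, weighting each class by its support."""
    return sum(c[metric] * c["support"] for c in per_class.values()) / total_support


for name in ("precision", "recall", "f1"):
    print(f"weighted {name}: {weighted(name):.2f}%")
```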
Training hyperparameters: learning_rate=2e-5, batch_size=32, epochs=5, max_length=256, weight_decay=0.01, warmup_steps=2,340, train_samples=149,761.
Usage Examples
Full pipeline: raw text → DistilBERT tokenizer → model inference → sentiment label + confidence.
Example 1 — Enthusiastic positive review → Positive (high confidence)
from transformers import AutoTokenizer, DistilBertForSequenceClassification
import torch
model_path = "data/models/distilbert"
model = DistilBertForSequenceClassification.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)  # loads the fast tokenizer from tokenizer.json
review = "This blender is absolutely amazing! Smoothies every morning and it's so quiet. Best purchase ever."
inputs = tokenizer(review, return_tensors="pt", truncation=True, max_length=256)
with torch.no_grad():
    probs = torch.softmax(model(**inputs).logits, dim=-1).squeeze()
pred = torch.argmax(probs).item()
labels = {0: "Negative", 1: "Neutral", 2: "Positive"}
print(f"Label: {labels[pred]} (confidence: {probs[pred]:.2%})")
Expected output: Label: Positive (confidence: ~98%).
Keywords like "amazing" and "best", together with the exclamation mark, strongly signal the Positive class.
Example 2 — Genuinely mixed 3-star review → Neutral
review = "The fabric is nice and the color is beautiful, but the sizing runs small and the zipper feels cheap."
inputs = tokenizer(review, return_tensors="pt", truncation=True, max_length=256)
with torch.no_grad():
    probs = torch.softmax(model(**inputs).logits, dim=-1).squeeze()
pred = torch.argmax(probs).item()
labels = {0: "Negative", 1: "Neutral", 2: "Positive"}
print(f"Label: {labels[pred]} (confidence: {probs[pred]:.2%})")
Expected output: Label: Neutral (confidence: ~65%).
The "but" structure and mixture of praise ("nice", "beautiful") and criticism ("cheap", "runs small") are classic Neutral signals. Confidence is moderate because the model finds linguistic overlap with both Negative and Positive.
Example 3 — Frustrated negative review → Negative (high confidence)
review = "Stopped working after 3 days. Complete waste of money. Don't buy this garbage."
inputs = tokenizer(review, return_tensors="pt", truncation=True, max_length=256)
with torch.no_grad():
    probs = torch.softmax(model(**inputs).logits, dim=-1).squeeze()
pred = torch.argmax(probs).item()
labels = {0: "Negative", 1: "Neutral", 2: "Positive"}
print(f"Label: {labels[pred]} (confidence: {probs[pred]:.2%})")
Expected output: Label: Negative (confidence: ~95%).
"Stopped working", "waste of money", "don't buy", "garbage" — classic complaint vocabulary. DistilBERT catches these patterns reliably.
Batch inference (N reviews at once)
reviews = [
    "Great product, highly recommend!",
    "It's okay. Nothing special but does the job.",
    "Terrible quality. Fell apart in a week."
]
# padding=True pads every review to the longest one in the batch
inputs = tokenizer(reviews, return_tensors="pt", truncation=True, max_length=256, padding=True)
with torch.no_grad():
    probs = torch.softmax(model(**inputs).logits, dim=-1)
preds = torch.argmax(probs, dim=-1)
labels = ["Negative", "Neutral", "Positive"]
for i, review in enumerate(reviews):
    print(f"Review: {review[:60]}... → {labels[preds[i]]} ({probs[i][preds[i]]:.2%})")
Serving & Integration
All examples assume the model files live at data/models/distilbert/.
1. Python CLI script (no dependencies beyond transformers and torch)
Save as classify_sentiment.py and run: python classify_sentiment.py "Your review text here"
"""classify_sentiment.py — classify a review from the command line."""
import json
import sys
import torch
from transformers import AutoTokenizer, DistilBertForSequenceClassification
MODEL_DIR = "data/models/distilbert"
LABELS = {0: "Negative", 1: "Neutral", 2: "Positive"}
# Load once at module level
model = DistilBertForSequenceClassification.from_pretrained(MODEL_DIR)
tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)  # loads the fast tokenizer from tokenizer.json
model.eval()
def classify(text: str) -> dict:
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)
    with torch.no_grad():
        probs = torch.softmax(model(**inputs).logits, dim=-1).squeeze()
    pred = torch.argmax(probs).item()
    return {
        "text": text,
        "label": LABELS[pred],
        "confidence": round(probs[pred].item(), 4),
        "probabilities": {LABELS[i]: round(probs[i].item(), 4) for i in range(3)},
    }
if __name__ == "__main__":
    # Use the command-line arguments if given, otherwise prompt for the review
    text = " ".join(sys.argv[1:]) if len(sys.argv) > 1 else input("Review: ")
    result = classify(text)
    print(json.dumps(result, indent=2))
Sample output:
{
  "text": "This keyboard is fantastic, the mechanical switches feel incredible.",
  "label": "Positive",
  "confidence": 0.9847,
  "probabilities": {
    "Negative": 0.0021,
    "Neutral": 0.0132,
    "Positive": 0.9847
  }
}
2. FastAPI microservice (REST JSON endpoint)
"""api.py — lightweight REST API. Run: uvicorn api:app --port 8000"""
from contextlib import asynccontextmanager
from fastapi import FastAPI
from pydantic import BaseModel
import torch
from transformers import AutoTokenizer, DistilBertForSequenceClassification
MODEL_DIR = "data/models/distilbert"
LABELS = {0: "Negative", 1: "Neutral", 2: "Positive"}
model = None
tokenizer = None
@asynccontextmanager
async def lifespan(app: FastAPI):
    # Load the model and tokenizer once, when the server starts
    global model, tokenizer
    model = DistilBertForSequenceClassification.from_pretrained(MODEL_DIR)
    tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)  # loads the fast tokenizer from tokenizer.json
    model.eval()
    yield
app = FastAPI(lifespan=lifespan)
class ReviewInput(BaseModel):
    text: str
class SentimentResult(BaseModel):
    label: str
    confidence: float
    probabilities: dict
@app.post("/classify", response_model=SentimentResult)
def classify(review: ReviewInput):
    inputs = tokenizer(review.text, return_tensors="pt", truncation=True, max_length=256)
    with torch.no_grad():
        probs = torch.softmax(model(**inputs).logits, dim=-1).squeeze()
    pred = torch.argmax(probs).item()
    return SentimentResult(
        label=LABELS[pred],
        confidence=round(probs[pred].item(), 4),
        probabilities={LABELS[i]: round(probs[i].item(), 4) for i in range(3)},
    )
Call it:
curl -X POST http://localhost:8000/classify \
-H "Content-Type: application/json" \
-d '{"text": "This book was a page-turner from start to finish."}'
{"label":"Positive","confidence":0.9756,"probabilities":{"Negative":0.0042,"Neutral":0.0202,"Positive":0.9756}}
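The same request can be made from Python with the standard library alone. This is a minimal client sketch: it assumes the FastAPI service above is running on localhost:8000, and the helper names `build_request` and `classify_remote` are invented for the example.

```python
import json
import urllib.request

API_URL = "http://localhost:8000/classify"  # endpoint exposed by api.py


def build_request(text: str, url: str = API_URL) -> urllib.request.Request:
    """Build the POST request carrying the review as a JSON body."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def classify_remote(text: str, timeout: float = 10.0) -> dict:
    """Send the review to the API and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(text), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (requires the API to be running):
# result = classify_remote("This book was a page-turner from start to finish.")
# print(result["label"], result["confidence"])
```

Splitting request construction from sending keeps the payload easy to inspect without a live server.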
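3. HTML form + vanilla JavaScript (browser)
The page posts to the FastAPI endpoint from option 2. If it is opened from disk or served from another origin, the browser blocks the call unless the API enables CORS (for example with FastAPI's CORSMiddleware).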
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Sentiment Classifier</title>
<style>
body { font-family: system-ui; max-width: 600px; margin: 3rem auto; padding: 0 1rem; }
textarea { width: 100%; height: 100px; margin-bottom: 0.5rem; }
pre { background: #f5f5f5; padding: 1rem; border-radius: 6px; white-space: pre-wrap; }
.Positive { border-left: 4px solid #0891B2; }
.Negative { border-left: 4px solid #DC2626; }
.Neutral { border-left: 4px solid #EA580C; }
</style>
</head>
<body>
<h2>What sentiment is this review?</h2>
<textarea id="review" placeholder="Paste a product review..."></textarea>
<button onclick="classify()">Classify</button>
<pre id="result"></pre>
<script>
async function classify() {
const text = document.getElementById("review").value;
const res = await fetch("http://localhost:8000/classify", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({ text }),
});
const data = await res.json();
document.getElementById("result").textContent = JSON.stringify(data, null, 2);
document.getElementById("result").className = data.label;
}
</script>
</body>
</html>
4. Google Colab interactive widget
# Run in a Colab cell: renders a text box and a Classify button
import ipywidgets as widgets
from IPython.display import display, JSON
import torch
from transformers import AutoTokenizer, DistilBertForSequenceClassification
MODEL_DIR = "data/models/distilbert"
LABELS = {0: "Negative", 1: "Neutral", 2: "Positive"}
model = DistilBertForSequenceClassification.from_pretrained(MODEL_DIR)
tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)  # loads the fast tokenizer from tokenizer.json
model.eval()
text_input = widgets.Textarea(placeholder="Paste a review...", layout={"width": "100%", "height": "80px"})
button = widgets.Button(description="Classify", button_style="primary")
output = widgets.Output()
def on_click(_):
    # Everything displayed inside this block is rendered in the output widget
    with output:
        output.clear_output()
        inputs = tokenizer(text_input.value, return_tensors="pt", truncation=True, max_length=256)
        with torch.no_grad():
            probs = torch.softmax(model(**inputs).logits, dim=-1).squeeze()
        pred = torch.argmax(probs).item()
        display(JSON({
            "label": LABELS[pred],
            "confidence": round(probs[pred].item(), 4),
            "probabilities": {LABELS[i]: round(probs[i].item(), 4) for i in range(3)}
        }))
button.on_click(on_click)
display(text_input, button, output)
5. Streamlit dashboard
"""Save as streamlit_app.py — run: streamlit run streamlit_app.py"""
import streamlit as st
import torch
from transformers import AutoTokenizer, DistilBertForSequenceClassification
MODEL_DIR = "data/models/distilbert"
LABELS = {0: "Negative", 1: "Neutral", 2: "Positive"}
# Streamlit's :color[text] markdown accepts named colors, not hex codes
COLORS = {"Negative": "red", "Neutral": "orange", "Positive": "blue"}
@st.cache_resource
def load_model():
    # Cached so the model is loaded once per server process, not on every rerun
    model = DistilBertForSequenceClassification.from_pretrained(MODEL_DIR)
    tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)  # loads the fast tokenizer from tokenizer.json
    model.eval()
    return model, tokenizer
model, tokenizer = load_model()
st.title("Review Sentiment Classifier")
review = st.text_area("Paste a product review:", height=100)
if st.button("Classify"):
    inputs = tokenizer(review, return_tensors="pt", truncation=True, max_length=256)
    with torch.no_grad():
        probs = torch.softmax(model(**inputs).logits, dim=-1).squeeze()
    pred = torch.argmax(probs).item()
    st.markdown(f"## Sentiment: :{COLORS[LABELS[pred]]}[{LABELS[pred]}]")
    st.metric("Confidence", f"{probs[pred]:.1%}")
    col1, col2, col3 = st.columns(3)
    with col1:
        st.metric("Negative", f"{probs[0]:.1%}")
    with col2:
        st.metric("Neutral", f"{probs[1]:.1%}")
    with col3:
        st.metric("Positive", f"{probs[2]:.1%}")
    for i, label in enumerate(["Negative", "Neutral", "Positive"]):
        st.progress(float(probs[i]), text=f"{label}: {probs[i]:.1%}")
6. Batch CSV processor (process thousands of reviews at once)
"""batch_classify.py — reads a CSV with a 'review_text' column, writes results."""
import pandas as pd
import torch
from transformers import AutoTokenizer, DistilBertForSequenceClassification
from torch.utils.data import DataLoader, Dataset
MODEL_DIR = "data/models/distilbert"
LABELS = {0: "Negative", 1: "Neutral", 2: "Positive"}
BATCH_SIZE = 32
model = DistilBertForSequenceClassification.from_pretrained(MODEL_DIR)
tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)  # loads the fast tokenizer from tokenizer.json
model.eval()
class ReviewDataset(Dataset):
    def __init__(self, texts):
        self.texts = texts
    def __len__(self):
        return len(self.texts)
    def __getitem__(self, idx):
        # Pad every review to 256 tokens so the default collate can stack them
        return tokenizer(self.texts[idx], truncation=True, max_length=256, padding="max_length", return_tensors="pt")
df = pd.read_csv("input_reviews.csv")
dataset = ReviewDataset(df["review_text"].tolist())
loader = DataLoader(dataset, batch_size=BATCH_SIZE)
all_preds = []
with torch.no_grad():
    for batch in loader:
        # Each item has shape (1, 256); drop the extra dimension after batching
        batch = {k: v.squeeze(1) for k, v in batch.items()}
        logits = model(**batch).logits
        preds = torch.argmax(logits, dim=-1)
        all_preds.extend(preds.tolist())
df["sentiment"] = [LABELS[p] for p in all_preds]
df.to_csv("classified_reviews.csv", index=False)
print(f"Done: {len(df)} reviews classified.")
Files in this folder
| File | Description |
|---|---|
| model.safetensors | DistilBERT fine-tuned weights (SafeTensors, 255 MB, 66M params) |
| config.json | Model architecture config: DistilBertForSequenceClassification, label mappings, dropout values |
| tokenizer.json | BERT WordPiece tokenizer vocabulary (30,522 tokens, uncased) |
| tokenizer_config.json | Tokenizer configuration: max_length, special tokens, truncation side |
| metrics_distilbert.json | Full training history: per-step loss, per-epoch eval metrics, hyperparameters |
| predictions_distilbert.csv | Per-review predictions on 32,092 test samples: text, true_label, predicted_label, confidence |
| confusion_matrix.png | Heatmap visualization of the confusion matrix |
| README.md | This file |
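The per-review predictions file lends itself to quick error analysis with the standard library. The sketch below uses the column names listed for predictions_distilbert.csv (text, true_label, predicted_label, confidence); the sample rows, the 0.70 cutoff, and the assumption that labels are stored as strings are illustrative only.

```python
import csv
import io

# Illustrative rows only; label strings and values are assumptions.
SAMPLE_CSV = """text,true_label,predicted_label,confidence
"Great product, highly recommend!",Positive,Positive,0.97
"It's okay. Nothing special.",Neutral,Negative,0.55
"Fell apart in a week.",Negative,Negative,0.93
"Decent, but overpriced.",Neutral,Neutral,0.61
"""


def summarize(rows, low_confidence=0.70):
    """Report accuracy and how many errors sit below the confidence cutoff."""
    rows = list(rows)
    correct = sum(r["true_label"] == r["predicted_label"] for r in rows)
    low = [r for r in rows if float(r["confidence"]) < low_confidence]
    low_errors = sum(r["true_label"] != r["predicted_label"] for r in low)
    return {
        "n": len(rows),
        "accuracy": correct / len(rows),
        "low_confidence": len(low),
        "low_confidence_errors": low_errors,
    }


summary = summarize(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(summary)
```

To run it on the real file, pass `csv.DictReader(open("predictions_distilbert.csv", newline="", encoding="utf-8"))` to `summarize` instead of the in-memory sample.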
