XiaoFu666/SPECS

Zero-Shot Image Classification

English

XiaoFu666 committed on Sep 2

Commit 5e9812a (verified) · 1 parent: 35f8550

Update README.md

Files changed (1)

README.md +65 -2

@@ -5,5 +5,68 @@ datasets:
 language:
 - en
 base_model:
-- openai/clip-vit-base-patch32
----
+- BeichenZhang/LongCLIP-B
+---
+You can compute SPECS scores for an image–caption pair using the following code:
+```python
+from PIL import Image
+import torch
+import torch.nn.functional as F
+from model import longclip
+# Device configuration
+device = "cuda" if torch.cuda.is_available() else "cpu"
+print(f"Using device: {device}")
+# Load SPECS model
+model, preprocess = longclip.load("spec.pt", device=device)
+model.eval()
+# Load image
+image_path = "SPECS/images/cat.png"
+image = preprocess(Image.open(image_path)).unsqueeze(0).to(device)
+# Define text descriptions
+texts = [
+    "A British Shorthair cat with plush, bluish-gray fur is lounging on a deep green velvet sofa. "
+    "The cat is partially tucked under a multi-colored woven jumper.",
+    "A British Shorthair cat with plush, bluish-gray fur is lounging on a deep green velvet sofa. "
+    "The cat is partially tucked under a multi-colored woven blanket.",
+    "A British Shorthair cat with plush, bluish-gray fur is lounging on a deep green velvet sofa. "
+    "The cat is partially tucked under a multi-colored woven blanket with fringed edges."
+]
+# Process inputs
+text_tokens = longclip.tokenize(texts).to(device)
+# Get features and calculate SPECS
+with torch.no_grad():
+    image_features = model.encode_image(image)
+    text_features = model.encode_text(text_tokens)
+    # Calculate cosine similarity
+    similarity = F.cosine_similarity(image_features.unsqueeze(1), text_features.unsqueeze(0), dim=-1)
+    # SPECS
+    specs_scores = torch.clamp((similarity + 1.0) / 2.0, min=0.0)
+# Output results
+print("SPECS")
+for i, score in enumerate(specs_scores.squeeze()):
+    print(f" Text {i+1}: {score:.4f}")
+```
+In this example, SPECS assigns progressively higher scores as the captions become more accurate and more detailed:
+- **Text 1**: *"A British Shorthair cat with plush, bluish-gray fur is lounging on a deep green velvet sofa. The cat is partially tucked under a multi-colored woven jumper."*
+  → **Score: 0.4293**
+- **Text 2**: *"A British Shorthair cat with plush, bluish-gray fur is lounging on a deep green velvet sofa. The cat is partially tucked under a multi-colored woven blanket."*
+  → **Score: 0.4457**
+- **Text 3**: *"A British Shorthair cat with plush, bluish-gray fur is lounging on a deep green velvet sofa. The cat is partially tucked under a multi-colored woven blanket with fringed edges."*
+  → **Score: 0.4583**
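
As a reading aid that is not part of the commit, the scoring step in the added snippet can be reproduced without the model weights. The following is a minimal sketch, assuming plain Python lists of floats stand in for the image and text embeddings and that neither vector is all zeros. `specs_score` is a hypothetical helper name used only for illustration; it is not a function from the SPECS or LongCLIP code.

```python
import math


def specs_score(image_vec, text_vec):
    """Map cosine similarity from [-1, 1] to [0, 1], mirroring the README snippet.

    Hypothetical helper for illustration; assumes both vectors are non-zero.
    """
    dot = sum(a * b for a, b in zip(image_vec, text_vec))
    norm_image = math.sqrt(sum(a * a for a in image_vec))
    norm_text = math.sqrt(sum(b * b for b in text_vec))
    cosine = dot / (norm_image * norm_text)
    # Same rescaling and lower clamp as torch.clamp((similarity + 1.0) / 2.0, min=0.0)
    return max((cosine + 1.0) / 2.0, 0.0)


print(specs_score([1.0, 0.0], [1.0, 0.0]))   # identical direction
print(specs_score([1.0, 0.0], [0.0, 1.0]))   # orthogonal
print(specs_score([1.0, 0.0], [-1.0, 0.0]))  # opposite direction
```

Under this rescaling, a score of 0.5 corresponds to a cosine similarity of 0, so scores below 0.5 indicate a negative cosine between the two embeddings.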