# RobustMedCLIP: On the Robustness of Medical Vision-Language Models: Are they Truly Generalizable?
> **Accepted at Medical Image Understanding and Analysis (MIUA) 2025**
[![License: MIT](https://img.shields.io/badge/license-MIT-green)](LICENSE)
[![Paper](https://img.shields.io/badge/Paper-PDF-blue)](https://arxiv.org/abs/2505.15425)
[![Dataset](https://img.shields.io/badge/Dataset-MediMeta--C-orange)](https://huggingface.co/datasets/razaimam45/MediMeta-C)
[![Model](https://img.shields.io/badge/Model-RobustMedCLIP-yellow)](https://huggingface.co/razaimam45/RobustMedCLIP)
[![Project](https://img.shields.io/badge/Project-RobustMedCLIP-red)](https://github.com/BioMedIA-MBZUAI/RobustMedCLIP)
---
## 🚀 Highlights
- 🧠 **MVLM Benchmarking**: Evaluate 5 major and recent MVLMs across **5 modalities**, **7 corruption types**, and **5 severity levels**
- 📉 **Corruption Evaluation**: Analyze degradation under Gaussian noise, motion blur, pixelation, etc.
- 🔬 **MediMeta-C**: A new benchmark simulating real-world OOD shifts in high-res medical images
- 🧪 **Few-shot Robustness**: **RobustMedCLIP** uses just 1–10% of clean data for adaptation
- 🧠 **LoRA Efficient Tuning**: Low-rank fine-tuning in transformer attention layers
<p align="center">
<img src="assets/pipeline.png" width="750" alt="Pipeline Overview">
</p>
<p align="center">
Overview of the RobustMedCLIP pipeline: A) Few-shot Sampling of Clean Samples from MediMeta and MedMNIST across 5 modalities; B) Fine-tuning LoRA adapters using Few-shot samples; C) Distribution Shifts of MediMeta-C compared to Clean samples; D) Evaluation Results across Top-1 Accuracy and Corruption Error for 4 baselines and RobustMedCLIP.
</p>
---
## 📦 Installation
```bash
git clone https://github.com/BioMedIA-MBZUAI/RobustMedCLIP.git
cd RobustMedCLIP
conda create -n robustmedclip python=3.12.7
conda activate robustmedclip
pip install -r requirements.txt
pip install huggingface_hub
```
You will also need a personal Hugging Face access token (referenced as `<YOUR-HUGGINGFACE-TOKEN>` in the commands below) to download the datasets and model weights directly.\
To create one, open your Hugging Face `Settings`, go to the `Access Tokens` tab, and click the `New token` button to create a User Access Token.
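If you prefer to authenticate once instead of passing `--token` to every command, here is a minimal sketch using the `huggingface_hub` Python API:
```python
from huggingface_hub import login

# Store the token for this environment so later downloads are authenticated.
# Replace the placeholder with your actual access token.
login(token="<YOUR-HUGGINGFACE-TOKEN>")
```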
---
## 🧠 Models
All baseline and RobustMedCLIP model checkpoints are available for direct download via Hugging Face at [RobustMedCLIP](https://huggingface.co/razaimam45/RobustMedCLIP/tree/main):
```bash
huggingface-cli download razaimam45/RobustMedCLIP \
--local-dir ./outputs \
--repo-type model \
--token <YOUR-HUGGINGFACE-TOKEN>
```
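Equivalently, the checkpoints can be fetched from Python via `huggingface_hub.snapshot_download`; a minimal sketch mirroring the CLI call above:
```python
from huggingface_hub import snapshot_download

# Download every file in the model repo into ./outputs,
# matching the CLI command above.
snapshot_download(
    repo_id="razaimam45/RobustMedCLIP",
    repo_type="model",
    local_dir="./outputs",
    token="<YOUR-HUGGINGFACE-TOKEN>",
)
```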
πŸ“ `Outputs` Folder Structure: The `outputs/` folder (should be in root folder) contains all trained model weights and evaluation results:
```bash
outputs/
├── checkpoints/   # Baseline MVLMs (MedCLIP, UniMedCLIP)
├── exp-rank-8/    # RobustMedCLIP (LoRA rank = 8) for ViT and ResNet across few-shot budgets (1/3/7/10)%
├── exp-rank-16/   # RobustMedCLIP (LoRA rank = 16) for ViT and ResNet across few-shot budgets (1/3/7/10)%
└── results/       # Evaluation logs across mCE/accuracy metrics
```
---
## 🧬 Datasets
This project proposes MediMeta-C as a corruption benchmark and evaluates MVLMs on both the MedMNIST-C and MediMeta-C benchmarks.
| Dataset        | Type               | Modalities | Corruptions                 | Resolution |
|----------------|--------------------|------------|-----------------------------|------------|
| **MediMeta-C** | Proposed benchmark | 5          | 7 types × 5 severity levels | High-res   |
| **MedMNIST-C** | Public benchmark   | 5          | 7 types × 5 severity levels | Low-res    |
### 📂 Dataset Structure
The MediMeta-C dataset is hosted on Hugging Face and organized as follows:
```bash
MediMeta-C/
├── pbc/                               # Blood Cell modality
│   ├── test/                          # Test set
│   │   ├── clean.npz                  # Clean samples
│   │   ├── brightness_severity_1.npz
│   │   ├── brightness_severity_2.npz
│   │   ├── ...                        # Other severity levels
│   │   └── brightness_severity_5.npz
│   └── val/                           # Validation set
│       ├── clean.npz
│       ├── contrast_severity_1.npz
│       ├── contrast_severity_2.npz
│       ├── ...                        # Other severity levels
│       └── contrast_severity_5.npz
├── fundus/                            # Fundus modality
│   ├── test/
│   ├── val/
│   └── ...                            # Similar structure as above
├── ...                                # Other modalities
└── README.md                          # Dataset description
```
You can download the datasets from [MediMeta-C](https://huggingface.co/datasets/razaimam45/MediMeta-C/tree/main) and [MedMNIST-C](https://github.com/francescodisalvo05/medmnistc-api). The downloaded folder should be placed at `data/MediMeta-C` in the project root.
```bash
huggingface-cli download razaimam45/MediMeta-C \
    --local-dir ./data/MediMeta-C \
    --repo-type dataset \
    --token <YOUR-HUGGINGFACE-TOKEN>
```
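Each `.npz` archive bundles the images and labels for one split and corruption/severity setting. A quick way to inspect one after download (the array key names below are an assumption; check `npz.files` for the actual ones):
```python
import numpy as np

# Load one corruption split; the path follows the layout shown above.
npz = np.load("data/MediMeta-C/pbc/test/brightness_severity_3.npz")
print(npz.files)  # prints the stored array names

# Hypothetical key names for illustration; substitute what npz.files reports.
images, labels = npz["images"], npz["labels"]
print(images.shape, labels.shape)
```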
---
## 🔧 Usage
### 1. Few-Shot Tuning
You can fine-tune RobustMedCLIP with either ViT or ResNet backbones:
```bash
# Fine-tune with ViT backbone (e.g., BioMedCLIP)
bash scripts/run_finetune_vit.sh
# Fine-tune with ResNet backbone (e.g., MedCLIP)
bash scripts/run_finetune_resnet.sh
```
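Both scripts adapt the backbone by training low-rank adapters in the attention layers (ranks 8 and 16, matching the `exp-rank-*` checkpoint folders). The sketch below illustrates the LoRA idea in plain PyTorch; it is not the repository's actual implementation:
```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank update."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)           # freeze pretrained weights
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)    # update starts at zero
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))

# Example: wrap a 768-dim attention projection with a rank-8 adapter.
proj = LoRALinear(nn.Linear(768, 768), rank=8)
out = proj(torch.randn(4, 768))               # only the adapter is trainable
```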
### 2. Evaluation
Evaluate a fine-tuned or pretrained MVLM (including RMedCLIP):
```bash
# Evaluation for RobustMedCLIP (RMC)
bash scripts/run_eval_rmed.sh
# Custom evaluation on other models (rmedclip, biomedclip, unimedclip, medclip, clip)
python evaluate.py --model rmedclip \
--backbone vit \
--gpu 0 --corruptions all --collection medimeta
```
---
## 📊 Results
RobustMedCLIP consistently outperforms prior MVLMs under corruptions across all modalities:
| Model        | Clean Error ↓ | mCE ↓ (avg) |
| ------------ | ------------- | ----------- |
| CLIP         | 100.0         | 100.0       |
| MedCLIP      | 106.4         | 112.5       |
| BioMedCLIP   | 116.3         | 126.8       |
| UniMedCLIP   | 111.8         | 98.9        |
| **RMedCLIP** | **62.8**      | **81.0**    |
Detailed benchmarks are reported in the paper's `Results and Discussions` section.
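The mCE column follows the ImageNet-C protocol: a model's error rate per corruption, summed over the 5 severities, is divided by the corresponding error of a reference model (CLIP here, hence its 100.0 entries), and the resulting ratios are averaged. A sketch of that computation with illustrative numbers:
```python
import numpy as np

def mce(model_err: np.ndarray, ref_err: np.ndarray) -> float:
    """Mean Corruption Error: inputs have shape (n_corruptions, n_severities)."""
    ce = model_err.sum(axis=1) / ref_err.sum(axis=1)  # per-corruption CE
    return 100.0 * ce.mean()

# Toy numbers: 7 corruptions x 5 severities (purely illustrative).
rng = np.random.default_rng(0)
ref = rng.uniform(0.3, 0.6, size=(7, 5))   # reference (CLIP) error rates
model = 0.8 * ref                          # a model uniformly 20% better
print(mce(model, ref))                     # -> 80.0
```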
---
## ✏️ Citation
If you find this repository helpful, please cite our paper:
```bibtex
@misc{imam2025robustnessmedicalvisionlanguagemodels,
title={On the Robustness of Medical Vision-Language Models: Are they Truly Generalizable?},
author={Raza Imam and Rufael Marew and Mohammad Yaqub},
year={2025},
eprint={2505.15425},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2505.15425},
}
```
---
## 🤝 Acknowledgements
* Built on top of [BioMedCLIP](https://arxiv.org/abs/2303.00915) and [MedCLIP](https://arxiv.org/abs/2210.10163)
* MediMeta-C corruption designs are inspired by [ImageNet-C](https://arxiv.org/abs/1903.12261) and [MedMNIST-C](https://arxiv.org/abs/2406.17536)
For questions, contact: **[[email protected]](mailto:[email protected])**