Model Name: EEG-VJEPA
Description:
EEG-VJEPA is a self-supervised learning model for EEG signal analysis. It treats EEG signals as spatiotemporal sequences, leveraging both spatial and temporal structure, and uses a Vision Transformer (ViT) backbone to capture long-range dependencies, enabling robust representation learning from EEG data. (Peer-reviewed paper forthcoming; arXiv preprint: https://arxiv.org/abs/2507.03633)
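To illustrate the idea of treating EEG as a spatiotemporal sequence, the sketch below applies a generic 3D (tubelet) patch embedding, as used in video ViTs, to an EEG recording laid out as a stack of 2D "frames". The frame layout, patch size, and embedding dimension are illustrative assumptions, not the exact pipeline from the repository; only the 32-frame / tubelet-size-4 values come from the ViT-M-4x30x4 configuration listed below.

```python
import torch
import torch.nn as nn

# Illustrative assumption: an EEG recording is arranged as a "video" tensor of
# shape (batch, 1, T_frames, H, W), where each frame is a 2D layout of
# electrodes x time samples. The exact layout used by EEG-VJEPA is defined in
# the repository; this sketch only shows the tubelet-embedding idea.
class TubeletEmbedding(nn.Module):
    def __init__(self, embed_dim=512, tubelet_size=4, patch_size=(4, 30)):
        super().__init__()
        # A 3D convolution turns non-overlapping (tubelet x patch) blocks into
        # token embeddings, which the ViT encoder then attends over.
        self.proj = nn.Conv3d(
            in_channels=1,
            out_channels=embed_dim,
            kernel_size=(tubelet_size, *patch_size),
            stride=(tubelet_size, *patch_size),
        )

    def forward(self, x):                    # x: (B, 1, T, H, W)
        x = self.proj(x)                     # (B, D, T', H', W')
        return x.flatten(2).transpose(1, 2)  # (B, num_tokens, D)

# Hypothetical clip: 32 frames with tubelet size 4 (ViT-M-4x30x4); the spatial
# size (16 x 30) is an assumed example, not the model's actual frame shape.
eeg_clip = torch.randn(2, 1, 32, 16, 30)
tokens = TubeletEmbedding()(eeg_clip)
print(tokens.shape)                          # (2, num_tokens, 512)
```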
Performance

| Model | Accuracy (Frozen Eval) | F1 Score (Frozen Eval) | AUROC (Frozen Eval) |
|---------------|--------|--------|--------|
| ViT-M-4x30x4 | 83.0% | 82.4% | 87.7% |
| ViT-B-4x30x2 | 81.2% | 81.0% | 87.9% |
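Since the numbers above are frozen-evaluation results, the sketch below shows one common way such an evaluation is set up: the pretrained encoder is frozen and only a lightweight classification head is trained on its features. The encoder object, feature dimension, and pooling step are placeholders; the actual evaluation protocol is described in the paper and repository.

```python
import torch
import torch.nn as nn

# Hypothetical placeholders: `encoder` is the pretrained EEG-VJEPA encoder and
# `feature_dim` its output dimension; both depend on the chosen model variant.
def frozen_eval_head(encoder: nn.Module, feature_dim: int, num_classes: int = 2):
    # Freeze the pretrained encoder so only the probe is trained.
    for p in encoder.parameters():
        p.requires_grad = False
    encoder.eval()

    # Lightweight classification head (e.g. normal vs. abnormal EEG).
    head = nn.Linear(feature_dim, num_classes)
    optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)
    criterion = nn.CrossEntropyLoss()

    def train_step(x, y):
        with torch.no_grad():               # features come from the frozen encoder
            feats = encoder(x).mean(dim=1)  # assumed: average-pool over tokens
        loss = criterion(head(feats), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

    return head, train_step
```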
Usage
This model can be used as a feature extractor or fine-tuned for various EEG analysis tasks, including abnormal EEG classification. It is particularly effective in scenarios where spatial and temporal dependencies are critical. The code repository is available at https://github.com/amir-hojjati/eeg-vjepa. The base model's encoder can be loaded with the pre-trained weights provided here and with the hyperparameters specified in the paper for each model; a loading sketch is shown below.
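The following is a minimal loading sketch. The checkpoint filename, the placeholder encoder class, and the state-dict key are assumptions for illustration only; use the actual encoder constructor and weight file from the repository linked above.

```python
import torch
import torch.nn as nn

# Placeholder standing in for the repository's ViT encoder class; replace it
# with the actual constructor from https://github.com/amir-hojjati/eeg-vjepa.
class EncoderPlaceholder(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Identity()

    def forward(self, x):
        return self.backbone(x)

encoder = EncoderPlaceholder()

# Assumed checkpoint filename; use the weight file distributed with this model card.
ckpt_path = "eeg_vjepa_vitm_4x30x4.pth"
checkpoint = torch.load(ckpt_path, map_location="cpu")

# Checkpoints commonly nest the encoder weights under a key such as "encoder";
# inspect checkpoint.keys() if loading the dict directly fails.
state_dict = checkpoint.get("encoder", checkpoint)
missing, unexpected = encoder.load_state_dict(state_dict, strict=False)
print("missing keys:", len(missing), "unexpected keys:", len(unexpected))

encoder.eval()  # frozen feature extraction; unfreeze for full fine-tuning
```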
Training Configuration
Optimizer: AdamW (a sketch of this optimization setup appears after this list)
Learning Rate: 2e-4 (initial), 6.25e-4 (reference), 1e-6 (final)
Weight Decay: cosine schedule (CosineWDSchedule) from 0.04 to 0.4
Gradient Clipping: max norm 10.0
Pretrained on: 5 NVIDIA V100 32GB GPUs for up to 400 epochs
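The sketch below assembles these settings into a generic PyTorch training step: AdamW, a learning rate that ramps from the initial to the reference value and decays to the final value, weight decay annealed along a cosine from 0.04 to 0.4, and gradient clipping at max norm 10.0. The schedule shapes (linear warm-up plus cosine decay) and step counts are assumptions; the exact schedulers are defined in the repository.

```python
import math
import torch

# Values from the configuration above; step counts are hypothetical.
start_lr, ref_lr, final_lr = 2e-4, 6.25e-4, 1e-6
wd_start, wd_end = 0.04, 0.4
warmup_steps, total_steps = 1_000, 100_000

model = torch.nn.Linear(512, 512)  # stand-in for the EEG-VJEPA encoder/predictor
optimizer = torch.optim.AdamW(model.parameters(), lr=start_lr, weight_decay=wd_start)

def update_schedules(step: int) -> None:
    if step < warmup_steps:  # assumed linear warm-up from start_lr to ref_lr
        lr = start_lr + (ref_lr - start_lr) * step / warmup_steps
    else:                    # assumed cosine decay from ref_lr to final_lr
        t = (step - warmup_steps) / max(1, total_steps - warmup_steps)
        lr = final_lr + 0.5 * (ref_lr - final_lr) * (1 + math.cos(math.pi * t))
    # Cosine weight-decay ramp from 0.04 up to 0.4 over training.
    wd = wd_end - 0.5 * (wd_end - wd_start) * (1 + math.cos(math.pi * step / total_steps))
    for group in optimizer.param_groups:
        group["lr"] = lr
        group["weight_decay"] = wd

def train_step(step: int, loss: torch.Tensor) -> None:
    update_schedules(step)
    optimizer.zero_grad()
    loss.backward()
    # Gradient clipping at max norm 10.0, as listed above.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=10.0)
    optimizer.step()
```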
Pretraining Details
Pre-training Dataset: NMT + TUH
Batch Size: 10
| Configuration | ViT-M-4x30x4 | ViT-B-4x30x2 |
|---|---|---|
| Encoder Size | Medium (ViT-M), 21M parameters | Base (ViT-B), 85M parameters |
| Predictor Depth | 6 | 12 |
| Sampling Rate | 3 | 2 |
| # Frames | 32 | 16 |
| # Clips | 1 | 3 |
| Tubelet Size | 4 | 2 |
| Augmentation | Spatial (AR + Scale) | Spatial (AR + Scale) |
| Random Resize Aspect Ratio | (0.75, 1.35) | (0.75, 1.35) |
| Random Resize Scale | (0.3, 1.0) | (0.3, 1.0) |
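For convenience, the same per-variant settings can be captured as plain Python dictionaries, e.g. to drive a model-building function. The key names are illustrative, not the repository's actual configuration schema; only the values are taken from the table above.

```python
# Per-variant pretraining settings transcribed from the table above.
# Key names are illustrative, not the repository's actual config schema.
EEG_VJEPA_CONFIGS = {
    "ViT-M-4x30x4": {
        "encoder": "vit_medium",   # ~21M parameters
        "predictor_depth": 6,
        "sampling_rate": 3,
        "num_frames": 32,
        "num_clips": 1,
        "tubelet_size": 4,
        "random_resize_aspect_ratio": (0.75, 1.35),
        "random_resize_scale": (0.3, 1.0),
    },
    "ViT-B-4x30x2": {
        "encoder": "vit_base",     # ~85M parameters
        "predictor_depth": 12,
        "sampling_rate": 2,
        "num_frames": 16,
        "num_clips": 3,
        "tubelet_size": 2,
        "random_resize_aspect_ratio": (0.75, 1.35),
        "random_resize_scale": (0.3, 1.0),
    },
}

config = EEG_VJEPA_CONFIGS["ViT-M-4x30x4"]
```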
Citation
@misc{hojjati2025videoeegadaptingjoint,
  title={From Video to EEG: Adapting Joint Embedding Predictive Architecture to Uncover Visual Concepts in Brain Signal Analysis},
  author={Amirabbas Hojjati and Lu Li and Ibrahim Hameed and Anis Yazidi and Pedro G. Lind and Rabindra Khadka},
  year={2025},
  eprint={2507.03633},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2507.03633},
}