Papers - Interpretability
Prompt-to-Prompt Image Editing with Cross Attention Control
Paper
• 2208.01626
• Published • 3
BERT Rediscovers the Classical NLP Pipeline
Paper
• 1905.05950
• Published • 3
A Multiscale Visualization of Attention in the Transformer Model
Paper
• 1906.05714
• Published • 2
Analyzing Transformers in Embedding Space
Paper
• 2209.02535
• Published • 3
LVLM-Interpret: An Interpretability Tool for Large Vision-Language Models
Paper
• 2404.03118
• Published • 25
The Geometry of Categorical and Hierarchical Concepts in Large Language Models
Paper
• 2406.01506
• Published • 3
Confidence Regulation Neurons in Language Models
Paper
• 2406.16254
• Published • 10
Decoding Dark Matter: Specialized Sparse Autoencoders for Interpreting Rare Concepts in Foundation Models
Paper
• 2411.00743
• Published • 7
Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models
Paper
• 2411.14257
• Published • 14
Paper
• 2412.09764
• Published • 5