- Beyond Language Models: Byte Models are Digital World Simulators
  Paper • 2402.19155 • Published • 53
- Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
  Paper • 2402.19427 • Published • 56
- VisionLLaMA: A Unified LLaMA Interface for Vision Tasks
  Paper • 2403.00522 • Published • 46
- Resonance RoPE: Improving Context Length Generalization of Large Language Models
  Paper • 2403.00071 • Published • 24

Collections
Collections including paper arxiv:2403.00522
- Can Large Language Models Understand Context?
  Paper • 2402.00858 • Published • 23
- OLMo: Accelerating the Science of Language Models
  Paper • 2402.00838 • Published • 85
- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 151
- SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity
  Paper • 2401.17072 • Published • 25

- MM-LLMs: Recent Advances in MultiModal Large Language Models
  Paper • 2401.13601 • Published • 48
- A Touch, Vision, and Language Dataset for Multimodal Alignment
  Paper • 2402.13232 • Published • 16
- Neural Network Diffusion
  Paper • 2402.13144 • Published • 100
- FlashTex: Fast Relightable Mesh Texturing with LightControlNet
  Paper • 2402.13251 • Published • 14

- A Comprehensive Study of GPT-4V's Multimodal Capabilities in Medical Imaging
  Paper • 2310.20381 • Published • 2
- Multimodal ChatGPT for Medical Applications: an Experimental Study of GPT-4V
  Paper • 2310.19061 • Published • 8
- EHRXQA: A Multi-Modal Question Answering Dataset for Electronic Health Records with Chest X-ray Images
  Paper • 2310.18652 • Published • 1
- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
  Paper • 2402.17764 • Published • 627

- EmerNeRF: Emergent Spatial-Temporal Scene Decomposition via Self-Supervision
  Paper • 2311.02077 • Published • 15
- System 2 Attention (is something you might need too)
  Paper • 2311.11829 • Published • 43
- Large Language Models for Mathematicians
  Paper • 2312.04556 • Published • 12
- VisionLLaMA: A Unified LLaMA Interface for Vision Tasks
  Paper • 2403.00522 • Published • 46

- MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs
  Paper • 2402.15627 • Published • 36
- Beyond Language Models: Byte Models are Digital World Simulators
  Paper • 2402.19155 • Published • 53
- VisionLLaMA: A Unified LLaMA Interface for Vision Tasks
  Paper • 2403.00522 • Published • 46
- Stealing Part of a Production Language Model
  Paper • 2403.06634 • Published • 91

- Neural Network Diffusion
  Paper • 2402.13144 • Published • 100
- Genie: Generative Interactive Environments
  Paper • 2402.15391 • Published • 71
- Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models
  Paper • 2402.17177 • Published • 88
- VisionLLaMA: A Unified LLaMA Interface for Vision Tasks
  Paper • 2403.00522 • Published • 46

- LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding
  Paper • 2306.17107 • Published • 11
- On the Hidden Mystery of OCR in Large Multimodal Models
  Paper • 2305.07895 • Published • 1
- Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities
  Paper • 2308.12966 • Published • 11
- MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
  Paper • 2401.15947 • Published • 53

- Exponentially Faster Language Modelling
  Paper • 2311.10770 • Published • 119
- stabilityai/stable-video-diffusion-img2vid-xt
  Image-to-Video • Updated • 165k • 3.21k
- LucidDreamer: Domain-free Generation of 3D Gaussian Splatting Scenes
  Paper • 2311.13384 • Published • 53
- HierSpeech++: Bridging the Gap between Semantic and Acoustic Representation of Speech by Hierarchical Variational Inference for Zero-shot Speech Synthesis
  Paper • 2311.12454 • Published • 30

- Attention Is All You Need
  Paper • 1706.03762 • Published • 108
- FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
  Paper • 2307.08691 • Published • 9
- Mixtral of Experts
  Paper • 2401.04088 • Published • 160
- Mistral 7B
  Paper • 2310.06825 • Published • 56