
Collections including paper arxiv:2403.00522

Beyond Language Models: Byte Models are Digital World Simulators

Paper • 2402.19155 • Published Feb 29, 2024 • 53
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models

Paper • 2402.19427 • Published Feb 29, 2024 • 56
VisionLLaMA: A Unified LLaMA Interface for Vision Tasks

Paper • 2403.00522 • Published Mar 1, 2024 • 46
Resonance RoPE: Improving Context Length Generalization of Large Language Models

Paper • 2403.00071 • Published Feb 29, 2024 • 24

AI Paper of the Day

A collection of papers that I think are interesting, one added each day

Can Large Language Models Understand Context?

Paper • 2402.00858 • Published Feb 1, 2024 • 23
OLMo: Accelerating the Science of Language Models

Paper • 2402.00838 • Published Feb 1, 2024 • 85
Self-Rewarding Language Models

Paper • 2401.10020 • Published Jan 18, 2024 • 151
SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity

Paper • 2401.17072 • Published Jan 30, 2024 • 25

MM-LLMs: Recent Advances in MultiModal Large Language Models

Paper • 2401.13601 • Published Jan 24, 2024 • 48
A Touch, Vision, and Language Dataset for Multimodal Alignment

Paper • 2402.13232 • Published Feb 20, 2024 • 16
Neural Network Diffusion

Paper • 2402.13144 • Published Feb 20, 2024 • 100
FlashTex: Fast Relightable Mesh Texturing with LightControlNet

Paper • 2402.13251 • Published Feb 20, 2024 • 14

Multimodal VQA for medicine

A Comprehensive Study of GPT-4V's Multimodal Capabilities in Medical Imaging

Paper • 2310.20381 • Published Oct 31, 2023 • 2
Multimodal ChatGPT for Medical Applications: an Experimental Study of GPT-4V

Paper • 2310.19061 • Published Oct 29, 2023 • 8
EHRXQA: A Multi-Modal Question Answering Dataset for Electronic Health Records with Chest X-ray Images

Paper • 2310.18652 • Published Oct 28, 2023 • 1
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

Paper • 2402.17764 • Published Feb 27, 2024 • 627

Interesting SSL papers

EmerNeRF: Emergent Spatial-Temporal Scene Decomposition via Self-Supervision

Paper • 2311.02077 • Published Nov 3, 2023 • 15
System 2 Attention (is something you might need too)

Paper • 2311.11829 • Published Nov 20, 2023 • 43
Large Language Models for Mathematicians

Paper • 2312.04556 • Published Dec 7, 2023 • 12
VisionLLaMA: A Unified LLaMA Interface for Vision Tasks

Paper • 2403.00522 • Published Mar 1, 2024 • 46

MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs

Paper • 2402.15627 • Published Feb 23, 2024 • 36
Beyond Language Models: Byte Models are Digital World Simulators

Paper • 2402.19155 • Published Feb 29, 2024 • 53
VisionLLaMA: A Unified LLaMA Interface for Vision Tasks

Paper • 2403.00522 • Published Mar 1, 2024 • 46
Stealing Part of a Production Language Model

Paper • 2403.06634 • Published Mar 11, 2024 • 91

Daily paper that worth reading in details later

Neural Network Diffusion

Paper • 2402.13144 • Published Feb 20, 2024 • 100
Genie: Generative Interactive Environments

Paper • 2402.15391 • Published Feb 23, 2024 • 71
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models

Paper • 2402.17177 • Published Feb 27, 2024 • 88
VisionLLaMA: A Unified LLaMA Interface for Vision Tasks

Paper • 2403.00522 • Published Mar 1, 2024 • 46

LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding

Paper • 2306.17107 • Published Jun 29, 2023 • 11
On the Hidden Mystery of OCR in Large Multimodal Models

Paper • 2305.07895 • Published May 13, 2023 • 1
Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities

Paper • 2308.12966 • Published Aug 24, 2023 • 11
MoE-LLaVA: Mixture of Experts for Large Vision-Language Models

Paper • 2401.15947 • Published Jan 29, 2024 • 53

Language Models

Exponentially Faster Language Modelling

Paper • 2311.10770 • Published Nov 15, 2023 • 119
stabilityai/stable-video-diffusion-img2vid-xt

Image-to-Video • Updated Jul 10, 2024 • 165k • 3.21k
LucidDreamer: Domain-free Generation of 3D Gaussian Splatting Scenes

Paper • 2311.13384 • Published Nov 22, 2023 • 53
HierSpeech++: Bridging the Gap between Semantic and Acoustic Representation of Speech by Hierarchical Variational Inference for Zero-shot Speech Synthesis

Paper • 2311.12454 • Published Nov 21, 2023 • 30

Training & Architectures

Attention Is All You Need

Paper • 1706.03762 • Published Jun 12, 2017 • 108
FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning

Paper • 2307.08691 • Published Jul 17, 2023 • 9
Mixtral of Experts

Paper • 2401.04088 • Published Jan 8, 2024 • 160
Mistral 7B

Paper • 2310.06825 • Published Oct 10, 2023 • 56

