Collections
Discover the best community collections!
Collections including paper arxiv:2403.10517
-
TinyLLaVA: A Framework of Small-scale Large Multimodal Models
Paper • 2402.14289 • Published • 21 -
ImageBind: One Embedding Space To Bind Them All
Paper • 2305.05665 • Published • 6 -
DocLLM: A layout-aware generative language model for multimodal document understanding
Paper • 2401.00908 • Published • 189 -
Multimodal Contrastive Learning with LIMoE: the Language-Image Mixture of Experts
Paper • 2206.02770 • Published • 4
-
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
Paper • 2402.04252 • Published • 29 -
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
Paper • 2402.03749 • Published • 14 -
ScreenAI: A Vision-Language Model for UI and Infographics Understanding
Paper • 2402.04615 • Published • 44 -
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
Paper • 2402.05008 • Published • 23
-
Large Language Models are Superpositions of All Characters: Attaining Arbitrary Role-play via Self-Alignment
Paper • 2401.12474 • Published • 36 -
More Agents Is All You Need
Paper • 2402.05120 • Published • 57 -
VideoAgent: Long-form Video Understanding with Large Language Model as Agent
Paper • 2403.10517 • Published • 37 -
Octopus v4: Graph of language models
Paper • 2404.19296 • Published • 118
-
FaceChain-SuDe: Building Derived Class to Inherit Category Attributes for One-shot Subject-Driven Generation
Paper • 2403.06775 • Published • 5 -
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Paper • 2010.11929 • Published • 15 -
Data Incubation -- Synthesizing Missing Data for Handwriting Recognition
Paper • 2110.07040 • Published • 2 -
A Mixture of Expert Approach for Low-Cost Customization of Deep Neural Networks
Paper • 1811.00056 • Published • 2
-
FinTral: A Family of GPT-4 Level Multimodal Financial Large Language Models
Paper • 2402.10986 • Published • 81 -
Aria Everyday Activities Dataset
Paper • 2402.13349 • Published • 31 -
Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference
Paper • 2403.04132 • Published • 40 -
SaulLM-7B: A pioneering Large Language Model for Law
Paper • 2403.03883 • Published • 88
-
World Model on Million-Length Video And Language With RingAttention
Paper • 2402.08268 • Published • 40 -
Improving Text Embeddings with Large Language Models
Paper • 2401.00368 • Published • 82 -
Chain-of-Thought Reasoning Without Prompting
Paper • 2402.10200 • Published • 109 -
FiT: Flexible Vision Transformer for Diffusion Model
Paper • 2402.12376 • Published • 48
-
FaceChain-SuDe: Building Derived Class to Inherit Category Attributes for One-shot Subject-Driven Generation
Paper • 2403.06775 • Published • 5 -
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Paper • 2010.11929 • Published • 15 -
Data Incubation -- Synthesizing Missing Data for Handwriting Recognition
Paper • 2110.07040 • Published • 2 -
A Mixture of Expert Approach for Low-Cost Customization of Deep Neural Networks
Paper • 1811.00056 • Published • 2
-
TinyLLaVA: A Framework of Small-scale Large Multimodal Models
Paper • 2402.14289 • Published • 21 -
ImageBind: One Embedding Space To Bind Them All
Paper • 2305.05665 • Published • 6 -
DocLLM: A layout-aware generative language model for multimodal document understanding
Paper • 2401.00908 • Published • 189 -
Multimodal Contrastive Learning with LIMoE: the Language-Image Mixture of Experts
Paper • 2206.02770 • Published • 4
-
FinTral: A Family of GPT-4 Level Multimodal Financial Large Language Models
Paper • 2402.10986 • Published • 81 -
Aria Everyday Activities Dataset
Paper • 2402.13349 • Published • 31 -
Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference
Paper • 2403.04132 • Published • 40 -
SaulLM-7B: A pioneering Large Language Model for Law
Paper • 2403.03883 • Published • 88
-
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
Paper • 2402.04252 • Published • 29 -
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
Paper • 2402.03749 • Published • 14 -
ScreenAI: A Vision-Language Model for UI and Infographics Understanding
Paper • 2402.04615 • Published • 44 -
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
Paper • 2402.05008 • Published • 23
-
World Model on Million-Length Video And Language With RingAttention
Paper • 2402.08268 • Published • 40 -
Improving Text Embeddings with Large Language Models
Paper • 2401.00368 • Published • 82 -
Chain-of-Thought Reasoning Without Prompting
Paper • 2402.10200 • Published • 109 -
FiT: Flexible Vision Transformer for Diffusion Model
Paper • 2402.12376 • Published • 48
-
Large Language Models are Superpositions of All Characters: Attaining Arbitrary Role-play via Self-Alignment
Paper • 2401.12474 • Published • 36 -
More Agents Is All You Need
Paper • 2402.05120 • Published • 57 -
VideoAgent: Long-form Video Understanding with Large Language Model as Agent
Paper • 2403.10517 • Published • 37 -
Octopus v4: Graph of language models
Paper • 2404.19296 • Published • 118