Collections
Collections including paper arxiv:1910.10683
- Training Compute-Optimal Large Language Models
  Paper • 2203.15556 • Published • 11
- Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
  Paper • 1909.08053 • Published • 3
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
  Paper • 1910.10683 • Published • 15
- Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling
  Paper • 2304.01373 • Published • 9

- Attention Is All You Need
  Paper • 1706.03762 • Published • 104
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 24
- RoBERTa: A Robustly Optimized BERT Pretraining Approach
  Paper • 1907.11692 • Published • 9
- DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
  Paper • 1910.01108 • Published • 21

- google-t5/t5-base
  Translation • 0.2B • Updated • 2.25M • 757
- google-t5/t5-small
  Translation • 60.5M • Updated • 2.32M • 511
- google-t5/t5-large
  Translation • 0.7B • Updated • 208k • 231
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
  Paper • 1910.10683 • Published • 15

- bigcode/the-stack
  Viewer • Updated • 546M • 24.2k • 890
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
  Paper • 1910.10683 • Published • 15
- allenai/c4
  Viewer • Updated • 10.4B • 685k • 485
- allenai/ai2_arc
  Viewer • Updated • 7.79k • 241k • 233

- Llemma: An Open Language Model For Mathematics
  Paper • 2310.10631 • Published • 56
- Mistral 7B
  Paper • 2310.06825 • Published • 55
- Qwen Technical Report
  Paper • 2309.16609 • Published • 37
- BTLM-3B-8K: 7B Parameter Performance in a 3B Parameter Model
  Paper • 2309.11568 • Published • 11

- SMOTE: Synthetic Minority Over-sampling Technique
  Paper • 1106.1813 • Published • 1
- Scikit-learn: Machine Learning in Python
  Paper • 1201.0490 • Published • 1
- Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation
  Paper • 1406.1078 • Published • 1
- Distributed Representations of Sentences and Documents
  Paper • 1405.4053 • Published