Collections

Collections including paper arxiv:1910.10683

Papers - Transfer Learning

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Paper • 1910.10683 • Published Oct 23, 2019 • 15

Training Compute-Optimal Large Language Models

Paper • 2203.15556 • Published Mar 29, 2022 • 11
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism

Paper • 1909.08053 • Published Sep 17, 2019 • 3
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Paper • 1910.10683 • Published Oct 23, 2019 • 15
Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling

Paper • 2304.01373 • Published Apr 3, 2023 • 9

Attention Is All You Need

Paper • 1706.03762 • Published Jun 12, 2017 • 104
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Paper • 1810.04805 • Published Oct 11, 2018 • 24
RoBERTa: A Robustly Optimized BERT Pretraining Approach

Paper • 1907.11692 • Published Jul 26, 2019 • 9
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter

Paper • 1910.01108 • Published Oct 2, 2019 • 21

The original T5 transformer release was done in two steps: the original T5 checkpoints and the improved T5v1

google-t5/t5-base

Translation • 0.2B • Updated Feb 14, 2024 • 2.25M • 757
google-t5/t5-small

Translation • 60.5M • Updated Jun 30, 2023 • 2.32M • 511
google-t5/t5-large

Translation • 0.7B • Updated Apr 6, 2023 • 208k • 231
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Paper • 1910.10683 • Published Oct 23, 2019 • 15

Datasets - Text

bigcode/the-stack

Viewer • Updated Apr 13, 2023 • 546M • 24.2k • 890
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Paper • 1910.10683 • Published Oct 23, 2019 • 15
allenai/c4

Viewer • Updated Jan 9, 2024 • 10.4B • 685k • 485
allenai/ai2_arc

Viewer • Updated Dec 21, 2023 • 7.79k • 241k • 233

Papers - NLP Research

Efficient Estimation of Word Representations in Vector Space

Paper • 1301.3781 • Published Jan 16, 2013 • 8
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Paper • 1910.10683 • Published Oct 23, 2019 • 15

Llemma: An Open Language Model For Mathematics

Paper • 2310.10631 • Published Oct 16, 2023 • 56
Mistral 7B

Paper • 2310.06825 • Published Oct 10, 2023 • 55
Qwen Technical Report

Paper • 2309.16609 • Published Sep 28, 2023 • 37
BTLM-3B-8K: 7B Parameter Performance in a 3B Parameter Model

Paper • 2309.11568 • Published Sep 20, 2023 • 11

machine learning and neural network papers 📜

SMOTE: Synthetic Minority Over-sampling Technique

Paper • 1106.1813 • Published Jun 9, 2011 • 1
Scikit-learn: Machine Learning in Python

Paper • 1201.0490 • Published Jan 2, 2012 • 1
Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation

Paper • 1406.1078 • Published Jun 3, 2014 • 1
Distributed Representations of Sentences and Documents

Paper • 1405.4053 • Published May 16, 2014
