| Topic | Replies | Views | Activity |
|---|---|---|---|
| Hugging Face Reads - 01/2021 - Sparsity and Pruning | 7 | 1517 | January 24, 2021 |
| Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity | 1 | 71 | January 20, 2021 |
| Significance of the [CLS] token | 4 | 61 | January 18, 2021 |
| The (hidden) meaning behind the embedding of the padding token? | 0 | 23 | January 15, 2021 |
| Multilingual token, phrase and sentence representations for text similarity | 0 | 38 | January 13, 2021 |
| RoBERTa trained on NSP | 0 | 16 | January 12, 2021 |
| Classification problem difficulty when going from 3 classes to 5 classes? | 1 | 37 | January 11, 2021 |
| Text to Text Transformer - T5 | 2 | 92 | January 4, 2021 |
| Shortformer: Better Language Modeling using Shorter Inputs | 0 | 47 | December 31, 2020 |
| Don't Stop Pretraining BART | 1 | 49 | December 29, 2020 |
| Pre-training with Lamb optimizer | 7 | 233 | December 28, 2020 |
| About the encoder and generator used in the RAG model | 2 | 55 | December 25, 2020 |
| MRPC Reproducibility with transformers-4.1.0 | 0 | 28 | December 19, 2020 |
| Seq2Seq Distillation: Methodology Questions | 4 | 331 | December 17, 2020 |
| Using transformers (BERT, RoBERTa) without embedding layer | 8 | 106 | December 16, 2020 |
| What are some recommended pretrained models for extracting semantic feature on single sentence? | 4 | 76 | December 14, 2020 |
| BORT: Optimal Subarchitecture Extraction for BERT | 1 | 86 | December 5, 2020 |
| Training generative models based on "rewards" | 0 | 28 | December 4, 2020 |
| EMNLP Picks from the Hugging Face Science Team | 1 | 2279 | December 2, 2020 |
| Language model to search an answer in a huge collection of (unrelated) paragraphs | 1 | 80 | November 27, 2020 |
| Meta Persona an abstract adaptive neural construct | 0 | 38 | November 25, 2020 |
| Adding learnable coefficients for multi-objective losses? | 2 | 58 | November 25, 2020 |
| Inference on constrained devices | 0 | 32 | November 21, 2020 |
| Is there an easy way to apply layer-wise decaying learning rate in huggingface trainer for RobertaMaskedForLM? | 2 | 72 | November 16, 2020 |
| Pre-Train BERT (from scratch) | 39 | 1283 | November 16, 2020 |
| What are some popular datasets for domain adaptation in NLP | 1 | 74 | November 12, 2020 |
| Carrying Gradients Through Generate | 4 | 189 | November 2, 2020 |
| Adding features to a pretrained language model | 3 | 417 | October 28, 2020 |
| Bart-base rouge scores | 11 | 285 | October 27, 2020 |
| Guide: The best way to calculate the perplexity of fixed-length models | 3 | 380 | October 21, 2020 |