Topic | Replies | Views | Activity
ELECTRA training reimplementation and discussion | 14 | 6663 | September 17, 2023
Hugging Face Reads - 01/2021 - Sparsity and Pruning | 14 | 7479 | June 3, 2025
ASR spell correction | 29 | 8694 | April 24, 2024
Guide: The best way to calculate the perplexity of fixed-length models | 9 | 9407 | December 16, 2021
Significance of the [CLS] token | 16 | 28058 | September 5, 2024
Pre-Train BERT (from scratch) | 43 | 18945 | June 27, 2022
Forward-Forward algorithm by Geoffrey Hinton | 10 | 4900 | June 17, 2023
GPT2 for QA Pair Generation | 9 | 8600 | March 23, 2022
Copying mechanism for transformer | 9 | 6454 | February 23, 2024
EMNLP Picks from the Hugging Face Science Team | 1 | 4063 | December 2, 2020
Bart-base rouge scores | 11 | 1727 | October 27, 2020
FDA Label Document Embedding | 9 | 1470 | February 19, 2021
Science Tuesday: MARGE | 7 | 3741 | February 8, 2021
Fail to claim paper authorship | 10 | 467 | May 8, 2025
Does quantization compress the model weights? | 16 | 358 | September 26, 2024
The Lost Painting of a Century — AI Cross-Verification Reveals a Hidden Match with Van Gogh | 9 | 106 | May 10, 2025
ICLR 2020 highlights - Yacine | 1 | 1746 | July 11, 2020
Collaborative Training Experiment of an Albert Model for Bengali | 1 | 1306 | May 6, 2021
Multi-GPU Machine Setup Guide and QnA | 6 | 6877 | May 1, 2021
Why are huge batch sizes used for pretraining and small ones for finetuning? | 3 | 10187 | January 10, 2023
Online/streaming speech recognition | 2 | 3034 | October 26, 2022
ACL 2020 highlights - Yacine | 0 | 1403 | July 10, 2020
ACL 2020 - Some personal highlights - Victor | 4 | 1365 | July 14, 2020
From Crypto Mining to LLM Fine-tuning: Unlocking Large Language Model Fine-tuning through Collaborative Compute Pools | 3 | 1967 | January 25, 2025
Paper Notes: Deepspeed Mixture of Experts | 2 | 2200 | January 20, 2022
The (hidden) meaning behind the embedding of the padding token? | 2 | 6254 | July 14, 2021
Using Google's Gemini for scientific literature | 0 | 1405 | December 14, 2023
Rust applications | 6 | 4930 | November 21, 2023
ACL 2020 highlights – Canwen | 1 | 914 | July 10, 2020
ACL 2020 highlights – Joe | 3 | 1593 | July 30, 2020
Grouphug: multi-task, multi-dataset training with 🤗 transformers/datasets | 0 | 2504 | June 15, 2022
Adding features to a pretrained language model | 3 | 3875 | October 28, 2020
Paper Discussion: Weight Poisoning Attacks on Pre-trained Models | 0 | 1029 | July 8, 2020
The ChatDEAF Project Has Officially Launched! | 4 | 106 | April 24, 2025
Question about loss calculation on LLM finetuning | 0 | 7043 | July 14, 2023
What does it mean to prime a GPT model? | 5 | 4167 | July 27, 2020
Adding domain knowledge in LLMs via fine tuning | 2 | 5535 | July 23, 2023
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity | 1 | 1603 | January 20, 2021
Not all BLEU scores were created equal | 0 | 313 | September 15, 2020
Fine-tuned MLM based RoBERTa not improving performance | 2 | 946 | April 20, 2023
Citing/Crediting Language Models | 6 | 15716 | January 30, 2025
Extracting information from bills, tax statements, etc: What ML model to use? | 3 | 3169 | August 28, 2024
Confidence Scores / Self-Training for Wav2Vec2 / CTC models | 1 | 3694 | April 21, 2022
Discovery of Unsafe Models on Hugging Face Platform | 0 | 1506 | August 17, 2023
ChatDEAF Project – First Open ISL/TİD Dataset for Sign Language Accessibility | 1 | 102 | April 20, 2025
What can transformers learn without position encoding? | 1 | 3115 | June 10, 2021
Understanding FLOPs-per-token estimates from OpenAI's scaling laws | 6 | 16029 | September 20, 2023
How to use T5 for sentence embedding? | 6 | 15979 | May 27, 2023
Looking for a Translation Model for English to 100+ Languages, Comparable to DeepL or Google, for Local Deployment | 4 | 18794 | September 6, 2024
Making a model "think" before doing a tool call (ReAct paper) | 2 | 241 | April 4, 2025