Research

Topic	Category	Replies	Views	Activity
ELECTRA training reimplementation and discussion	Research	14	6663	September 17, 2023
Hugging Face Reads - 01/2021 - Sparsity and Pruning	Research	14	7479	June 3, 2025
ASR spell correction	Research	29	8694	April 24, 2024
Guide: The best way to calculate the perplexity of fixed-length models	Research	9	9407	December 16, 2021
Significance of the [CLS] token	Research	16	28058	September 5, 2024
Pre-Train BERT (from scratch)	Research	43	18945	June 27, 2022
Forward-Forward algorithm by Geoffrey Hinton	Research	10	4900	June 17, 2023
GPT2 for QA Pair Generation	Research	9	8600	March 23, 2022
Copying mechanism for transformer	Research	9	6454	February 23, 2024
EMNLP Picks from the Hugging Face Science Team	Research	1	4063	December 2, 2020
Bart-base rouge scores	Research	11	1727	October 27, 2020
FDA Label Document Embedding	Research	9	1470	February 19, 2021
Science Tuesday: MARGE	Awesome paper	7	3741	February 8, 2021
Fail to claim paper authorship	Awesome paper	10	467	May 8, 2025
Does quantization compress the model weights?	Research	16	358	September 26, 2024
The Lost Painting of a Century — AI Cross-Verification Reveals a Hidden Match with Van Gogh	Research	9	106	May 10, 2025
ICLR 2020 highlights - Yacine	Awesome paper	1	1746	July 11, 2020
Collaborative Training Experiment of an Albert Model for Bengali	Research	1	1306	May 6, 2021
Multi-GPU Machine Setup Guide and QnA	Research	6	6877	May 1, 2021
Why are huge batch sizes used for pretraining and small ones for finetuning?	Research	3	10187	January 10, 2023
Online/streaming speech recognition	Research	2	3034	October 26, 2022
ACL 2020 highlights - Yacine	Research	0	1403	July 10, 2020
ACL 2020 - Some personal highlights - Victor	Research	4	1365	July 14, 2020
From Crypto Mining to LLM Fine-tuning: Unlocking Large Language Model Fine-tuning through Collaborative Compute Pools	Research	3	1967	January 25, 2025
Paper Notes: Deepspeed Mixture of Experts	Research	2	2200	January 20, 2022
The (hidden) meaning behind the embedding of the padding token?	Awesome paper	2	6254	July 14, 2021
Using Google's Gemini for scientific literature	Research	0	1405	December 14, 2023
Rust applications	Research	6	4930	November 21, 2023
ACL 2020 highlights – Canwen	Research	1	914	July 10, 2020
ACL 2020 highlights – Joe	Research	3	1593	July 30, 2020
Grouphug: multi-task, multi-dataset training with 🤗 transformers/datasets	Research	0	2504	June 15, 2022
Adding features to a pretrained language model	Research	3	3875	October 28, 2020
Paper Discussion: Weight Poisoning Attacks on Pre-trained Models	Awesome paper	0	1029	July 8, 2020
The ChatDEAF Project Has Officially Launched!	Awesome paper	4	106	April 24, 2025
Question about loss calculation on LLM finetuning	Research	0	7043	July 14, 2023
What does it mean to prime a GPT model?	Research	5	4167	July 27, 2020
Adding domain knowledge in LLMs via fine tuning	Research	2	5535	July 23, 2023
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity	Research	1	1603	January 20, 2021
Not all BLEU scores were created equal	Research	0	313	September 15, 2020
Fine-tuned MLM based RoBERTa not improving performance	Research	2	946	April 20, 2023
Citing/Crediting Language Models	Research	6	15716	January 30, 2025
Extracting information from bills, tax statements, etc: What ML model to use?	Research	3	3169	August 28, 2024
Confidence Scores / Self-Training for Wav2Vec2 / CTC models	Research	1	3694	April 21, 2022
Discovery of Unsafe Models on Hugging Face Platform	Research	0	1506	August 17, 2023
ChatDEAF Project – First Open ISL/TİD Dataset for Sign Language Accessibility	Awesome paper	1	102	April 20, 2025
What can transformers learn without position encoding?	Research	1	3115	June 10, 2021
Understanding FLOPs-per-token estimates from OpenAI's scaling laws	Research	6	16029	September 20, 2023
How to use T5 for sentence embedding?	Research	6	15979	May 27, 2023
Looking for a Translation Model for English to 100+ Languages, Comparable to DeepL or Google, for Local Deployment	Research	4	18794	September 6, 2024
Making a model "think" before doing a tool call (ReAct paper)	Research	2	241	April 4, 2025