@patrickvonplaten I was just wondering if you could share any benchmarking or information on the tiny reformer/longformer models you trained. Which models are they distillations of? Have you benchmarked their performance at all?
I am looking to do something similar but was hoping to get the details of these models before progressing.
I’m also wondering if you have any insight into why bert-base is so often used as the teacher model for the DistilBERT/TinyBERT models. I saw one paper on RoBERTa suggesting that distilling from a larger teacher would make more sense.
AFAIK, the tiny reformer and longformer models are not distilled; they are randomly initialized smaller models created for testing purposes, not meant to be used for training.
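For illustration, here's a minimal sketch of how such a tiny, randomly initialized model can be created just by shrinking the config (the hyperparameter values below are my own placeholders, not the ones used for the published tiny checkpoints):

```python
from transformers import LongformerConfig, LongformerModel

# Shrink the architecture via the config; no teacher, no distillation.
# These sizes are illustrative only.
config = LongformerConfig(
    vocab_size=30522,
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
    max_position_embeddings=512,
    attention_window=64,  # local attention window applied to every layer
)

# Instantiating from a config gives randomly initialized weights,
# which is all you need for shape/integration tests.
model = LongformerModel(config)
print(f"{model.num_parameters():,} parameters")
```

Such models are useful for fast unit tests and debugging pipelines, since they exercise the full forward pass without the cost of real pretrained weights.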