ELECTRA Paper Doubts

Srinjoy · September 8, 2023, 7:41am

Hello Everyone,
I am Srinjoy, a master’s student currently studying NLP. I was reading the ELECTRA paper by Clark et al. I learned about the implementation and had a few doubts.

I was wondering if you could help me with those.

1. What exactly does “step” mean in the step count? Does it mean one epoch or one minibatch?
2. In Table 1 of the paper, ELECTRA-SMALL and BERT-SMALL both have 14M parameters. How is that possible? I would expect ELECTRA to have more parameters, since its generator and discriminator modules are both BERT-based.
3. What is the architecture of the generator and the discriminator? Are they both BERT, or something else?
4. There is a sampling step between the generator and the discriminator. How are the gradients back-propagated through it?
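
To make the sampling question concrete, here is a minimal sketch of that step as I understand it from the paper. It is only an illustration under my own assumptions: the function name `corrupt`, the toy vocabulary size, and the random `generator_logits` are all made up, and it uses the Python standard library only rather than any real ELECTRA code.

```python
import math
import random

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def corrupt(tokens, masked_positions, generator_logits, rng):
    """Replace masked positions with tokens sampled from the generator.

    generator_logits: hypothetical toy input mapping position -> vocab-sized logits.
    Returns the corrupted token ids and per-position labels
    (1 = token differs from the original, 0 = token is the original).
    """
    corrupted = list(tokens)
    for pos in masked_positions:
        probs = softmax(generator_logits[pos])
        # Discrete draw: the result is an integer token id, not a function
        # of the logits that one could differentiate directly.
        corrupted[pos] = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    # A sampled token that happens to equal the original counts as "original".
    labels = [int(c != t) for c, t in zip(corrupted, tokens)]
    return corrupted, labels

rng = random.Random(0)           # fixed seed so the toy run is repeatable
tokens = [3, 1, 4, 1, 5]         # toy input ids (assumed)
masked_positions = [1, 3]        # positions the generator fills in (assumed)
vocab_size = 6                   # toy vocabulary size (assumed)
generator_logits = {
    pos: [rng.gauss(0, 1) for _ in range(vocab_size)] for pos in masked_positions
}
corrupted, labels = corrupt(tokens, masked_positions, generator_logits, rng)
print(corrupted, labels)
```

The discriminator would then receive `corrupted` and be trained to predict `labels`. Since `corrupted` holds plain integer ids, it is not obvious to me how a gradient from the discriminator loss could reach the generator.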

Thanks in advance
