How can different models be trained the same way?

sangil · July 26, 2021, 2:40pm

Hello.
I am pre-training and fine-tuning BERT and ELECTRA, but I really don't understand how these two models can be trained the same way.
Take BERT first.
BERT is pre-trained with a masked language model (MLM) objective and next sentence prediction, so it makes sense that if I continue training BERT with MLM on my own data, its embeddings should become better suited to that data than before.
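A minimal pure-Python sketch of the MLM corruption BERT uses: about 15% of non-special tokens are selected, and of those, 80% become [MASK], 10% become a random token, and 10% are left unchanged. As far as I know this is also what transformers' DataCollatorForLanguageModeling applies by default. The ids 103 ([MASK]) and 30522 (vocabulary size) assume the bert-base-uncased vocabulary, and the function name mask_tokens is hypothetical.

```python
import random
from typing import List, Set, Tuple

MASK_ID = 103        # [MASK] id in the bert-base-uncased vocabulary (assumption)
VOCAB_SIZE = 30522   # bert-base-uncased vocabulary size (assumption)
IGNORE_INDEX = -100  # label value PyTorch's cross-entropy ignores by default


def mask_tokens(
    input_ids: List[int],
    special_ids: Set[int],
    mlm_probability: float = 0.15,
    seed: int = 0,
) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: return (masked_ids, labels) using BERT's 80/10/10 scheme."""
    rng = random.Random(seed)
    masked = list(input_ids)
    labels = [IGNORE_INDEX] * len(input_ids)
    for i, tok in enumerate(input_ids):
        # Special tokens ([CLS], [SEP], ...) are never selected for prediction.
        if tok in special_ids or rng.random() >= mlm_probability:
            continue
        labels[i] = tok  # the model is trained to recover the original token here
        r = rng.random()
        if r < 0.8:
            masked[i] = MASK_ID                    # 80%: replace with [MASK]
        elif r < 0.9:
            masked[i] = rng.randrange(VOCAB_SIZE)  # 10%: replace with a random token
        # remaining 10%: keep the original token
    return masked, labels
```

Only positions whose label is not -100 contribute to the MLM loss, so continued pre-training on your own corpus just repeats this corruption on new text.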
The problem is ELECTRA.
The transformers documentation suggests the same fine-tuning steps as for BERT.
ELECTRA does not use MLM; instead it uses a discriminator that predicts which tokens are fake (replaced) and which are real, similar to a GAN.
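For reference, the pre-training objective as I recall it from the ELECTRA paper (Clark et al., 2020) combines a small MLM generator $G$ with the discriminator $D$. Here $m$ is the set of masked positions, $x^{\text{masked}}$ is the masked input, $x^{\text{corrupt}}$ is the input with masked positions filled by samples from $G$, and $D(x, t)$ is the probability that token $t$ is original; the paper reports using $\lambda = 50$. Note that the generator term is itself an MLM loss, and after pre-training the generator is discarded and only the discriminator is fine-tuned.

```latex
\min_{\theta_G,\theta_D} \sum_{x \in \mathcal{X}}
  \mathcal{L}_{\text{MLM}}(x, \theta_G) + \lambda\, \mathcal{L}_{\text{Disc}}(x, \theta_D)

\mathcal{L}_{\text{MLM}}(x, \theta_G)
  = \mathbb{E}\Big( \sum_{i \in m} -\log p_G\big(x_i \mid x^{\text{masked}}\big) \Big)

\mathcal{L}_{\text{Disc}}(x, \theta_D)
  = \mathbb{E}\Big( \sum_{t=1}^{n}
      -\mathbb{1}\big(x^{\text{corrupt}}_t = x_t\big) \log D\big(x^{\text{corrupt}}, t\big)
      -\mathbb{1}\big(x^{\text{corrupt}}_t \neq x_t\big) \log\big(1 - D(x^{\text{corrupt}}, t)\big)
    \Big)
```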
So, logically, if I want to fine-tune ELECTRA, I would expect to train it to tell which tokens are fake and which are real.
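A minimal pure-Python sketch of replaced-token detection, the discriminator task described above. The generator is stubbed out as a dict of (position → sampled token id), and the function names are hypothetical. Labels use 1 = replaced and 0 = original, which I believe matches the `labels` convention documented for transformers' ElectraForPreTraining; the loss is a plain mean binary cross-entropy with logits scoring "replaced".

```python
import math
from typing import Dict, List, Tuple


def replaced_token_detection(
    original_ids: List[int], generator_predictions: Dict[int, int]
) -> Tuple[List[int], List[int]]:
    """Build the discriminator input and per-token labels (1 = replaced, 0 = original)."""
    corrupted = list(original_ids)
    for pos, sampled_id in generator_predictions.items():
        corrupted[pos] = sampled_id  # generator's sample at a masked position
    # A sample that happens to equal the original token is labelled "original" (0).
    labels = [int(c != o) for c, o in zip(corrupted, original_ids)]
    return corrupted, labels


def discriminator_loss(logits: List[float], labels: List[int]) -> float:
    """Mean binary cross-entropy over all positions, computed from raw logits."""
    total = 0.0
    for z, y in zip(logits, labels):
        # Numerically stable BCE-with-logits: max(z, 0) - z*y + log(1 + exp(-|z|))
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(labels)
```

For example, `replaced_token_detection([1, 2, 3, 4, 5], {1: 9, 3: 4})` gives corrupted ids `[1, 9, 3, 4, 5]` and labels `[0, 1, 0, 0, 0]`: position 3 was resampled as its original token, so it counts as original.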
But the transformers documentation suggests MLM, and ordinary fine-tuning looks the same as well.
model(input_ids, attention_mask, token_type_ids, …)
These are the arguments the ELECTRA model takes,
but I would expect arguments like these to work for BERT, not ELECTRA.
Can anyone please help me?
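To make the inputs concrete, here is a minimal pure-Python sketch of how the three tensors are built for a sentence pair. Both BERT and ELECTRA are transformer encoders that consume these same token-level inputs; the pre-training objective lives in the head and loss on top, not in the input format. The ids 101/102/0 assume the bert-base-uncased vocabulary, and I am assuming (not verifying) that the released ELECTRA checkpoints share it; `encode_pair` is a hypothetical helper, not a tokenizer API.

```python
from typing import Dict, List

CLS_ID, SEP_ID, PAD_ID = 101, 102, 0  # bert-base-uncased ids (assumption)


def encode_pair(ids_a: List[int], ids_b: List[int], max_length: int) -> Dict[str, List[int]]:
    """Hypothetical helper: build [CLS] A [SEP] B [SEP] plus padding."""
    input_ids = [CLS_ID] + ids_a + [SEP_ID] + ids_b + [SEP_ID]
    if len(input_ids) > max_length:
        raise ValueError("pair is longer than max_length; truncate first")
    # Segment ids: 0 for [CLS] A [SEP], 1 for B [SEP].
    token_type_ids = [0] * (len(ids_a) + 2) + [1] * (len(ids_b) + 1)
    # Attention mask: 1 for real tokens, 0 for padding.
    attention_mask = [1] * len(input_ids)
    pad = max_length - len(input_ids)
    return {
        "input_ids": input_ids + [PAD_ID] * pad,
        "token_type_ids": token_type_ids + [0] * pad,
        "attention_mask": attention_mask + [0] * pad,
    }
```

Because the dictionary keys match the keyword arguments of the encoder models, the same batch could in principle be passed to either a BERT or an ELECTRA model; only the head and labels change between MLM, replaced-token detection, and downstream fine-tuning.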
