Pretrain ELECTRA from scratch in Portuguese

ELECTRA for Portuguese

Currently, there is no ELECTRA or ELECTRA Large model on the Hugging Face Hub that was trained from scratch for Portuguese. For this project, the goal is to create an ELECTRA model for the Portuguese language only.

Model

A randomly initialized ELECTRA model

Available training scripts

A masked language modeling script for Flax is available here. It can be used pretty much without any required code changes.

(Optional) Desired project outcome

The desired project output is a strong ELECTRA model in Portuguese.

ELECTRA actually uses a special pretraining script, which would need to be written during the sprint. If you are very motivated, I’m willing to confirm this “single-person” project and give you a TPU VM. In that case, please leave a comment on the official Google Sheet: Confirmed teams for Flax/JAX community week - Google Sheets
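
For context on why a plain masked language modeling script is not enough: ELECTRA trains a discriminator to detect which tokens in a corrupted sequence were replaced by a small generator (replaced token detection). Below is a rough sketch of that objective in plain Python, not a working Flax script. It assumes a uniform random sampler in place of the real generator and per-token Bernoulli masking; `make_rtd_example` and `rtd_loss` are hypothetical names, not part of any library.

```python
import math
import random


def make_rtd_example(token_ids, vocab_size, mask_prob=0.15, rng=None):
    """Corrupt a token sequence and build replaced-token-detection labels.

    Assumption: a uniform random sampler stands in for the generator,
    which in ELECTRA is a small masked language model.
    """
    if rng is None:
        rng = random.Random(0)
    corrupted, labels = [], []
    for tok in token_ids:
        if rng.random() < mask_prob:
            sampled = rng.randrange(vocab_size)  # stand-in for a generator sample
            corrupted.append(sampled)
            # A sample that reproduces the original token counts as "original".
            labels.append(int(sampled != tok))
        else:
            corrupted.append(tok)
            labels.append(0)
    return corrupted, labels


def rtd_loss(logits, labels):
    """Mean binary cross-entropy over per-token discriminator logits."""
    total = 0.0
    for z, y in zip(logits, labels):
        # Numerically stable form of BCE-with-logits.
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(labels)


if __name__ == "__main__":
    ids = [12, 7, 33, 5, 19, 8]
    corrupted, labels = make_rtd_example(ids, vocab_size=50, mask_prob=0.5)
    dummy_logits = [0.0] * len(labels)  # an untrained discriminator
    print(corrupted, labels, rtd_loss(dummy_logits, labels))
```

In a real script, the generator's masked language modeling loss and this discriminator loss would be combined and optimized jointly, which is the part that would need to be written for Flax.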

Hi Patrick, I am also interested in training ELECTRA for Portuguese. I don’t know whether this is currently under development or has already been done. If not, I would like to implement this pretraining script and train the model for Portuguese.