BERTIN: PreTrain RoBERTa-large from scratch in Spanish

Yes! Count me in!

1 Like

Really interesting! I’m in!

1 Like

Hey! Awesome project! I’m in :stuck_out_tongue:

1 Like

I think you already have a full working team, but just in case you need more people, I’d be willing to help. I don’t know if I could be of much use, but the opportunity to contribute to a model called BERTIN is too good to pass up :grin:

1 Like

Awesome to see so much interest here! We can also make multiple teams for the same project - so feel free to keep posting if you’re interested :slight_smile:

3 Likes

For this project we can have at least 2 teams!

1 Like

This project sounds good! I’d like to help with either this one or the Spanish GPT-2, wherever makes more sense to add one person.

1 Like

I also think this is a very good effort. If I can be of any help please count me in.

1 Like

Fantastic. I’d love to participate in one of the teams, if possible. Thanks!

1 Like

What a good idea. Please, count me in if possible!!

1 Like

I’m definitely IN! I have some experience finetuning transformers and I want to learn how to pretrain one.

Cheers from Madrid!

1 Like

Hello @versae & Team,
I am interested in being a part of such an amazing project & team. I will try my best to contribute to the BERTIN model. It would be nice if we could discuss some more learning resources that would be useful for this project. I can work in any time zone that is comfortable for everyone on the team.

1 Like

I’m overwhelmed by the good reception of the idea. Thanks y’all! If we are awarded the TPUs, we should probably decide on a communication channel that’s a bit more effective than the forum. A Slack channel? Discord? E-mail?

1 Like

It was a really nice idea, @versae! If you want, we could use the Slack group of NLP en ES :hugs: and create a channel for us. Some of the people in this thread are already in the community; if the others want to join, you can use this invitation.

2 Likes

This seems like a super interesting project. I would love to be a part of this project.

1 Like

This sounds awesome! I would be interested in joining as well, if there’s still room for more people! :slight_smile:

2 Likes

Super excited to see so much interest in this project! Given that we have so many people, I think it makes sense to officially define 2 groups of 7 people (more people can join later :-))

I’ve divided the people who are interested into two groups and added all the names here.

Let me know if you don’t want to be on the team and I’ll take you out :slight_smile:

I think the two teams should talk to each other / help each other as much as possible, but as stated on Slack before, having a single team of 14 is not efficient IMO. Therefore, I propose that each team takes its own approach to the dataset preprocessing, model definition, etc.

One important aspect that should also be discussed a bit is how the pretrained RoBERTa model should be evaluated. Maybe on XTREME after pretraining?
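
For reference, here is a minimal sketch of what an evaluation run on the Spanish split of PAWS-X (one of the XTREME tasks) could look like with the 🤗 datasets and transformers libraries. The checkpoint path and training arguments are placeholders, not anything we’ve agreed on:

```python
# Illustrative sketch only: fine-tune a (hypothetical) BERTIN checkpoint on the
# Spanish split of PAWS-X, a paraphrase-identification task included in XTREME.
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

checkpoint = "path/to/bertin-checkpoint"  # placeholder for whatever we end up with
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

paws_x = load_dataset("paws-x", "es")  # Spanish portion of PAWS-X

def tokenize(batch):
    # Encode the sentence pair; fixed-length padding keeps the default collator happy
    return tokenizer(batch["sentence1"], batch["sentence2"],
                     truncation=True, padding="max_length", max_length=128)

paws_x = paws_x.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="paws-x-es", num_train_epochs=3,
                           per_device_train_batch_size=32),
    train_dataset=paws_x["train"],
    eval_dataset=paws_x["validation"],
)
trainer.train()
print(trainer.evaluate())
```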

3 Likes

XTREME and PAWS-X could be a good starting point

1 Like

I think XTREME is perfect. Not sure if XTREME supersedes PAWS-X or if they are just different.

So if I understand this correctly, this means we could pre-train 2 versions of BERTIN?

If that’s the case, that’s something the 2 teams need to talk about and agree on. My two cents would be to test the 2 different approaches to shrinking the corpus size: random sampling and perplexity sampling.
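
To make the comparison concrete, here is a rough sketch (not a settled implementation) of what the two strategies could look like, assuming we use a KenLM n-gram model trained on clean Spanish text to score documents. The model path, keep ratio, and quantile thresholds are just placeholders:

```python
# Rough sketch: random sampling vs. perplexity sampling of a large corpus.
# Assumes a KenLM n-gram model trained on clean Spanish text (path is a placeholder).
import random
import kenlm

lm = kenlm.Model("es_5gram.binary")  # placeholder path

def perplexity(doc: str) -> float:
    words = doc.split()
    # kenlm returns log10 P(doc); convert to an approximate per-word perplexity
    return 10 ** (-lm.score(doc, bos=True, eos=True) / max(len(words), 1))

def random_sample(docs, keep_ratio=0.3, seed=0):
    # Baseline: keep each document with a fixed probability
    rng = random.Random(seed)
    return [d for d in docs if rng.random() < keep_ratio]

def perplexity_sample(docs, low_q=0.25, high_q=0.75):
    # One possible variant: keep documents whose perplexity falls between two
    # quantiles, dropping both boilerplate-like (very low) and noisy (very high) text.
    scored = [(perplexity(d), d) for d in docs]
    ppls = sorted(p for p, _ in scored)
    lo = ppls[int(low_q * (len(ppls) - 1))]
    hi = ppls[int(high_q * (len(ppls) - 1))]
    return [d for p, d in scored if lo <= p <= hi]
```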

We now have a Discord channel to discuss details: Discord