Small NLP model to train on a 3090?

Hi Folks,

tl;dr: can anyone recommend a smallish NLP model I can realistically train on my 3090 in a few hours to a day? This needs to be full training from scratch (NOT finetuning), so the training data also needs to be available on HF :)

I’ve been playing with the Adam optimizer and tweaking it to build more lateral-thinking networks, but my tests so far on simple CNNs have not yielded any good results. That’s what I’d expect, though: I really need a fairly large number of neurons to see any effect. I’d also really like a network with attention, as I want to run some experiments on that too.

Any recommendations?