Train GPT-2 on WikiText from scratch

@Hamidreza, did you find out how long it takes to train GPT-2 from scratch?