Fine-tuning ALBERT with a custom unlabeled dataset for the next sentence prediction task


I’m trying to figure out how to make my dataset compatible with ALBERT for the next sentence prediction task. How should I generate the next sentence logits? Are there any examples? I have around 1 million paragraphs of roughly 300 words each, and my dataset is completely unlabeled (but domain-specific).
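
To make the question concrete, here is a minimal sketch of one way sentence pairs with labels could be derived from unlabeled paragraphs. It is pure Python and makes several assumptions that are not from any library: the helper names `split_sentences` and `make_pairs` are hypothetical, the sentence splitter is a naive regex, and the label convention (0 = the second sentence really follows the first, 1 = it was sampled from another paragraph) is chosen to mirror the usual next sentence prediction setup. Note that ALBERT's original pretraining uses sentence order prediction rather than next sentence prediction; for that objective the negative example would be the same two sentences in swapped order instead of a random sentence.

```python
import random
import re


def split_sentences(paragraph):
    # Naive splitter (assumption): break on whitespace that follows ., ! or ?
    parts = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    return [p for p in parts if p]


def make_pairs(paragraphs, seed=0):
    """Build (sentence_a, sentence_b, label) triples from raw paragraphs.

    label 0: sentence_b directly follows sentence_a in the same paragraph.
    label 1: sentence_b was drawn at random from a different paragraph.
    """
    rng = random.Random(seed)
    docs = [split_sentences(p) for p in paragraphs]
    docs = [d for d in docs if len(d) >= 2]  # need at least one adjacent pair
    examples = []
    for i, sents in enumerate(docs):
        for a, b in zip(sents, sents[1:]):
            if len(docs) > 1 and rng.random() < 0.5:
                # Pick an index j != i without building a candidate list.
                j = rng.randrange(len(docs) - 1)
                if j >= i:
                    j += 1
                examples.append((a, rng.choice(docs[j]), 1))
            else:
                examples.append((a, b, 0))
    return examples


if __name__ == "__main__":
    demo = ["A one. A two. A three.", "B one. B two."]
    for a, b, label in make_pairs(demo):
        print(label, "|", a, "|", b)
```

The resulting triples would still need to be tokenized as two-segment inputs before being passed to a model; that step depends on the library and checkpoint used and is left out here.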
