Finetuning GPT for a classification task fails

Has anyone tried finetuning GPT for classification tasks (don't ask why GPT, I'm testing a hypothesis)?
I took GPT2 Finetune Classification by George Mihaila as a base and built my binary classifier from GPT Neo 128. Unfortunately, after 5 epochs the F1 score dropped from 30% to zero: the logits from the classification head now always predict class zero. What might be going wrong, or is GPT just not suited for classification?
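For context, here is a minimal sketch of the kind of setup I mean (pure PyTorch with a toy backbone standing in for GPT Neo; all names are illustrative). The relevant detail is that GPT-style sequence classification pools the hidden state of the last non-padding token, since causal models have no `[CLS]` token:

```python
import torch
import torch.nn as nn

class GPTStyleClassifier(nn.Module):
    """Toy stand-in for a GPT backbone plus a binary classification head."""
    def __init__(self, vocab_size=100, hidden=32, num_labels=2, pad_id=0):
        super().__init__()
        self.pad_id = pad_id
        self.embed = nn.Embedding(vocab_size, hidden)
        # Stand-in for the actual transformer stack.
        self.backbone = nn.TransformerEncoderLayer(hidden, nhead=4, batch_first=True)
        self.score = nn.Linear(hidden, num_labels)  # classification head

    def forward(self, input_ids):
        h = self.backbone(self.embed(input_ids))
        # Pool the hidden state of the LAST non-padding token in each sequence.
        last_idx = (input_ids != self.pad_id).sum(dim=1) - 1
        pooled = h[torch.arange(h.size(0)), last_idx]
        return self.score(pooled)  # logits, shape (batch, num_labels)

model = GPTStyleClassifier()
ids = torch.tensor([[5, 7, 9, 0, 0], [3, 4, 0, 0, 0]])  # 0 = padding
logits = model(ids)
print(logits.shape)  # torch.Size([2, 2])
```

If the pad token index is wrong here, every example gets pooled from a padding position, which is one way the head can collapse to a single class.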