I also switched to fairseq. I think HF is missing several LR and threshold management features.