As an update: I decided to abandon this approach altogether and am now implementing it in native PyTorch, making extensive use of the HuggingFace library. This approach is a little more long-winded, but it gives greater opportunities for interrogating the source of errors.