Finetuning GPT2 with user defined loss

@aclifton314 yup, it's still the case that the gradients won’t flow through the sampling line. Check out this post
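
To make the point concrete, here is a minimal sketch of why the sampling line cuts the graph. It assumes PyTorch and uses a toy 5-token logits tensor as a stand-in for the GPT2 output (the tensor names are made up for illustration, not taken from your code). Sampling returns integer token ids, which carry no gradient, while the log-probability of the sampled token is still differentiable with respect to the logits.

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-in for model output: logits over a 5-token vocabulary.
logits = torch.randn(1, 5, requires_grad=True)
probs = torch.softmax(logits, dim=-1)

# Sampling produces integer token ids; autograd does not track this step.
sampled_ids = torch.multinomial(probs, num_samples=1)
print(sampled_ids.dtype, sampled_ids.requires_grad)

# The log-probability of the sampled token is still connected to the logits.
log_prob = torch.log(probs.gather(-1, sampled_ids)).sum()
log_prob.backward()
print(logits.grad.shape)
```

So any loss computed only from `sampled_ids` (or text decoded from them) cannot update the model by plain backprop, whereas a loss that goes through the probabilities or log-probabilities can.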