Using GRPOTrainer with a custom PyTorch module?

Fixed. Turns out I just needed to prepend the original prompt back onto the output, lol.
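
For anyone landing here later, a rough sketch of what that fix can look like. This assumes the custom module's generation step returns only the newly generated token ids, while the trainer expects prompt and completion concatenated along the sequence dimension and slices the prompt off afterwards. I haven't checked that against every TRL version. `prepend_prompt` and the toy token ids are made up for illustration and are not part of TRL.

```python
import torch


def prepend_prompt(prompt_ids: torch.Tensor, completion_ids: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: glue the prompt back in front of the generated tokens.

    Both tensors are assumed to have shape (batch, seq_len) and the same batch size.
    """
    if prompt_ids.shape[0] != completion_ids.shape[0]:
        raise ValueError("prompt and completion batch sizes differ")
    # Concatenate along the sequence dimension: [prompt tokens | completion tokens]
    return torch.cat([prompt_ids, completion_ids], dim=1)


# Toy batch; pad id 0 and left-padded prompts are assumptions for this example.
prompt_ids = torch.tensor([[0, 0, 11, 12], [0, 21, 22, 23]])
completion_ids = torch.tensor([[31, 32, 2], [41, 2, 0]])

full_ids = prepend_prompt(prompt_ids, completion_ids)

# Slicing at the prompt length now recovers the completion.
prompt_len = prompt_ids.size(1)
recovered = full_ids[:, prompt_len:]
```

If your `generate` already returns prompt plus completion (as Hugging Face decoder-only models do by default), you don't need this step.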
