I am trying to build a customized evaluation loop with the Trainer library, but I have run into a problem because my evaluation metric depends on string comparison instead of logit loss. It seems that calling model(input_ids) and taking the max-probability token at each position gives very different output from model.generate(). But model.generate() cannot be called during evaluation, since the model is wrapped with DeepSpeed at that point.
Is it possible to replicate the model.generate() behavior with a wrapped model? Or is there a solution I have missed?
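For context, here is the kind of workaround I have been considering: a minimal greedy-decoding loop that only ever calls the model's forward pass, so it should also run on a DeepSpeed-wrapped module. This is a sketch under the assumption of plain greedy decoding (no sampling, no beams, no KV cache); the `greedy_decode` helper name is mine, not from any library. Note that a single forward pass followed by a per-position argmax is not equivalent to generate(), because generate() feeds each newly predicted token back into the model before predicting the next one, which is what this loop does.

```python
import torch


@torch.no_grad()
def greedy_decode(model, input_ids, max_new_tokens, eos_token_id=None):
    """Replicate greedy model.generate() using only forward calls.

    `model` can be any callable returning an object with a `.logits`
    attribute of shape (batch, seq_len, vocab_size) -- e.g. a
    DeepSpeed-wrapped causal LM. A hypothetical sketch, not the
    Trainer's actual evaluation path.
    """
    for _ in range(max_new_tokens):
        # Full forward pass over the current sequence (no KV cache).
        logits = model(input_ids=input_ids).logits
        # Greedy step: take the argmax only at the LAST position,
        # then append it and run the model again on the longer sequence.
        next_token = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        input_ids = torch.cat([input_ids, next_token], dim=-1)
        if eos_token_id is not None and (next_token == eos_token_id).all():
            break
    return input_ids
```

The generated ids can then be decoded with the tokenizer and compared as strings for the metric. This is slow (it recomputes the whole prefix each step), but it sidesteps the wrapper issue because it never touches generate().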