Fine-tuning Llama 3 with QLoRA - metrics calculation

Hi everybody!

I am trying to fine-tune Llama 3 8B with PEFT and TRL. Everything seems to run correctly (or at least the loss is decreasing). I am now trying to compute metrics during training. To do this, I need to decode the output of the LLM in a custom compute_metrics function, but I am not sure how to do this. I have verified that I get a NumPy array with a prediction for each validation example (each seems to be 49 x vocab_size). However, if I take the argmax over the vocabulary axis and decode the result, the output is gibberish, so I think I am missing something fundamental here.
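A minimal sketch of the argmax->decode step described above, on toy data rather than real model output. It assumes the usual causal-LM conventions: the logits at position t score the token at position t + 1, and label positions excluded from the loss are set to -100. Neither is confirmed for this particular setup. `TOY_VOCAB`, `toy_decode`, `align_predictions` and `token_accuracy` are hypothetical names used only for illustration; in a real run the tokenizer's own decode method would replace `toy_decode`.

```python
import numpy as np

IGNORE_INDEX = -100  # assumption: label value for positions excluded from the loss

# Hypothetical toy vocabulary, standing in for a real tokenizer
TOY_VOCAB = {0: "<pad>", 1: "the", 2: "cat", 3: "sat", 4: "down"}


def toy_decode(ids):
    """Map token ids to a string using the toy vocabulary."""
    return " ".join(TOY_VOCAB[int(i)] for i in ids)


def align_predictions(logits, labels):
    """Argmax the logits and align them with the labels (assumed causal LM)."""
    pred_ids = np.argmax(logits, axis=-1)  # shape: (batch, seq_len)
    # Assumption: position t predicts token t + 1, so drop the last
    # prediction and the first label before comparing.
    pred_ids = pred_ids[:, :-1]
    target_ids = labels[:, 1:]
    mask = target_ids != IGNORE_INDEX  # keep only positions that have a label
    return pred_ids, target_ids, mask


def token_accuracy(logits, labels):
    """Fraction of labelled positions where the argmax matches the label."""
    pred_ids, target_ids, mask = align_predictions(logits, labels)
    return float((pred_ids[mask] == target_ids[mask]).mean())


# Toy sequence "the cat sat down"; the first token is treated as masked prompt.
labels = np.array([[IGNORE_INDEX, 2, 3, 4]])

# Logits of shape (batch=1, seq_len=4, vocab=5): each position scores the NEXT token.
logits = np.zeros((1, 4, 5))
logits[0, 0, 2] = 1.0  # after "the"  -> "cat"
logits[0, 1, 3] = 1.0  # after "cat"  -> "sat"
logits[0, 2, 4] = 1.0  # after "sat"  -> "down"
logits[0, 3, 0] = 1.0  # last position has no label to compare against

pred_ids, target_ids, mask = align_predictions(logits, labels)
decoded_prediction = toy_decode(pred_ids[0][mask[0]])
decoded_target = toy_decode(target_ids[0][mask[0]])
accuracy = token_accuracy(logits, labels)

# Comparing without the shift: no position matches, even for a perfect model.
unshifted_matches = int((np.argmax(logits, axis=-1) == labels).sum())
```

In this toy case the shifted comparison matches at every labelled position, while the unshifted one matches at none. Note also that such predictions are conditioned on the ground-truth previous tokens, so decoding them is not the same as decoding free-running generated text.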
I have seen that some libraries (LLaMA-Factory) run SFT with LoRA using a Seq2Seq trainer: would this be the correct way to go?
Thanks!