Applying an evaluation metric to a causal LM


I am trying to fine-tune a pre-trained falcon-7b model for a question answering (QA) task.

My question is whether we can use the ROUGE metric directly with the trainer for a CausalLM model, as we would for a summarization task with a Seq2Seq model. If yes, what steps do I need to follow?
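
To make the question concrete, here is a rough sketch of the kind of metric function I have in mind. It is only a simplified stand-in for ROUGE-1 (lowercased whitespace tokens, no stemming), not the official ROUGE implementation, and the names `rouge1_f1` and `compute_metrics_on_text` are made up for illustration. It assumes the generated answers and the reference answers have already been decoded to plain strings; how to get those out of the CausalLM trainer is exactly the part I am unsure about.

```python
from collections import Counter


def rouge1_f1(prediction: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: unigram overlap on lowercased whitespace tokens."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Clipped unigram overlap: each token counts at most as often as it
    # appears in both the prediction and the reference.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def compute_metrics_on_text(predictions, references):
    """Hypothetical helper: mean simplified ROUGE-1 F1 over decoded answers."""
    scores = [rouge1_f1(p, r) for p, r in zip(predictions, references)]
    return {"rouge1_f1": sum(scores) / len(scores) if scores else 0.0}
```

With a Seq2Seq model I would expect to call something like this on the decoded generations; I am not sure what the equivalent step is for a causal LM, where the prompt and the answer are part of the same sequence.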

Any lead would be really appreciated!