Choosing suitable compute_metrics for SFTTrainer on a QA task

I am fine-tuning Llama 2 for question answering and have built a dataset for training it. I fine-tuned it previously, but without a compute_metrics function. Which metrics should I use, and what options are available for QA?

PS: I am training for the health care domain, so which metrics would be suitable for evaluating it?
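
To make the question concrete, here is a minimal sketch of the kind of metric I have in mind: SQuAD-style exact match and token-level F1 computed on decoded answer strings. The function names (`normalize`, `exact_match`, `token_f1`, `score_answers`) are my own placeholders, not part of any library, and I am assuming the predictions and references have already been decoded to plain text. I have not wired this into compute_metrics yet, and I do not know whether string overlap is adequate for health care answers.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, reference):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def token_f1(prediction, reference):
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def score_answers(predictions, references):
    """Average both metrics over paired lists of decoded answer strings."""
    n = len(predictions)
    return {
        "exact_match": sum(exact_match(p, r) for p, r in zip(predictions, references)) / n,
        "f1": sum(token_f1(p, r) for p, r in zip(predictions, references)) / n,
    }
```

For example, `score_answers(["take aspirin daily"], ["aspirin daily"])` scores the pair as not an exact match but with partial token overlap. Is something like this reasonable, or are there better choices for this domain?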