What if I have more than one reference when fine-tuning a model for a generation task?

In my test and validation sets, each example has 3 human-written references. How can I evaluate my model during training?
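To make the setup concrete, here is a minimal sketch of evaluating against several references. It assumes a simple whitespace-tokenized unigram F1 as the metric and keeps the best score over an example's references, which is the "max over ground truths" convention used by SQuAD's official evaluation script. The function names (`token_f1`, `max_over_references`, `evaluate`) are hypothetical, and the metric is only a stand-in: BLEU or ROUGE could be plugged in the same way, and some libraries (e.g. sacreBLEU) accept multiple reference streams directly.

```python
from collections import Counter
from typing import List


def token_f1(prediction: str, reference: str) -> float:
    """Unigram F1 between a whitespace-tokenized prediction and one reference."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    # Multiset intersection: each shared token counts min(pred_count, ref_count) times.
    common = Counter(pred_tokens) & Counter(ref_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def max_over_references(prediction: str, references: List[str]) -> float:
    """Score the prediction against every reference and keep the best match."""
    return max(token_f1(prediction, ref) for ref in references)


def evaluate(predictions: List[str], references: List[List[str]]) -> float:
    """Average the per-example best-match score over a validation set."""
    if len(predictions) != len(references):
        raise ValueError("need one reference list per prediction")
    scores = [max_over_references(p, refs) for p, refs in zip(predictions, references)]
    return sum(scores) / len(scores)


# Usage: each example carries its 3 human-written references.
preds = ["the cat sat on the mat", "a dog runs"]
refs = [
    ["the cat sat on the mat", "a cat was sitting on a mat", "there is a cat on the mat"],
    ["the dog is running", "a dog runs fast", "dogs run"],
]
print(evaluate(preds, refs))
```

Taking the maximum rewards a prediction for matching any one valid phrasing instead of penalizing it for differing from the others; averaging over references is an alternative if you want a stricter score. A function like `evaluate` could be called on generated outputs at the end of each validation pass.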