Evaluating question answering with the SQuAD dataset

Hello everybody

I want to build a question answering system by fine-tuning BERT on SQuAD 1.1 or SQuAD 2.0.
I'd like to ask about evaluating the system: I know there are `squad` and `squad_v2` metrics, but how can we use them when fine-tuning BERT with PyTorch?
Thank you

This example should hopefully answer your question.

If the purpose is to have a good question answering model, you could also use one of the many pretrained models on the Hugging Face model hub (Models - Hugging Face).
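In case it helps, here is a minimal self-contained sketch of what the `squad` metric computes under the hood: exact match (EM) and token-level F1, using the official SQuAD answer normalization (lowercasing, stripping punctuation, articles, and extra whitespace). The function names here are illustrative, not part of any library API.

```python
import re
import string
from collections import Counter

def normalize_answer(s):
    """Official SQuAD normalization: lowercase, drop punctuation and articles, collapse whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in set(string.punctuation))
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())

def exact_match(prediction, ground_truth):
    """1.0 if the normalized prediction equals the normalized gold answer, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(ground_truth))

def f1_score(prediction, ground_truth):
    """Token-level F1 between normalized prediction and gold answer."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(ground_truth).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

def best_over_gold(metric_fn, prediction, gold_answers):
    """SQuAD questions can have several gold answers; score against the best one."""
    return max(metric_fn(prediction, g) for g in gold_answers)
```

In practice you would not reimplement this: `evaluate.load("squad")` (or `datasets.load_metric("squad")` in older versions) performs the same computation, taking `predictions` as a list of `{"id": ..., "prediction_text": ...}` dicts and `references` as `{"id": ..., "answers": ...}` dicts; the `squad_v2` variant additionally expects a `no_answer_probability` per prediction to handle unanswerable questions.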

Thanks for your response!