Extractive Q&A - HF pipeline top_k returns same span as different answers

I’m using the HF pipeline for extractive Q&A with top_k=K (K is an integer greater than 1). I measure the quality of the answers with the perplexity score. I noticed that sometimes the same span is returned more than once, as different answers (for example, the 1st answer with score 0.35 and the 3rd answer with score 0.11).
This corrupts my perplexity scores, which I use downstream to select the best question for extracting what I need from the data.
Has anyone come across this, and if so, how can I solve it (other than manually grouping the results)?
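
For reference, the manual grouping mentioned above could look roughly like the sketch below. It assumes each result is a dict with the `score`, `start`, `end` and `answer` keys that the question-answering pipeline returns, and that duplicates share the same `start` and `end` offsets. The helper name `dedupe_answers` and the sample values are made up for illustration. Keeping the highest score per span is just one choice; summing the scores of duplicates would be another.

```python
def dedupe_answers(results):
    """Collapse results that point at the same (start, end) span,
    keeping the highest-scoring entry for each span."""
    best = {}
    for r in results:
        key = (r["start"], r["end"])
        if key not in best or r["score"] > best[key]["score"]:
            best[key] = r
    # Return the surviving answers, best score first.
    return sorted(best.values(), key=lambda r: r["score"], reverse=True)


# Hypothetical results shaped like the pipeline output with top_k=3;
# the offsets and answer texts are invented for this example.
results = [
    {"score": 0.35, "start": 10, "end": 15, "answer": "Paris"},
    {"score": 0.20, "start": 30, "end": 36, "answer": "London"},
    {"score": 0.11, "start": 10, "end": 15, "answer": "Paris"},
]
unique = dedupe_answers(results)
```

This works, but it runs after the pipeline, so asking for top_k=K can leave me with fewer than K distinct answers.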