Since Phi-3-mini was too heavyweight and slow, I went back to the existing consciousAI/question-answering-generative-t5-v1-base-s-q-c model on Hugging Face, which supports generative QA but only for short sequences (max 512 tokens).
Simply splitting my full text into these short sequences and getting answers for each of them works quite well.
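For reference, this is roughly what I'm doing now (a sketch; the exact prompt format should follow the model card, I'm just showing the chunk-and-answer loop):

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL_ID = "consciousAI/question-answering-generative-t5-v1-base-s-q-c"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)

def chunk_text(text, max_tokens=384, stride=64):
    """Split the full text into overlapping chunks small enough to fit the
    512-token limit together with the question part of the prompt."""
    ids = tokenizer(text, add_special_tokens=False)["input_ids"]
    chunks = []
    for start in range(0, len(ids), max_tokens - stride):
        window = ids[start:start + max_tokens]
        chunks.append(tokenizer.decode(window))
        if start + max_tokens >= len(ids):
            break
    return chunks

def answer_per_chunk(question, text):
    answers = []
    for chunk in chunk_text(text):
        # Prompt format is an assumption here; use the format from the model card.
        prompt = f"question: {question} context: {chunk}"
        inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=512)
        output_ids = model.generate(**inputs, max_new_tokens=64)
        answers.append(tokenizer.decode(output_ids[0], skip_special_tokens=True))
    return answers
```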
Is there any way of weighting/prioritizing/scoring these answers? I.e., can the model output some sort of certainty/quality score for the generated answer?
If so, I could use that to filter out the best answers.
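The only thing I've come up with myself is using the token log-probabilities from `generate()` as a rough confidence proxy and ranking the per-chunk answers by that. A sketch of what I mean, using the same `tokenizer`/`model` as above (the prompt format is again an assumption, and I don't know whether this proxy is actually a good quality signal):

```python
import torch

def answer_with_score(question, chunk):
    """Generate an answer plus a rough confidence proxy: the mean
    log-probability of the generated tokens, mapped back to (0, 1)."""
    # Prompt format is an assumption; use the format from the model card.
    prompt = f"question: {question} context: {chunk}"
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=512)
    out = model.generate(
        **inputs,
        max_new_tokens=64,
        return_dict_in_generate=True,
        output_scores=True,
    )
    answer = tokenizer.decode(out.sequences[0], skip_special_tokens=True)
    # Per-token log-probabilities of the chosen tokens.
    transition_scores = model.compute_transition_scores(
        out.sequences, out.scores, normalize_logits=True
    )
    score = torch.exp(transition_scores[0].mean()).item()
    return answer, score

# Rank the per-chunk answers by the proxy score, highest first.
# scored = sorted((answer_with_score(q, c) for c in chunks),
#                 key=lambda pair: pair[1], reverse=True)
```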