Help with reusable Q&A instead of pipeline

I'm looking for help improving my understanding of how to build a custom Q&A answer finder without using the pipeline API.

I'm using "distilbert-base-cased-distilled-squad", which I understand is already fine-tuned for SQuAD-style extractive Q&A.
The corpus is ~80 sentences (newline-delimited) of naturally written "help" info.
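
For reference, the corpus is read into a single string (the file name here is made up):

from pathlib import Path

context = Path("help_corpus.txt").read_text()  # ~80 newline-delimited sentences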

The code below is condensed.

Start up the model/tokenizer:

import torch
from transformers import AutoConfig, AutoModelForQuestionAnswering, AutoTokenizer, BatchEncoding

model_name = "distilbert-base-cased-distilled-squad"
device = "cuda" if torch.cuda.is_available() else "cpu"

config = AutoConfig.from_pretrained(model_name)
config.num_labels = 2  # already the default for QA models
model = AutoModelForQuestionAnswering.from_pretrained(model_name, config=config)
model.to(device)
model.eval()
tokenizer = AutoTokenizer.from_pretrained(model_name)
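
(As a sanity check on the special tokens I hard-code below, I print the tokenizer's ids; for this checkpoint I get 101 for [CLS] and 102 for [SEP].)

print(tokenizer.cls_token_id, tokenizer.sep_token_id)  # 101 102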

Chunk the corpus for use by the model, with overlapping windows so answers that straddle a chunk boundary aren't skipped:

chunk_size = 384   # values are a guess; must leave room for the question and special tokens
overlap_size = 64
chunks = []
tokens = tokenizer.encode(context, add_special_tokens=False)  # don't reuse the tokenizer's name for this
for i in range(0, len(tokens), chunk_size - overlap_size):
    start_idx = i
    end_idx = min(i + chunk_size, len(tokens))
    chunks.append(tokens[start_idx:end_idx])
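
(I suspect the tokenizer could do this windowing for me in one call via stride and return_overflowing_tokens, roughly as below, though I haven't verified it produces the same chunks as my loop:)

enc = tokenizer(
    question,
    context,
    max_length=384,
    stride=64,                       # overlap between consecutive windows
    truncation="only_second",        # window only the context, never the question
    return_overflowing_tokens=True,  # one encoded window per overflow
    padding=True,
    return_tensors="pt",
)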

Fetch an answer for the question from each chunk.
I don't like using BatchEncoding directly for this, nor manually appending the special tokens; I want to do this better/right.

answers = []  # ("as" is a reserved word in Python, so it can't be the list's name)
q_tokens = tokenizer.encode(question, add_special_tokens=False)
for chunk in chunks:
    # 101 = [CLS] to start; 102 = [SEP] after the question and at the end (120 is not a special token)
    input_ids = [[101] + q_tokens + [102] + chunk + [102]]

    inputs = BatchEncoding(
        data={
            "input_ids": torch.tensor(input_ids, device=device),
            "attention_mask": torch.ones(1, len(input_ids[0]), dtype=torch.long, device=device),
        },
        tensor_type="pt",
    )

    with torch.no_grad():
        out = model(**inputs)
    start_idx = torch.argmax(out.start_logits)
    end_idx = torch.argmax(out.end_logits)
    answer = tokenizer.decode(inputs.input_ids[0, start_idx : end_idx + 1])

    answers.append({
        "answer": answer,
        "score": float(torch.max(torch.softmax(out.start_logits, dim=1))),
        "log_score": 0.0,
    })
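
(On avoiding the hand-built BatchEncoding: I think tokenizer.prepare_for_model is meant for exactly this case, taking the two id lists and inserting the special tokens itself, though I'm not sure it's the intended usage:)

inputs = tokenizer.prepare_for_model(
    q_tokens,
    chunk,                    # pre-tokenized (question, context-chunk) id lists
    add_special_tokens=True,  # adds [CLS]/[SEP] for me
    return_tensors="pt",
    prepend_batch_axis=True,
).to(device)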

import math

for a in answers:
    # log base 4, rounded; log is monotonic, so this sorts the same as the raw score
    a["log_score"] = round(math.log(a["score"], 4), 5)

sorted_answers = sorted(answers, key=lambda a: a["log_score"], reverse=True)

This "works." However, the answers bubbled to the top by the sort are often incorrect, and the "more correct" answer ends up 3rd or 4th, or isn't found at all.

I feel like I'm missing a step or a filter, or I just don't understand something.
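
My current best guess at the missing step: score candidate spans jointly from the start and end logits (which I believe is roughly what the pipeline does) instead of ranking by the start softmax alone. A sketch of what I mean, inside the per-chunk loop:

start = out.start_logits[0]
end = out.end_logits[0]
span_scores = start[:, None] + end[None, :]  # score[i, j] = start_i + end_j
valid = torch.triu(torch.ones_like(span_scores, dtype=torch.bool))  # keep start <= end
span_scores = span_scores.masked_fill(~valid, float("-inf"))
best = int(span_scores.argmax())
start_idx, end_idx = best // span_scores.size(1), best % span_scores.size(1)
# (spans can still land inside the question tokens; those probably need masking too)

Is that the right direction, or is there a standard filter I'm skipping?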