Implement few-shot inference for question-answering with DistilBERT

I am experimenting with in-context learning / prompt engineering on the DistilBERT model (via the Python transformers pipeline) to answer a question from a given report (in reality, there are 8,000+ reports). Each report is a detailed description of a contaminated location, covering hazardous substances, the affected area, etc. I now want to leverage DistilBERT to retrieve the answer to the following question:

What specific toxic substances, chemical compounds and pollution, hazardous waste, etc. were identified in the report as being present at the contamination site?

I started with a zero-shot inference to see how it would perform:

from transformers import pipeline, AutoTokenizer, AutoModelForQuestionAnswering
import torch
import pandas as pd


#%% define model
model_name = 'distilbert-base-cased-distilled-squad'

#%% initialize with pipeline
question_answerer = pipeline("question-answering", model=model_name, tokenizer=model_name)

#%% question
q_substance = 'What specific toxic substances, chemical compounds and pollution, hazardous waste, etc. were identified in the report as being present at the contamination site?'

#%% report (context) shortened
report = '''
Demolition debris, petroleum contaminated soil and scrap metal were disposed of in a permitted landfill during cleanup work conducted by the Engineers in the early 1990s.  Landfill B is located approximately 1/2 mile south of the west end of the main runway, on the opposite side of the airport from the former Radio Relay Site (RRS), and was used to dispose of Formerly Used Defense Site (FUDS) wastes, mainly related to World War II debris associated with the old Fort [name].  POL-impacted soil, with less than 5,000 mg/kg TPH, and with PCB concentrations less than 10 mg/kg were placed in 6-inch lifts within the landfill cap. The landfill was seeded after the cap was in place.
'''

#%% zero-shot inference
zero_shot_input = [{'context': report,
                    'question': q_substance}]

#%% result
output0 = question_answerer(zero_shot_input[0]['question'], zero_shot_input[0]['context'])
#
print(zero_shot_input[0]['question'])
print(output0)

## which returns the following:
What specific toxic substances, chemical compounds and pollution, hazardous waste, etc. were identified in the report as being present at the contamination site?
{'score': 0.5506489872932434, 'start': 1, 'end': 63, 'answer': 'Demolition debris, petroleum contaminated soil and scrap metal'}
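
If I understand the output correctly, start and end are character offsets into the context, so the extracted span can be checked directly against the report:

# start/end are character offsets into the context string,
# so slicing the report should reproduce the answer span
print(report[output0['start']:output0['end']])
# -> Demolition debris, petroleum contaminated soil and scrap metal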

That is not bad for a start, but as you can see even from this excerpt of the report, several contaminants are missing from the answer: POL, TPH, and PCB. It is therefore likely that it will perform even worse on longer reports.
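
As an aside, I am aware that the pipeline can return several candidate spans instead of only the best one (via top_k in recent transformers versions, topk in older ones), but that still gives me isolated spans rather than one combined list of contaminants. A minimal sketch of that, under those assumptions:

# Request the five highest-scoring candidate spans instead of only the best one.
# With top_k > 1 the pipeline returns a list of dicts rather than a single dict.
candidates = question_answerer(question=q_substance, context=report, top_k=5)

for cand in candidates:
    print(round(cand['score'], 3), cand['answer'])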

Thus, I want to give the model a few examples (context and answer) as part of a few-shot inference; right after the structure below I sketch how I picture combining everything into one input. Please note that I shortened the context inputs ([…]) for readability:

few_shot_input = [
    {
        "context": report,
        "question": q_substance,
        "examples": [
            {"context": "[...] DRO was reported ranging from 94.6 to 295 mg/kg in the stockpile.",
            "question": q_substance, 
            "answer": "DRO (diesel range organics)"},
            { "context": "[...] eighty cubic yards of stockpiled petroleum-contaminated soil to an offsite portable treatment facility operated by Cascade Environmental.",
             "question": q_substance,
             "answer": "petroleum"},
            { "context": "[...] due to presence of lead, reported at 1,250 mg/kg.",
             "question": q_substance,
             "answer": "lead"},
            {"context": "Report 4: [...] Report documented DRO, lead, benzo(a)anthracene, benzo(a)pyrene, ideno(1,2,3-c,d)pyrene, benzo(b)fluoranthene, PCB, and dibenzo(a,h,i)anthracene in groundwater at concentrations exceeding Table C groundwater cleanup levels (DRO at up to 338 mg/L and lead at up to 2.97 mg/L).",
            "question": q_substance,
            "answer": "DRO (diesel range organics), lead, benzo(a)anthracene, benzo(a)pyrene, ideno(1,2,3-c,d)pyrene, benzo(b)fluoranthene, PCB (Polychlorinated Biphenyls), and dibenzo(a,h,i)anthracene"}
            # Add more example-context-question-answer sets as needed
        ]
    },
]
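
Conceptually, what I have in mind is to flatten the example (context, answer) pairs together with the main context into a single input for the pipeline, roughly like this (build_few_shot_prompt is purely illustrative, not an existing transformers API):

def build_few_shot_prompt(item):
    # Concatenate the example contexts/answers and append the main context.
    parts = []
    for i, ex in enumerate(item["examples"], start=1):
        parts.append("Example context %s:\n%s\nExample answer: %s\n" % (i, ex["context"], ex["answer"]))
    parts.append("Context: %s" % item["context"])
    return "\n".join(parts)

output_fs = question_answerer(q_substance, build_few_shot_prompt(few_shot_input[0]))
print(output_fs["answer"])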

With this I hoped to improve the model's answer.

However, I cannot wrap my head around how to pass these examples to the model so that it actually uses them.

So far I have tried concatenating the examples one by one, to give the model a gradually growing context:

# Perform few-shot inference
for example in few_shot_input:
    main_context = example["context"]
    main_question = example["question"]
    examples = example["examples"]
    
    # print("Main context:", main_context)
    # print("Main question:", main_question)
    count = 1
    
    for ex in examples:
        context = ex["context"]
        question = ex["question"]
        answer = ex["answer"]
        
        # Combine main context and example context
        
        if count == 1:
            prompt_mask = """
            Context: %s
            
            Question: %s
            
            Example context %s:
            %s
            Example answer: %s
            """ %(main_context, main_question, count, context, answer)
        else:
            prompt_mask += """
            Example context %s:
            %s
            Example answer: %s
            """ %(count, context, answer)
            
        # Perform question-answering on the combined context
        output = question_answerer(question, prompt_mask)
        print('------------------------------------> ' + str(count) + '\n' )

        print(prompt_mask)
        print("Model answer:", output["answer"])


        count += 1

This did not yield the result I was hoping for; on the contrary, it gave me the answers to the respective examples.

Is there a better way to structure my input so that it “communicates” the examples to the model more effectively, or is this simply not feasible without fine-tuning?