Excluding the Prompt from the Language Model's Response

Hello everyone,

I’m relatively new to working with LLMs and I’ve encountered a challenge that I’m hoping to get some help with. I’m working on a project that involves analyzing potentially malicious scripts. The process involves reading a file containing malicious code (e.g., a VBS script), submitting that code to a language model for analysis, and then receiving an analysis of the script’s behavior.

Here’s the issue I’m facing: when I submit the script content to the language model, the generated response includes both the requested analysis and the script’s source code itself. I want to receive only the analysis, without the source code being repeated in the response.

I am using the following libraries for my project: ctransformers, huggingface-hub, and langchain. Here’s a snippet of my current code:

malware_path='Malwares/malware.vbs'
with open(malware_path, 'r', encoding='utf-8') as file:
    vbs_content=file.read()

from langchain.llms import CTransformers
from huggingface_hub import hf_hub_download

model_repo = "TheBloke/Mistral-7B-Instruct-v0.1-GGUF"
model_filename = "mistral-7b-instruct-v0.1.Q4_K_M.gguf"

# This downloads the model file to a local path
model_file_path = hf_hub_download(repo_id=model_repo, filename=model_filename)

print(f"Model downloaded to: {model_file_path}")

config = {
    'max_new_tokens': 2000,
    'temperature': 0.7,
    'repetition_penalty': 1.1,
    'context_length': 4096,
    'stream': True
}

llm = CTransformers(model=model_file_path, model_type="mistral", config=config)

# Note: the GGUF model loaded by ctransformers is not a torch module, so
# torch.nn.DataParallel(...) and .to('cuda') don't apply here; GPU offloading
# is instead controlled via the 'gpu_layers' option in the config dict above.

def split_into_chunks(text, chunk_size=2500): 
    chunks = [text[i:i+chunk_size] for i in range(0, len(text), chunk_size)]
    return chunks

# Preparing the prompt
question = "Can you analyse the following malware without including it in your response? Provide a general description of its behavior and list any indicators of compromise (IOC). \n\n"

# Splitting the content into chunks
chunks = split_into_chunks(vbs_content)

responses = []
for chunk in chunks:
    prompt = f"{question}Analysis of the following content (do not include the content in the answer):\n{chunk}"

    # Generating the response for each chunk
    response = llm(prompt)
    responses.append(response)

# Concatenating the responses to get the complete analysis

separator = "\n### CHUNK END ###\n"

final_response = separator.join(responses)
print(final_response)

I have been searching for a way to either configure the model or process its output so that the initial content (i.e., the malware source code) is excluded from the generated response. However, I haven’t been able to find a solution on my own.
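To make the "process its output" idea concrete, the only workaround I’ve sketched so far is a naive filter like the one below (strip_echoed_source is just a name I made up for illustration); it drops response lines that also appear verbatim in the submitted chunk, but it misses anything the model paraphrases or reformats, so it doesn’t feel like the right approach:

def strip_echoed_source(response, chunk):
    # Drop any response line that appears verbatim in the submitted chunk
    chunk_lines = {line.strip() for line in chunk.splitlines() if line.strip()}
    kept = [line for line in response.splitlines() if line.strip() not in chunk_lines]
    return "\n".join(kept)

# e.g., inside the loop:
#     response = llm(prompt)
#     responses.append(strip_echoed_source(response, chunk))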

Could anyone share advice or suggestions on how to approach this issue? I’d greatly appreciate any guidance, especially since I’m a bit new to this field and might be missing something obvious.

Thank you very much for your help!