Noob asking for code review and advice: LangChain and translation with TowerInstruct and Python

Complete noob in AI, deep learning, machine learning, and everything with “intelligent something” in it.

I would love some advice to start understanding how this works, and to understand my mistakes.

I started to write code for a very simple task:

  • I have a text file in Spanish (but Spanish is not important), and there is not necessarily a relationship between the lines, meaning that for now I do not need to handle context (maybe later!)

  • I read it line by line

  • I build a prompt asking a TowerInstruct model to translate it

  • Then I print the result.

To be honest, the model's behavior seems very strange to me. At first it works (the first few lines), but after a few lines it starts adding text of its own, such as "The translation you entered is as follows: ", “Translation in English” or "Spanish: ". I tried adding a system prompt, without significant success.
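
One thing I have been wondering about, but have not properly tested, is whether passing stop sequences to LlamaCpp would cut off that extra text. A minimal sketch of what I mean (it reuses the LlamaCpp setup from the code below; the stop values are just a guess on my part):

# Untested variant of the LlamaCpp setup from the code below:
# ask llama.cpp to stop generating at a newline or at the ChatML
# end-of-turn token, so the model cannot ramble past the translation.
LLM = LlamaCpp(
	model_path=MODEL,
	temperature=0.5,
	max_tokens=500,
	top_p=1,
	stop=["\n", "<|im_end|>"],  # these stop sequences are my guess, not tested
	callback_manager=CALLBACK_MANAGER,
	verbose=False,
)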

Here is my dumb code. Any comment would be very helpful to me!

import sys
import os
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.prompts import PromptTemplate
from langchain_community.llms import LlamaCpp

# Path to the local TowerInstruct GGUF model
MODEL = "/home/dani/AI-models/towerinstruct-7b-v0.1.Q8_0.gguf"

# ChatML-style prompt template expected by TowerInstruct
TEMPLATE = """
<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
"""

PROMPT = PromptTemplate(
	input_variables=["prompt", "system_message"],
	template=TEMPLATE,
)
SYSTEM_MESSAGE = ""
# Stream tokens to stdout as they are generated
CALLBACK_MANAGER = CallbackManager([StreamingStdOutCallbackHandler()])
# Local llama.cpp wrapper around the GGUF model
LLM = LlamaCpp(
	model_path=MODEL,
	temperature=0.5,
	max_tokens=500,
	top_p=1,
	callback_manager=CALLBACK_MANAGER,
	verbose=False,
)

def prompt_tr(txt, in_lang='Spanish', out_lang='English'):
	"""Build the translation instruction that goes into the user turn."""
	return "Translate the following text from {lang1} into {lang2}.\n{lang1}: {prompt}\n{lang2}:".format(
		lang1=in_lang,
		lang2=out_lang,
		prompt=txt
	)

def translate_sp_en(txt):
	"""Format the full ChatML prompt for one line and send it to the model."""
	text = prompt_tr(txt)
	#print(PROMPT.format(prompt=text, system_message=SYSTEM_MESSAGE))
	output = LLM.invoke(PROMPT.format(prompt=text, system_message=SYSTEM_MESSAGE))
	print(output)

def usage():
	print("Usage: {} @filepath".format(sys.argv[0]))

if __name__ == '__main__':
	if len(sys.argv) < 2:
		usage()
		sys.exit(1)
	if not os.path.isfile(sys.argv[1]):
		print("Wrong path '{}'".format(sys.argv[1]))
		usage()
		sys.exit(2)
	with open(sys.argv[1], 'r', encoding='utf-8') as f:
		for line in f:
			translate_sp_en(line.rstrip())
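
For reference, this is roughly what the formatted prompt (the commented-out print in translate_sp_en) looks like for one line; the Spanish sentence here is just a made-up example:

<|im_start|>system
<|im_end|>
<|im_start|>user
Translate the following text from Spanish into English.
Spanish: Hola, ¿cómo estás?
English:<|im_end|>
<|im_start|>assistant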