Using Alpaca with local embeddings

Hi there!

I am using the Hugging Face model chavinlo/alpaca-native.
However, when I use local embeddings, my output is always only one word long. Can anyone explain this?
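For context, the retriever is built from local embeddings over my PDF, roughly like this (a minimal sketch; the loader, chunk sizes, and embedding model below are placeholders, not necessarily the exact ones I use):

from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS

# Load the PDF (one Document per page) and split it into chunks
# (path and chunk sizes are placeholders)
docs = PyPDFLoader('/content/drive/MyDrive/example.pdf').load()
chunks = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50).split_documents(docs)

# Embed the chunks with a local sentence-transformers model and index them
embeddings = HuggingFaceEmbeddings(model_name='sentence-transformers/all-MiniLM-L6-v2')
vectorstore = FAISS.from_documents(chunks, embeddings)
retriever = vectorstore.as_retriever()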

The model and chain are then set up like this:

from transformers import LlamaForCausalLM, AutoTokenizer, pipeline
from langchain.llms import HuggingFacePipeline
from langchain.chains import RetrievalQA

model_nm = 'chavinlo/alpaca-native'
save_path = '/content/drive/MyDrive/alpaca_native_pretrained_model_pytorch'

# Load the saved model in 8-bit to fit into Colab GPU memory
model = LlamaForCausalLM.from_pretrained(save_path, return_dict=True, load_in_8bit=True, device_map='auto')
tokenizer = AutoTokenizer.from_pretrained(save_path)

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_length=248,  # max_length counts prompt tokens plus generated tokens
    temperature=0.4,
    top_p=0.95,
    repetition_penalty=1.2,
)

local_llm = HuggingFacePipeline(pipeline=pipe)

qa = RetrievalQA.from_chain_type(
    llm=local_llm,
    chain_type="stuff",  # "map_reduce",
    retriever=retriever,
    return_source_documents=True,
)

query = "xyz"
llm_response = qa(query)
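
I then read the answer out of the returned dict (with return_source_documents=True the chain returns the result plus the retrieved chunks, if I read the docs right):

print(llm_response['result'])
for doc in llm_response['source_documents']:
    print(doc.metadata)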

Can anyone help me with this, or suggest alternative ways to embed PDFs and query them with an LLM, running everything locally on Colab?

Thanks!
Yves