Using Alpaca with local embeddings

Hi there!

I am using the Hugging Face model chavinlo/alpaca-native.
However, when I use local embeddings, my output is always only one word long. Can anyone explain this?

# Imports assumed for this snippet (transformers + langchain)
from transformers import LlamaForCausalLM, AutoTokenizer, pipeline
from langchain.llms import HuggingFacePipeline
from langchain.chains import RetrievalQA

model_nm = 'chavinlo/alpaca-native'
save_path = '/content/drive/MyDrive/alpaca_native_pretrained_model_pytorch'

# Load the 8-bit quantized model and its tokenizer from the saved checkpoint
model = LlamaForCausalLM.from_pretrained(save_path, return_dict=True, load_in_8bit=True, device_map='auto')
tokenizer = AutoTokenizer.from_pretrained(save_path)

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_length=248,
    temperature=0.4,
    top_p=0.95,
    repetition_penalty=1.2,
)

local_llm = HuggingFacePipeline(pipeline=pipe)

# retriever is built earlier from the local embeddings (not shown here)
qa = RetrievalQA.from_chain_type(
    llm=local_llm,
    chain_type="stuff",  # "map_reduce",
    retriever=retriever,
    return_source_documents=True,
)

query = "xyz"
llm_response = qa(query)
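
Since return_source_documents=True is set, the call returns a dict rather than a plain string; a small sketch of reading it, with the keys as exposed by LangChain's RetrievalQA:

# Sketch: inspect the chain output; RetrievalQA returns a dict here.
print(llm_response["result"])                    # the generated answer text
for doc in llm_response["source_documents"]:
    print(doc.metadata, doc.page_content[:100])  # which chunks were retrieved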

Can anyone help me with this, or suggest alternative ways to embed PDFs with an LLM, everything running locally on Colab?

Thanks!
Yves

The max_length you’ve specified is 248. With the Hugging Face text-generation pipeline, max_length counts the prompt tokens as well as the generated ones, and the "stuff" chain packs the retrieved context plus your question into that prompt. If the prompt alone comes close to 248 tokens, there is almost no budget left for the answer, which is why the response comes back so short. Try increasing max_length to start with.
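
As a minimal sketch of one way to do that, assuming the same model and tokenizer objects as above: switching from max_length to max_new_tokens reserves a fixed budget for the generated text, so a long retrieved context no longer eats into the answer.

# Sketch: budget the answer separately from the prompt.
# max_new_tokens limits only the generated tokens, not the prompt.
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=256,      # room reserved for the answer itself
    temperature=0.4,
    top_p=0.95,
    repetition_penalty=1.2,
)
local_llm = HuggingFacePipeline(pipeline=pipe)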