Hi there!
I am using the Hugging Face model chavinlo/alpaca-native.
However, when I use local embeddings with a LangChain RetrievalQA chain, the output is always only one word long. Can anyone explain this?
from transformers import AutoTokenizer, LlamaForCausalLM, pipeline
from langchain.llms import HuggingFacePipeline
from langchain.chains import RetrievalQA

model_nm = 'chavinlo/alpaca-native'
save_path = '/content/drive/MyDrive/alpaca_native_pretrained_model_pytorch'

# Load the 8-bit model and tokenizer from the copy saved on Drive
model = LlamaForCausalLM.from_pretrained(save_path, return_dict=True, load_in_8bit=True, device_map='auto')
tokenizer = AutoTokenizer.from_pretrained(save_path)

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_length=248,
    temperature=0.4,
    top_p=0.95,
    repetition_penalty=1.2,
)
local_llm = HuggingFacePipeline(pipeline=pipe)
qa = RetrievalQA.from_chain_type(
    llm=local_llm,
    chain_type="stuff",  # alternative: "map_reduce"
    retriever=retriever,  # built from my PDF with local embeddings (see the sketch below)
    return_source_documents=True,
)
query = "xyz"
llm_response = qa(query)
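For context, the retriever comes from embedding a PDF locally, along these lines (a minimal sketch, not my exact code; the embedding model sentence-transformers/all-MiniLM-L6-v2, the Chroma store, and the path /content/doc.pdf are illustrative):

from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Chroma

# Load the PDF and split it into overlapping chunks
docs = PyPDFLoader('/content/doc.pdf').load()
chunks = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50).split_documents(docs)

# Embed the chunks with a local sentence-transformers model and build the retriever
embeddings = HuggingFaceEmbeddings(model_name='sentence-transformers/all-MiniLM-L6-v2')
vectordb = Chroma.from_documents(chunks, embeddings)
retriever = vectordb.as_retriever()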
Could it be that max_length=248 also counts the prompt tokens from the stuffed documents, leaving almost no room for generation? Can anyone help me with this, or suggest alternative ways to embed PDFs with an LLM, everything running locally on Colab?
Thanks!
Yves