Using Alpaca with local embeddings

Hi there!

I am using the Hugging Face model chavinlo/alpaca-native.
However, when I use local embeddings, my output is always only one word long. Can anyone explain this?

# Imports assumed for this snippet (transformers + langchain)
from transformers import LlamaForCausalLM, AutoTokenizer, pipeline
from langchain.llms import HuggingFacePipeline
from langchain.chains import RetrievalQA

model_nm = 'chavinlo/alpaca-native'
save_path = '/content/drive/MyDrive/alpaca_native_pretrained_model_pytorch'

# Load the 8-bit quantized model and its tokenizer from the saved checkpoint
model = LlamaForCausalLM.from_pretrained(save_path, return_dict=True, load_in_8bit=True, device_map='auto')
tokenizer = AutoTokenizer.from_pretrained(save_path)

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_length=248,
    temperature=0.4,
    top_p=0.95,
    repetition_penalty=1.2,
)

local_llm = HuggingFacePipeline(pipeline=pipe)

# retriever is built earlier from the local embeddings (not shown here)
qa = RetrievalQA.from_chain_type(
    llm=local_llm,
    chain_type="stuff",  # "map_reduce",
    retriever=retriever,
    return_source_documents=True,
)

query = "xyz"
llm_response = qa(query)
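
Since return_source_documents=True is set, the call returns a dict rather than a plain string; a small sketch of reading it, with the keys as exposed by LangChain's RetrievalQA:

# Sketch: inspect the chain output; RetrievalQA returns a dict here.
print(llm_response["result"])                    # the generated answer text
for doc in llm_response["source_documents"]:
    print(doc.metadata, doc.page_content[:100])  # which chunks were retrieved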

Can anyone help me with this, or suggest alternative ways to embed PDFs with an LLM, everything running locally on Colab?

Thanks!
Yves

The max_length you’ve specified is 248. With the Hugging Face text-generation pipeline, max_length counts the prompt tokens as well as the generated ones, and the "stuff" chain packs the retrieved context plus your question into that prompt. If the prompt alone comes close to 248 tokens, there is almost no budget left for the answer, which is why the response comes back so short. Try increasing max_length to start with.
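
As a minimal sketch of one way to do that, assuming the same model and tokenizer objects as above: switching from max_length to max_new_tokens reserves a fixed budget for the generated text, so a long retrieved context no longer eats into the answer.

# Sketch: budget the answer separately from the prompt.
# max_new_tokens limits only the generated tokens, not the prompt.
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=256,      # room reserved for the answer itself
    temperature=0.4,
    top_p=0.95,
    repetition_penalty=1.2,
)
local_llm = HuggingFacePipeline(pipeline=pipe)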