Hello,
I am having trouble getting meaningful dialog going with LLMs - is there something I am doing wrong below? Thanks so much for your generosity in helping out. I am trying the simplest code:
from langchain.llms import HuggingFaceHub  # requires HUGGINGFACEHUB_API_TOKEN to be set

my_model = "meta-llama/Llama-2-7b-chat-hf"
# my_model = "google/flan-t5-xl"
llm = HuggingFaceHub(repo_id=my_model, model_kwargs={"temperature": 0.05, "max_length": 1024})
text = "Tell me about the seven Harry Potter novels in detail."
print(llm(text))
It attempts an answer (I did get a pro subscription on Hugging Face, so I can now use Llama models): "The seven Harry Potter novels are a series of seven fantasy novels written by J". That is it; it stops mid-sentence at "J". I tried running on my laptop without a GPU, and also on Google Colab with the 'T4 GPU' runtime.
flan-t5-xl times out. flan-t5-base completes one sentence, “Harry Potter and the Philosopher’s Stone is a series of seven books written by Harry Potter and the Philosopher’s Stone.”
There is all this code with custom PDFs and vectorstores that I am playing with, but I am stymied by these curt answers. Thank you for any help and pointers. I am open to reasonable paid solutions; please share which models and infrastructure choices you have found to be cost-effective for user-friendly conversations.
Thanks again.
tbh I don't know why it's not working for you, but anyway I will share the code that did work for me (I ran it on a SageMaker notebook instance with 16 GB of GPU RAM):
import torch
import transformers
from transformers import AutoTokenizer

model_id = 'meta-llama/Llama-2-7b-chat-hf'

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = transformers.AutoModelForCausalLM.from_pretrained(
    model_id,
    load_in_8bit=True,  # requires the bitsandbytes package
    device_map='auto',
    torch_dtype=torch.float16,
)
model.eval()

hg_pipeline = transformers.pipeline(
    model=model,
    tokenizer=tokenizer,
    return_full_text=True,
    task='text-generation',
    temperature=0.1,
    max_new_tokens=512,
    repetition_penalty=1.1,
    pad_token_id=tokenizer.eos_token_id,
)

prompt = "What's the difference between AWS and GCP?"
output = hg_pipeline(prompt)
print(output[0]["generated_text"])
Thank you, Khalil. What exactly did you use for instance_type?
With your code, I am running into a new issue -
HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/tokenizer_config.json
This is strange, as I could access the same model before with my code. Looks like others have also encountered this error before, but I haven’t found a way to resolve it yet. Will keep trying to get your code to work for me to see what I get, thanks.
Hello Sonali, I was using ml.g4dn.xlarge as an instance type.
Regarding the "Unauthorized for url" error, I'm not sure tbh, but did you log in to your Hugging Face account? To get the Llama 2 weights you have to submit a form here: Llama 2 - Meta AI. Then log in to Hugging Face using the same email you used in Meta's form, and they will send you an email saying they approved your request. After that, log in to your Hugging Face account from your terminal using this command: huggingface-cli login
It will ask you for an access token, which you can get from your account settings.
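To spell out that login step as commands (a sketch; `HF_TOKEN` here is a placeholder for your own access token from Settings > Access Tokens, not something from the thread):

```shell
# Interactive login: prompts you to paste the access token
huggingface-cli login

# Or non-interactively (e.g. in a notebook or CI), assuming the token
# is exported as HF_TOKEN and your huggingface_hub version supports --token
huggingface-cli login --token "$HF_TOKEN"
```

Once logged in, `from_pretrained` calls for gated models like meta-llama/Llama-2-7b-chat-hf should stop returning 401s, provided Meta has approved your request for that same account.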
hope that helps!