I thought I'd give Hugging Face a try today, since it seems to be the easiest way to run Llama 2. So I followed this example:
> [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pinecone-io/examples/blob/master/learn/generation/llm-field-guide/llama-2/llama-2-70b-chat-agent.ipynb) [![Open nbviewer](https://raw.githubusercontent.com/pinecone-io/examples/master/assets/nbviewer-shield.svg)](https://nbviewer.org/github/pinecone-io/examples/blob/master/learn/generation/llm-field-guide/llama-2/llama-2-70b-chat-agent.ipynb)
>
> # LLaMa 70B Chatbot in Hugging Face and LangChain
>
> In this notebook we'll explore how we can use the open source **Llama-70b-chat** model in both Hugging Face transformers and LangChain.
>
> At the time of writing, you must first request access to Llama 2 models via [this form]() (access is typically granted within a few hours).
>
> 🚨 _Note that running this on CPU is practically impossible. It would take a very long time. If running on Google Colab, go to **Runtime > Change runtime type > Hardware accelerator > GPU > GPU type > A100**. Using this notebook requires ~38GB of GPU RAM._
Then I pasted in my access token. However, I always get an HTTP 403 error. What could be wrong?
After some more googling I figured out the problem: after being authorized by Meta, you also need to request access from Hugging Face, which you can do by pressing the button here:
meta-llama/Llama-2-7b-chat-hf · Hugging Face
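Once both approvals are through, loading the gated model looks roughly like this. This is a minimal sketch, assuming `transformers` >= 4.32 (where the auth keyword is `token=`; older versions use `use_auth_token=`); the `diagnose_http_status` helper is just my own summary of what the two auth errors mean, not part of any library:

```python
MODEL_ID = "meta-llama/Llama-2-7b-chat-hf"  # the gated repo linked above


def diagnose_http_status(status: int) -> str:
    """Summarize what the HTTP errors during a gated-model download usually mean."""
    if status == 401:
        return "token missing or invalid - log in again with `huggingface-cli login`"
    if status == 403:
        return "token OK, but Hugging Face has not granted you access to this gated repo yet"
    return "probably unrelated to authentication"


def load_llama(token: str):
    """Download tokenizer and model; requires approval from both Meta and Hugging Face."""
    # Imported here so the diagnostic helper above works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained(MODEL_ID, token=token)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, token=token, device_map="auto")
    return tok, model
```

So a 401 points at the token itself, while the 403 I kept hitting means the token is fine but the second (Hugging Face side) approval is still missing.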
So much drama with the llama!