Hello!
I am trying to download Llama-2 for text generation on the Google Colab free tier. I tried simply the following:
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "meta-llama/Llama-2-7b-chat-hf"
# token=True uses the Hugging Face token saved via `huggingface-cli login`
tokenizer = AutoTokenizer.from_pretrained(model_name, token=True)
model = AutoModelForCausalLM.from_pretrained(model_name, token=True)
But this gives me a "ran out of RAM" error and the runtime crashes. I noticed that the GPU RAM wasn't being used at all, while the CPU RAM went past the limit, which is what crashes the runtime. I saw some suggested solutions involving checkpointing online – I haven't done this before, so I would have to learn how, but I'm happy to if it's useful. Is there any way to successfully get this model running on Colab?

Additionally, as a more general question: how can I predict how much memory it takes to run a specific model?
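In case it helps to see my current thinking: for the memory question, my rough back-of-envelope estimate is that 7 billion parameters at 2 bytes each (fp16) is about 14 GB just for the weights, which already seems to exceed the roughly 12–13 GB of system RAM the free tier gives (as far as I know), so I assume I need either quantization or to load the weights straight onto the GPU. The sketch below is what I was planning to try next, based on posts I've read about 4-bit quantization with bitsandbytes and device_map="auto" – I haven't gotten it working yet, so the exact config values are my guesses:

from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch

model_name = "meta-llama/Llama-2-7b-chat-hf"

# 4-bit quantization config -- these settings are my guess, not something I've verified
# (my understanding is this needs `pip install bitsandbytes accelerate` first)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_name, token=True)

# device_map="auto" should place the weights on the GPU instead of filling up CPU RAM
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
    token=True,
)

If this is the wrong direction for the free tier, please let me know.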
Any advice is much appreciated. Thank you!