I cannot download the Llama 2 model with transformers on GCP

I am using Vertex AI Workbench and a notebook, RAG_llama2_vertexai.ipynb, to run Llama 2 models with transformers. First I install the libraries:

pip install -U git+https://github.com/huggingface/transformers.git git+https://github.com/huggingface/accelerate.git

Then I authenticate with Hugging Face:

from huggingface_hub import notebook_login

# Log in to Hugging Face to get access to the model
notebook_login()

Then I import the libraries:

import os
from transformers import AutoTokenizer, AutoModelForCausalLM
import transformers
import torch

When I try to load the model:

from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-chat-hf")

I get the following error:

File Save Error for RAG_llama2_vertexai.ipynb

Invalid response: 524

and the kernel crashes…

I tried spinning up a VM with more memory and it crashes at the same point. Any idea what is going on, or what else I could try? Many thanks!
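
One thing I was planning to try next (just a sketch, I am not sure it actually addresses the crash) is loading the weights in half precision and letting accelerate handle placement, in case the problem is memory:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Load the weights in fp16 and let accelerate decide device placement,
# assuming the kernel crash is the notebook running out of RAM.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
    device_map="auto",
)

Would something like this help, or is the 524 error unrelated to memory?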