Unable to load ALMA-13B model from HF

I am trying to load the ALMA-13B model by haoranxu from Hugging Face. The download is probably around 55 GB, so it takes a while, which is fine. But once the download finishes, the code seems to be stuck in an infinite loop: it does not move forward and nothing else gets loaded.
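As a rough sanity check on that size, here is a back-of-the-envelope sketch. It assumes about 13 billion parameters and that the weights are stored as float32 (4 bytes each) on disk; I have not verified either, and `estimate_checkpoint_gb` is just a helper name I made up for this.

```python
def estimate_checkpoint_gb(num_params: float, bytes_per_param: int) -> float:
    """Approximate size of the raw weights in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Assumption: roughly 13e9 parameters for a "13B" model.
NUM_PARAMS = 13e9

fp32_gb = estimate_checkpoint_gb(NUM_PARAMS, 4)  # about 52 GB on disk if stored as float32
fp16_gb = estimate_checkpoint_gb(NUM_PARAMS, 2)  # about 26 GB once cast to float16

print(f"float32: ~{fp32_gb:.0f} GB, float16: ~{fp16_gb:.0f} GB")
```

If that is right, the download matches what I see, and the float16 copy alone would need roughly 26 GB of GPU and/or CPU memory, so some of the wait may be weights being converted or offloaded rather than a real loop.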

For reference, I am using this code:

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM
from transformers import LlamaTokenizer
# device = "cuda:0" if torch.cuda.is_available() else "cpu"

# Load base model and LoRA weights

model = AutoModelForCausalLM.from_pretrained("haoranxu/ALMA-13B", torch_dtype=torch.float16, device_map="auto")
# model = PeftModel.from_pretrained(model, "haoranxu/ALMA-13B-Pretrain-LoRA")
tokenizer = LlamaTokenizer.from_pretrained("haoranxu/ALMA-13B-Pretrain", padding_side="left")

# Add the source sentence into the prompt template

prompt = "Translate this from Chinese to English:\nChinese: 我爱机器翻译。\nEnglish:"
input_ids = tokenizer(prompt, return_tensors="pt", padding=True, max_length=40, truncation=True).input_ids.cuda()


with torch.no_grad():
    generated_ids = model.generate(input_ids=input_ids, num_beams=5, max_new_tokens=20, do_sample=True, temperature=0.6, top_p=0.9)
outputs = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
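To tell a real hang apart from a very slow load, one thing I could try is dumping the Python stack traces periodically with the standard-library `faulthandler` module and timing each step. This is only a diagnostic sketch: `watch_for_hang`, `stop_watching`, and `timed` are hypothetical helper names of my own, and the 60-second interval is an arbitrary choice.

```python
import faulthandler
import sys
import time


def watch_for_hang(interval_s: float = 60.0, file=None) -> None:
    """Dump the stack of every thread each `interval_s` seconds until cancelled."""
    # `file` must be a real file with a file descriptor; default to stderr.
    faulthandler.dump_traceback_later(interval_s, repeat=True, file=file or sys.stderr)


def stop_watching() -> None:
    """Cancel the periodic dump started by watch_for_hang (safe to call anytime)."""
    faulthandler.cancel_dump_traceback_later()


def timed(label, fn, *args, **kwargs):
    """Call fn(*args, **kwargs), print how long it took, and return its result."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    print(f"{label} took {time.perf_counter() - start:.1f}s", flush=True)
    return result


# Intended usage around the loading step from my script (not executed here):
#   watch_for_hang(60.0)
#   model = timed("from_pretrained", AutoModelForCausalLM.from_pretrained,
#                 "haoranxu/ALMA-13B", torch_dtype=torch.float16, device_map="auto")
#   stop_watching()
```

If the dumped traces keep changing between intervals, the script is presumably still working through the weights; if they always show the same frame, that frame is where it is stuck.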

Any help would be appreciated. Thanks in advance.