I am trying to load the 13B ALMA model by haoranxu from Hugging Face. The download is probably around 55 GB, so it takes a while, which is fine. But once the download finishes, the code seems to be stuck in an infinite loop: it does not move forward and nothing else gets loaded.
For reference, I am using this code:
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM
from transformers import LlamaTokenizer
#device = "cuda:0" if torch.cuda.is_available() else "cpu"
# Load base model and LoRA weights
model = AutoModelForCausalLM.from_pretrained("haoranxu/ALMA-13B", torch_dtype=torch.float16, device_map="auto")
#model = PeftModel.from_pretrained(model, "haoranxu/ALMA-13B-Pretrain-LoRA")
tokenizer = LlamaTokenizer.from_pretrained("haoranxu/ALMA-13B-Pretrain", padding_side="left")
# Add the source sentence into the prompt template
prompt = "Translate this from Chinese to English:\nChinese: 我爱机器翻译。\nEnglish:"
input_ids = tokenizer(prompt, return_tensors="pt", padding=True, max_length=40, truncation=True).input_ids.cuda()
# Translation
with torch.no_grad():
    generated_ids = model.generate(input_ids=input_ids, num_beams=5, max_new_tokens=20, do_sample=True, temperature=0.6, top_p=0.9)
outputs = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
print(outputs)
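To narrow down which step appears to hang, each stage of the script above could be wrapped in a small timing helper. This is only a sketch: `timed` is a hypothetical helper name of my own, not part of transformers or peft, and it only reports wall-clock time per step.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, log=None):
    """Log when a step starts and how long it took, even if it raises."""
    if log is None:
        # Flush immediately so the "[start]" line is visible while the step runs.
        log = lambda msg: print(msg, flush=True)
    start = time.perf_counter()
    log(f"[start] {label}")
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        log(f"[done]  {label}: {elapsed:.1f}s")


# Tiny demo with a stand-in step; in the real script the body would be
# the from_pretrained(...) call, the tokenizer load, or model.generate(...).
with timed("dummy step"):
    time.sleep(0.05)
```

Wrapping the model load, the tokenizer load, and the `generate` call in separate `with timed(...)` blocks should show a `[start]` line without a matching `[done]` line for whichever step is not returning.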
Any help would be appreciated. Thanks in advance.