Accelerate memory usage

I am trying to run the ALIA-40b model, which needs accelerate and transformers.
Both try to load the entire model into memory, but my computer does not have as much memory as this model needs (60 GB of RAM).
So I have been looking on the internet for a version of accelerate that offloads to disk, but I can't find one.
I understand it should be possible (at least as far as I know today), so I am trying to change lines like model.to(device) to something like:

from accelerate import disk_offload
offload_folder = "/path/to/your/offload/directory"
model = disk_offload(model, offload_folder=offload_folder)

but fixing one line just moves the error to another.
Has anyone heard of a project that changes accelerate to accommodate less memory?
Thanks


Normally, device_map="auto" is sufficient. Using .to() tries to move the entire model into RAM or VRAM in one piece, so there is not much point in using it together with accelerate.

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
model_id = "BSC-LT/ALIA-40b"
# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Load the model
model = AutoModelForCausalLM.from_pretrained(
  model_id,
  device_map="auto",
  torch_dtype=torch.bfloat16
)
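
If the model still does not fit in RAM and VRAM with device_map="auto", from_pretrained can also spill the remaining weights to disk through the same accelerate machinery. A minimal sketch, assuming a writable ./offload directory (a placeholder path) with enough free disk space:

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "BSC-LT/ALIA-40b"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# device_map="auto" lets accelerate split the model across GPU, CPU and disk;
# whatever does not fit in memory is written to offload_folder.
model = AutoModelForCausalLM.from_pretrained(
  model_id,
  device_map="auto",
  torch_dtype=torch.bfloat16,
  offload_folder="./offload",   # placeholder path with enough free disk space
  offload_state_dict=True       # stage the state dict on disk to lower peak RAM
)

As a side note, accelerate's standalone disk_offload function takes the directory as offload_dir, not offload_folder, which may be what causes the error in the original snippet.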