My system runs out of GPU memory. I just want to test whether the model works, so I'd like to load it on the CPU instead. How do I do that?
Adding `device = torch.device('cpu')` before loading the model doesn't help.
It crashes here:
```python
model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
```