My system runs out of GPU memory. I just want to test whether the model works, so I'd like to load it on the CPU instead. How do I do that?
Adding `device = torch.device('cpu')` before loading the model doesn't help.
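For context, this is roughly what I tried (simplified; the model path and quantization config are from my setup and omitted here):

```python
import torch

# Defining a CPU device first, hoping from_pretrained would pick it up:
device = torch.device("cpu")
print(device)  # cpu
# ...but on its own this has no effect on where the weights are placed.
```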
It crashes here:
```python
model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
```