How to use trust_remote_code=True with load_checkpoint_and_dispatch?

Hi @SDryluth,

I am able to load the model this way:

import torch
from transformers import AutoModelForCausalLM, AutoConfig
from accelerate import init_empty_weights, load_checkpoint_and_dispatch
pretrained_model_dir = 'mosaicml/mpt-7b'
pretrained_model_cache_dir = "/home/user/.cache/huggingface/hub/models--mosaicml--mpt-7b/snapshots/d8304854d4877849c3c0a78f3469512a84419e84/"

# Build the config, then instantiate the model on the meta device,
# which allocates no real memory for the weights yet.
config = AutoConfig.from_pretrained(pretrained_model_dir, trust_remote_code=True, torch_dtype=torch.float16)
with init_empty_weights():
    model = AutoModelForCausalLM.from_config(config, trust_remote_code=True, torch_dtype=torch.float16)

# Cap GPU 0 at 10GiB and let the remaining layers spill into CPU RAM.
max_memory = {0: "10GiB", "cpu": "80GiB"}
model = load_checkpoint_and_dispatch(
    model, pretrained_model_cache_dir, device_map="auto", max_memory=max_memory, dtype=torch.float16
)
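As a sanity check, accelerate attaches an hf_device_map attribute to the dispatched model (set by dispatch_model, which load_checkpoint_and_dispatch calls internally), so you can see which modules landed on the GPU and which were offloaded. A minimal sketch, using a stand-in dict shaped like a real MPT device map instead of the actual model:

```python
# Stand-in for model.hf_device_map; the real one maps module names to
# the device each was dispatched to (a CUDA index or "cpu").
hf_device_map = {
    "transformer.wte": 0,            # embeddings kept on GPU 0
    "transformer.blocks.0": 0,       # early blocks fit within the 10GiB cap
    "transformer.blocks.31": "cpu",  # later blocks offloaded to RAM
    "lm_head": "cpu",
}

for module_name, device in hf_device_map.items():
    print(f"{module_name}: {device}")
```

On the real model you would iterate model.hf_device_map the same way.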

I only have one GPU with 12GB of VRAM, so I load the rest of the model onto the CPU, but perhaps you can modify your max_memory dict to include a second GPU and see if that works.
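For example, with a second 12GB card you could cap both GPUs and keep CPU RAM as a fallback (the sizes here are placeholders for my hardware; adjust them to yours):

```python
# Hypothetical two-GPU layout: stay below each card's 12GiB limit so
# there is headroom for activations and CUDA overhead; the "cpu" entry
# absorbs whatever does not fit on the GPUs.
max_memory = {0: "10GiB", 1: "10GiB", "cpu": "80GiB"}
```

Integer keys are CUDA device indices; passing this dict to load_checkpoint_and_dispatch as above lets device_map="auto" fill GPU 0 first, then GPU 1, then the CPU.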