Loading BloomForCausalLM from sharded checkpoints

Hey everyone,

I am currently working on my master's thesis and have used the Transformers library successfully for most of the experiments I wanted to conduct. The only thing I am stuck on is loading a sharded version of Bloom-7b1 with Accelerate's load_checkpoint_and_dispatch, following the Big model inference guide, which results in the following error:

Traceback (most recent call last):
  File "load_bloom_test.py", line 22, in <module>
    model = load_checkpoint_and_dispatch(
  File "/opt/conda/lib/python3.8/site-packages/accelerate/big_modeling.py", line 375, in load_checkpoint_and_dispatch
    load_checkpoint_in_model(
  File "/opt/conda/lib/python3.8/site-packages/accelerate/utils/modeling.py", line 699, in load_checkpoint_in_model
    set_module_tensor_to_device(model, param_name, param_device, value=param)
  File "/opt/conda/lib/python3.8/site-packages/accelerate/utils/modeling.py", line 105, in set_module_tensor_to_device
    new_module = getattr(module, split)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1260, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'BloomForCausalLM' object has no attribute 'word_embeddings'

The code I am using works fine for other models, such as Galactica (all variants, including galactica-120b), and looks like this:

import torch

from transformers import AutoTokenizer, AutoConfig, AutoModelForCausalLM
from accelerate import init_empty_weights, infer_auto_device_map, load_checkpoint_and_dispatch

# Loading model from config on meta device and using sharded checkpoints
model_name = "models/bloom-7b1"
config = AutoConfig.from_pretrained(model_name)
with init_empty_weights():
    model = AutoModelForCausalLM.from_config(config)

# Calculating device map
device_map = infer_auto_device_map(
    model, 
    no_split_module_classes=["BloomBlock"],
    dtype=torch.float16
)

# Loading the checkpoint according to auto device_map
model = load_checkpoint_and_dispatch(
    model,
    model_name,
    device_map=device_map
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
inputs = tokenizer("Hello, my name is", return_tensors="pt")
inputs = inputs.to(0)
output = model.generate(inputs["input_ids"])
print(tokenizer.decode(output[0]))

I have re-downloaded the model a couple of times and have also tried passing bigscience/bloom-7b1 so it is downloaded directly from the Hub while the script runs.

I would like this approach to work, as it has been fine for all other models, and I have not managed to get a distributed setup working any other way once a model no longer fits on a single GPU.

Any help, either with a different loading approach or with a solution to this problem, would be greatly appreciated.

I also get an error, but a slightly different one.

Traceback (most recent call last):
  File "run_inference.py", line 12, in <module>
    model = load_checkpoint_and_dispatch(
  File "/home/anaconda3/envs/research/lib/python3.8/site-packages/accelerate/big_modeling.py", line 427, in load_checkpoint_and_dispatch
    load_checkpoint_in_model(
  File "/home/anaconda3/envs/research/lib/python3.8/site-packages/accelerate/utils/modeling.py", line 748, in load_checkpoint_in_model
    raise ValueError(f"{param_name} doesn't have any device set.")
ValueError: word_embeddings.weight doesn't have any device set.

The same code for gpt-j-6B works.

Reference: Handling big models for inference

@sgugger Could you check it for us?

Also, OPT raises an error.

Traceback (most recent call last):
  File "run_inference.py", line 13, in <module>
    model = load_checkpoint_and_dispatch(
  File "/home/anaconda3/envs/research/lib/python3.8/site-packages/accelerate/big_modeling.py", line 427, in load_checkpoint_and_dispatch
    load_checkpoint_in_model(
  File "/home/anaconda3/envs/research/lib/python3.8/site-packages/accelerate/utils/modeling.py", line 748, in load_checkpoint_in_model
    raise ValueError(f"{param_name} doesn't have any device set.")
ValueError: decoder.embed_tokens.weight doesn't have any device set.

The code I used for testing:

from accelerate import init_empty_weights, load_checkpoint_and_dispatch
from transformers import AutoConfig, AutoModelForCausalLM

# checkpoint = "bigscience/bloom-7b1"
# checkpoint = "EleutherAI/gpt-j-6B"
checkpoint = "facebook/opt-6.7b"
config = AutoConfig.from_pretrained(checkpoint)

with init_empty_weights():
    model = AutoModelForCausalLM.from_config(config)


model = load_checkpoint_and_dispatch(
    model,
    # "bloom-7b1",
    # "sharded-gpt-j-6B",
    "opt-6.7b",
    device_map="sequential",
    # no_split_module_classes=["BloomBlock"],
    # no_split_module_classes=["GPTJBlock"],
    no_split_module_classes=["OPTDecoderLayer"],
)

print(model.hf_device_map)

I tried using different device_map settings, but it didn't work.

You won't be able to use the checkpoint on the Hub directly in Accelerate; you need to go through Transformers, since the checkpoint on the Hub is for the base model, not for the model with the LM head.
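
As an illustration of that mismatch, here is a quick sketch of how to inspect the shard index; the local path and the pytorch_model.bin.index.json file name are assumptions based on the standard sharded-checkpoint layout.

import json

# Sketch: peek at the shard index of the downloaded checkpoint. The path and
# file name below are assumptions and may differ for your download.
with open("models/bloom-7b1/pytorch_model.bin.index.json") as f:
    index = json.load(f)

# The index lists base-model parameter names such as "word_embeddings.weight",
# while BloomForCausalLM built from the config expects
# "transformer.word_embeddings.weight", which is why Accelerate cannot match
# the weights to the model and raises the errors above.
print(list(index["weight_map"])[:5])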

Could I ask what the "Transformer" you mentioned is? Do you mean nn.Transformer in native PyTorch, or is there some other "Transformer" class in Accelerate? I am new to this library.

Transformers is the library you are using to load the model. AutoModelForCausalLM.from_pretrained accepts device_map="sequential".
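
For example, something along these lines should work (torch_dtype=torch.float16 is just an assumption to keep memory usage comparable to the Accelerate setup above):

import torch
from transformers import AutoModelForCausalLM

# Let Transformers map the checkpoint onto the model with the LM head and
# dispatch the weights itself; device_map can be "sequential" or "auto".
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-6.7b",
    device_map="sequential",
    torch_dtype=torch.float16,  # assumption: half precision to fit the GPUs
)

# The resulting placement is stored on the model, just like with Accelerate.
print(model.hf_device_map)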

It works using AutoModelForCausalLM.from_pretrained("facebook/opt-6.7b", device_map="auto").

Thank you :) It really helps.

Also, I think the inferred device map doesn't need no_split_module_classes to be passed explicitly.

It is already handled depending on the architecture. Is that right?

Yes, Transformers fills that info in for you :)
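
If you're curious where that lives, here's a small sketch: it relies on the private _no_split_modules attribute, which is an internal detail and may change between versions.

from accelerate import init_empty_weights
from transformers import AutoConfig, AutoModelForCausalLM

# Sketch: _no_split_modules is the per-architecture "don't split these" list
# that from_pretrained uses when inferring a device map.
config = AutoConfig.from_pretrained("bigscience/bloom-7b1")
with init_empty_weights():
    model = AutoModelForCausalLM.from_config(config)

print(model._no_split_modules)  # e.g. ['BloomBlock']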
