Hey everyone,
I am currently working on my master's thesis and have used the Transformers library successfully for most of the experiments I wanted to conduct. The only thing I am stuck on is loading a sharded version of Bloom-7b1 with Accelerate's load_checkpoint_and_dispatch, following the Big model inference guide, which results in the following error:
File "load_bloom_test.py", line 22, in <module>
  model = load_checkpoint_and_dispatch(
File "/opt/conda/lib/python3.8/site-packages/accelerate/big_modeling.py", line 375, in load_checkpoint_and_dispatch
  load_checkpoint_in_model(
File "/opt/conda/lib/python3.8/site-packages/accelerate/utils/modeling.py", line 699, in load_checkpoint_in_model
  set_module_tensor_to_device(model, param_name, param_device, value=param)
File "/opt/conda/lib/python3.8/site-packages/accelerate/utils/modeling.py", line 105, in set_module_tensor_to_device
  new_module = getattr(module, split)
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1260, in __getattr__
  raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'BloomForCausalLM' object has no attribute 'word_embeddings'
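In case it helps, this is roughly how I compared the parameter names stored in the checkpoint shards with the parameter names the instantiated model exposes (just a quick sketch, assuming the standard pytorch_model.bin.index.json layout that save_pretrained produces for sharded checkpoints):

import json
from transformers import AutoConfig, AutoModelForCausalLM
from accelerate import init_empty_weights

model_name = "models/bloom-7b1"

# Parameter names recorded in the sharded checkpoint index
# (assuming the usual pytorch_model.bin.index.json layout)
with open(f"{model_name}/pytorch_model.bin.index.json") as f:
    checkpoint_keys = sorted(json.load(f)["weight_map"])

# Parameter names of the model instantiated from the config on the meta device
config = AutoConfig.from_pretrained(model_name)
with init_empty_weights():
    model = AutoModelForCausalLM.from_config(config)
model_keys = sorted(name for name, _ in model.named_parameters())

print("first checkpoint key:", checkpoint_keys[0])
print("first model key:     ", model_keys[0])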
The code I am using works fine for other models, such as Galactica (for all variants, including galactica-120b) and looks like this:
import torch
from transformers import AutoTokenizer, AutoConfig, AutoModelForCausalLM
from accelerate import init_empty_weights, infer_auto_device_map, load_checkpoint_and_dispatch

# Loading model from config on meta device and using sharded checkpoints
model_name = "models/bloom-7b1"
config = AutoConfig.from_pretrained(model_name)
with init_empty_weights():
    model = AutoModelForCausalLM.from_config(config)

# Calculating device map
device_map = infer_auto_device_map(
    model,
    no_split_module_classes=["BloomBlock"],
    dtype=torch.float16,
)

# Loading the checkpoint according to the auto device_map
model = load_checkpoint_and_dispatch(
    model,
    model_name,
    device_map=device_map,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
inputs = tokenizer("Hello, my name is", return_tensors="pt")
inputs = inputs.to(0)
output = model.generate(inputs["input_ids"])
print(tokenizer.decode(output[0]))
I have re-downloaded the model a couple of times and also tried passing bigscience/bloom-7b1 so that it is downloaded directly from the Hub while the script runs.
I would like to get this approach working, since it has worked fine for every other model, and I have not managed to get a distributed setup running any other way once a model no longer fits on a single GPU.
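For reference, this is roughly the simpler loading path I would switch to if it can be made to work across multiple GPUs (just a sketch; I am assuming device_map="auto" in from_pretrained distributes the sharded checkpoint the same way as the manual infer_auto_device_map route):

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "models/bloom-7b1"

# Let from_pretrained handle sharded loading and device placement in one call
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype=torch.float16,
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

inputs = tokenizer("Hello, my name is", return_tensors="pt").to(0)
output = model.generate(inputs["input_ids"])
print(tokenizer.decode(output[0]))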
Any help, whether with a different loading approach or a solution to this error, would be greatly appreciated.