Hi,
I’m training a large GPT-2-based causal language model on multiple GPUs using PyTorch’s FullyShardedDataParallel (FSDP) strategy. I enabled FSDP in the Hugging Face Trainer by passing the following arguments:
"fsdp": "full_shard auto_wrap"
"fsdp_config": {
"fsdp_transformer_layer_cls_to_wrap": ["GPT2Block"]
}
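For context, here is roughly how these options end up in the TrainingArguments (a minimal sketch; "my_output_dir" and the batch size are placeholders, not my real values):

from transformers import TrainingArguments

# Minimal FSDP setup sketch; output_dir and batch size are placeholders.
training_args = TrainingArguments(
    output_dir="my_output_dir",
    per_device_train_batch_size=4,
    fsdp="full_shard auto_wrap",
    fsdp_config={
        "fsdp_transformer_layer_cls_to_wrap": ["GPT2Block"],
    },
)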
With FSDP, the model is sharded across multiple GPUs and trains successfully. Now I want to add an evaluation step to the trainer. I don’t just want to compute perplexity or an accuracy score from the argmax of each logit; I want to do an end-to-end evaluation by calling the model’s generate method and producing outputs autoregressively. I couldn’t figure out a way to call model.generate, or an equivalent method, in the evaluation step. Below is what I have tried.
To do this custom evaluation, I subclassed the Trainer class:
import torch
from transformers import Trainer

class CustomTrainer(Trainer):
    def evaluate(
        self,
        eval_dataset=None,
        ignore_keys=None,
        metric_key_prefix: str = "eval",
    ):
        # Only take one example for illustration.
        input_ids = torch.tensor(
            [self.eval_dataset[0]["input_ids"]]
        ).to(f"cuda:{self.args.local_rank}")
        output = self.model.generate(input_ids)
        return {"my_fancy_metric": 1.0}
If I don’t include the .to(f"cuda:{self.args.local_rank}") part, I get an error message saying:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cpu! (when checking argument for argument index in method wrapper__index_select)
This is understandable, since the input_ids tensor is on the CPU while the model is sharded across different GPUs. But after adding .to(f"cuda:{self.args.local_rank}"), I got:
RuntimeError: The tensor has a non-zero number of elements, but its data is not allocated yet. Caffe2 uses a lazy allocation, so you will need to call mutable_data() or raw_mutable_data() to actually allocate memory.
I also tried calling pipeline("text-generation") with a text input, but got the same behavior.
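The pipeline attempt looked roughly like this (a sketch, assuming the trainer was given a tokenizer; the prompt text is a placeholder):

from transformers import pipeline

# Sketch of the pipeline attempt inside evaluate(); the prompt is a placeholder.
generator = pipeline(
    "text-generation",
    model=self.model,
    tokenizer=self.tokenizer,
)
output = generator("Some prompt text")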
So how can I properly call the model.generate method in evaluation steps with the Trainer and FSDP?