How to call model.generate() in evaluation steps with Trainer + FSDP?


I’m training a large GPT-2-based causal language model on multiple GPUs using PyTorch’s FullyShardedDataParallel (FSDP) strategy. I enabled FSDP in the HuggingFace Trainer by passing the following arguments:

    "fsdp": "full_shard auto_wrap",
    "fsdp_config": {
        "fsdp_transformer_layer_cls_to_wrap": ["GPT2Block"]
    }

With FSDP, the model is sharded across multiple GPUs, and training works. Now I want to add an evaluation step to the trainer. I don’t just want to compute perplexity, or an accuracy score from the argmax of each logit; I want an end-to-end evaluation that calls the model’s generate method and produces outputs autoregressively. I couldn’t figure out a way to call model.generate, or an equivalent method, in the evaluation step. Below is what I have tried.
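As a conceptual aside (not one of the attempts below), the difference between the two evaluation styles can be sketched in plain Python. Teacher-forced argmax accuracy conditions every prediction on the gold prefix, while generate-style greedy decoding conditions each step on the model’s own previous outputs, so one early mistake changes everything after it. The bigram table, `toy_model`, and the helper functions are hypothetical stand-ins for a causal LM’s next-token logits, not Transformers APIs.

```python
from typing import Callable, Dict, List

# Hypothetical toy "logits": a score per token id. Token 0 plays the role of EOS.
Logits = Dict[int, float]
ToyModel = Callable[[List[int]], Logits]

# Hypothetical bigram table standing in for a causal LM (next token depends only on the last one).
BIGRAM: Dict[int, Logits] = {
    1: {2: 0.4, 3: 0.6},
    2: {3: 0.9, 4: 0.1},
    3: {4: 0.8, 0: 0.2},
    4: {0: 1.0},
}


def toy_model(prefix: List[int]) -> Logits:
    """Return next-token scores given the prefix seen so far."""
    return BIGRAM[prefix[-1]]


def argmax(scores: Logits) -> int:
    """Highest-scoring token id; ties resolve to the smallest id."""
    return max(sorted(scores), key=lambda tok: scores[tok])


def teacher_forced_accuracy(model: ToyModel, target: List[int], prompt_len: int) -> float:
    """Per-token argmax accuracy where every step is conditioned on the *gold* prefix."""
    hits = 0
    for i in range(prompt_len, len(target)):
        hits += int(argmax(model(target[:i])) == target[i])
    return hits / (len(target) - prompt_len)


def greedy_generate(model: ToyModel, prompt: List[int], max_new_tokens: int, eos_id: int) -> List[int]:
    """Autoregressive greedy decoding where every step is conditioned on the model's *own* output."""
    out = list(prompt)
    for _ in range(max_new_tokens):
        next_tok = argmax(model(out))
        out.append(next_tok)
        if next_tok == eos_id:
            break
    return out
```

Here `teacher_forced_accuracy(toy_model, [1, 2, 3, 4, 0], 1)` gives 0.75, yet `greedy_generate(toy_model, [1], 10, 0)` returns `[1, 3, 4, 0]`, which diverges from the target right after the prompt. Only the second kind of check exercises the model the way `generate` does.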

To do this custom evaluation, I subclassed the Trainer class:

    import torch
    from transformers import Trainer


    class CustomTrainer(Trainer):
        def evaluate(
            self,
            eval_dataset=None,
            ignore_keys=None,
            metric_key_prefix: str = "eval",
        ):
            # only take one example for illustration; move it to this process's GPU
            input_ids = torch.tensor([self.eval_dataset[0]["input_ids"]]).to(f"cuda:{self.args.local_rank}")
            output = self.model.generate(input_ids)
            return {"my_fancy_metric": 1.0}

Without the .to(f"cuda:{self.args.local_rank}") call, I get the following error:

    RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cpu! (when checking argument for argument index in method wrapper__index_select)

This is understandable, since the input_ids tensor is on the CPU while the model is sharded across different GPUs. But after adding .to(f"cuda:{self.args.local_rank}"), I got:

    RuntimeError: The tensor has a non-zero number of elements, but its data is not allocated yet. Caffe2 uses a lazy allocation, so you will need to call mutable_data() or raw_mutable_data() to actually allocate memory.
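For context on the word “shards” above, here is a simplified, framework-free sketch of how FSDP’s full_shard strategy is usually described: each rank keeps only a padded 1/world_size slice of a flattened parameter, and the full tensor is rebuilt by an all-gather only when a wrapped module needs it. The functions `shard` and `all_gather` are hypothetical plain-Python stand-ins; they do not use torch.distributed and ignore details such as mixed precision and the freeing of gathered storage.

```python
from typing import List


def shard(flat: List[float], world_size: int, rank: int) -> List[float]:
    """This rank's slice of a flattened parameter, zero-padded so every slice has equal length."""
    chunk = -(-len(flat) // world_size)  # ceiling division
    padded = flat + [0.0] * (chunk * world_size - len(flat))
    return padded[rank * chunk:(rank + 1) * chunk]


def all_gather(shards: List[List[float]], numel: int) -> List[float]:
    """Concatenate the slices from all ranks (in rank order) and strip the padding."""
    full = [x for s in shards for x in s]
    return full[:numel]


# Example: a 10-element parameter split across 4 ranks.
flat = [float(i) for i in range(10)]
shards = [shard(flat, world_size=4, rank=r) for r in range(4)]
# Each rank holds 3 values; rank 3 holds one real value plus two padding zeros.
restored = all_gather(shards, numel=len(flat))
```

In this simplified picture, a rank that reads a parameter outside the all-gather window has only its local slice behind it. That would be consistent with the second error message, but the connection is an assumption, not something confirmed here.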

I also tried calling the pipeline("text-generation") with text input but got the same behavior.

So how can I properly call the model.generate method in evaluation steps with Trainer and FSDP?