GPT2.generate() with custom inputs_embeds argument returning tensor (1*max_length) instead of (batch_size*max_length)

Hi everybody!

I am trying to get my GPT-2 model to generate sequences from custom input embeddings. The model does output sequences, but when I pass a tensor of e.g. shape (3, 314, 1280), it only returns a LongTensor of shape (1, max_length) instead of (3, max_length).

CODE for reproducing:

import torch
from transformers import AutoModelForCausalLM

MOD = AutoModelForCausalLM.from_pretrained(filepath_GPT2)

Out = MOD.generate(
    inputs_embeds=torch.rand(3, 314, 1280),
    max_length=40,
    temperature=1.0,
    repetition_penalty=1.2,
    top_k=950,
    num_return_sequences=1,
    do_sample=True,
    top_p=1.0,
)

print(Out.shape)
# torch.Size([1, 40])

According to the model.generate documentation, the output should be:

        :obj:`torch.LongTensor` of shape :obj:`(batch_size * num_return_sequences, sequence_length)`:
        The generated sequences. The second dimension (sequence_length) is either equal to :obj:`max_length` or
        shorter if all batches finished early due to the :obj:`eos_token_id`.

However, that is not what I get.
I already checked the source code for the inputs_embeds handling… as far as I can tell there should be no issue there.
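For comparison, here is a minimal self-contained check of the behaviour I would expect, using a tiny randomly initialized GPT-2 (the small config values are just placeholders, no pretrained weights involved) so it runs without downloading anything. With a batch of 3 embedded prompts, I would expect the batch dimension of the generated tensor to be 3:

```python
import torch
from transformers import GPT2Config, GPT2LMHeadModel

# Tiny randomly initialized GPT-2; these config values are arbitrary,
# chosen only so the example runs quickly without pretrained weights.
config = GPT2Config(n_layer=2, n_head=2, n_embd=64, vocab_size=128)
model = GPT2LMHeadModel(config)
model.eval()

# Batch of 3 embedded prompts; last dim must match n_embd=64
embeds = torch.rand(3, 10, 64)

out = model.generate(inputs_embeds=embeds, max_new_tokens=5, do_sample=True)

# I would expect the first dimension to be 3 (the batch size),
# per the generate docstring quoted above.
print(out.shape)
```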

Thanks for the help in advance!