Hi, during training I'm using the forward pass and batch_decode on the logits to get the decoded output:
outputs = model(
    input_ids=input_ids,
    attention_mask=attention_mask,
    decoder_input_ids=dec_input_ids,
    decoder_attention_mask=dec_attention_mask,
    labels=dec_input_ids,
)
loss, logits = outputs.loss, outputs.logits
decoded_output = tokenizer.batch_decode(torch.argmax(outputs.logits, dim=2).tolist(), skip_special_tokens=True)
And decoded_output matches what I trained the model on:
bread dough ; side surface
However, I’ve noticed that model.generate produces nonsense:
generated = model.generate(input_ids)
tokenizer.decode(generated[0], skip_special_tokens=True)
table table table table table table table table table table table table table table table table table table
Note that both calls use the same model instance and the same input_ids, so the difference can’t come from saving/loading issues, and I think it also rules out encoding/tokenization issues for input_ids.
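For reference, here is a condensed sketch of the comparison (same model object, same input_ids tensor; variable names as in the snippets above — the forward pass is teacher-forced on dec_input_ids, while generate runs freely):

import torch

model.eval()
with torch.no_grad():
    # Teacher-forced forward pass: the decoder is fed dec_input_ids at every step.
    outputs = model(
        input_ids=input_ids,
        attention_mask=attention_mask,
        decoder_input_ids=dec_input_ids,
        decoder_attention_mask=dec_attention_mask,
        labels=dec_input_ids,
    )
    teacher_forced = tokenizer.batch_decode(torch.argmax(outputs.logits, dim=-1), skip_special_tokens=True)
    # Free-running generation: the decoder only sees its own previous predictions.
    generated = model.generate(input_ids)
    free_running = tokenizer.batch_decode(generated, skip_special_tokens=True)

print(teacher_forced[0])  # bread dough ; side surface
print(free_running[0])    # table table table ...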
Background: the model is a T5ForConditionalGeneration initialized from t5-small.
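In case it matters, the setup is roughly the standard from_pretrained initialization (the fine-tuning loop itself is omitted here):

from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")
# ... fine-tuning on (input_ids, dec_input_ids) batches happens here ...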
What’s the problem here? I’ve used EncoderDecoderModel in exactly the same way, and there model.generate works as expected.