Hi, at training time I’m using the forward pass and `batch_decode` on the logits to get the decoded output:
```python
outputs = model(
    input_ids=input_ids,
    attention_mask=attention_mask,
    decoder_input_ids=dec_input_ids,
    decoder_attention_mask=dec_attention_mask,
    labels=dec_input_ids,
)
loss, logits = outputs.loss, outputs.logits
decoded_output = tokenizer.batch_decode(
    torch.argmax(logits, dim=2).tolist(), skip_special_tokens=True
)
```
And `decoded_output` seems to match what I trained the model on:

```
bread dough ; side surface
```
However, I’ve noticed that `model.generate` produces nonsense:

```python
generated = model.generate(input_ids)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```

```
table table table table table table table table table table table table table table table table table table
```
Note that it is the same `model` instance and the same `input_ids` in both cases, so this can’t be related to saving/loading issues, and I guess it also rules out encoding/tokenization issues for `input_ids`.
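For reference, a minimal sketch of how such tensors could be produced (the tokenizer class, texts, and options below are illustrative placeholders, not my exact pipeline):

```python
from transformers import T5TokenizerFast

tokenizer = T5TokenizerFast.from_pretrained("t5-small")

# placeholder source/target pair, just to illustrate the tensors involved
src_texts = ["some source sentence"]
tgt_texts = ["bread dough ; side surface"]

enc = tokenizer(src_texts, padding=True, truncation=True, return_tensors="pt")
dec = tokenizer(tgt_texts, padding=True, truncation=True, return_tensors="pt")

input_ids, attention_mask = enc.input_ids, enc.attention_mask
dec_input_ids, dec_attention_mask = dec.input_ids, dec.attention_mask
```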
Background: `model` is of class `T5ForConditionalGeneration` and initialized from `t5-small`.
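In other words, roughly:

```python
from transformers import T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained("t5-small")
```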
What’s the problem here? I’ve used the `EncoderDecoderModel` in the very same way, and there `model.generate` works as expected.
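(For comparison, that other setup is schematically along these lines; the checkpoint names and tokenizer here are hypothetical stand-ins, not my actual configuration:)

```python
from transformers import BertTokenizerFast, EncoderDecoderModel

# hypothetical bert2bert example, only to illustrate the comparison
bert_tok = BertTokenizerFast.from_pretrained("bert-base-uncased")
enc_dec = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-uncased", "bert-base-uncased"
)
# these usually need to be set explicitly before generate() works for bert2bert
enc_dec.config.decoder_start_token_id = bert_tok.cls_token_id
enc_dec.config.pad_token_id = bert_tok.pad_token_id

bert_inputs = bert_tok(["some source sentence"], return_tensors="pt")
generated = enc_dec.generate(bert_inputs.input_ids)
decoded = bert_tok.decode(generated[0], skip_special_tokens=True)
```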