Hi @mazerte, sorry for the delay in replying! This is one of those cases where I’d actually recommend trying our new “internal loss” method. For more complex models like Seq2Seq, getting the Keras loss right can be hard - it’s possible, but it requires a lot of knowledge and some hacky code. Instead, just let the model compute the loss for you! To do that, you need to make two changes:
- Move the labels into the input dictionary so that they’re visible to the model on the forward pass, like so (see the sketch after this list for one way `inputs` and `data_collator` might be set up):
```python
tf_train = inputs.to_tf_dataset(
    columns=["attention_mask", "input_ids", "decoder_input_ids", "labels"],
    shuffle=True,
    collate_fn=data_collator,
    batch_size=batch_size,
)
```
- Remove the `loss` argument to `compile()`. Note that right now we don’t support Keras metrics when using the internal loss, but this is an area of very active development, so that will hopefully change soon!
```python
model.compile(optimizer=optimizer)
```
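In case it helps, here’s a rough sketch of how the objects referenced above (`model`, `data_collator`, `optimizer`) might be set up. The checkpoint name is just a placeholder, so swap in whichever Seq2Seq model you’re actually using, and `inputs` is assumed to be your tokenized `Dataset` from earlier:

```python
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForSeq2SeqLM, DataCollatorForSeq2Seq

# Placeholder checkpoint - substitute your own model here
checkpoint = "t5-small"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = TFAutoModelForSeq2SeqLM.from_pretrained(checkpoint)

# Passing `model` lets the collator build decoder_input_ids from the labels,
# and return_tensors="tf" gives you TF batches
data_collator = DataCollatorForSeq2Seq(tokenizer, model=model, return_tensors="tf")

optimizer = tf.keras.optimizers.Adam(learning_rate=5e-5)
```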
If you make these two changes, your model should train successfully. We recommend this method whenever you’re not sure which loss to use.
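Training is then just a plain Keras `fit()` call; the epoch count here is arbitrary:

```python
# No loss or labels passed here: because "labels" is in the input dict,
# Keras minimizes the loss the model computes on its own forward pass
model.fit(tf_train, epochs=3)
```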