How to train a TFT5ForConditionalGeneration model?

My code is as follows:

import tensorflow as tf
from transformers import T5Config, TFT5ForConditionalGeneration

batch_size = 8
sequence_length = 25
vocab_size = 100

configT5 = T5Config(
    vocab_size=vocab_size,
    d_ff=512,
)
model = TFT5ForConditionalGeneration(configT5)

model.compile(
    optimizer=tf.keras.optimizers.Adam(),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(),
)

input = tf.random.uniform([batch_size, sequence_length], 0, vocab_size, dtype=tf.int32)
labels = tf.random.uniform([batch_size, sequence_length], 0, vocab_size, dtype=tf.int32)
input = {'inputs': input, 'decoder_input_ids': input}
model.fit(input, labels)

It generates an error:

logits and labels must have the same first dimension, got logits shape [1600,64] and labels shape [200]
    [[node sparse_categorical_crossentropy_3/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits
    (defined at C:\Users\FA.PROJECTOR-MSK\Google Диск\Colab Notebooks\PoetryTransformer\experiments\TFT5.py:30)]]
    [Op:__inference_train_function_25173]
Function call stack: train_function

I don't understand why the model returns a tensor of shape [1600, 64]. According to the documentation, the T5 model should return logits of shape [batch_size, sequence_length, vocab_size].


Pinging @patrickvonplaten

Thanks for posting this! I’ll open an issue about it and work on it asap.
Issue: https://github.com/huggingface/transformers/issues/6876

Using high-level Keras to fine-tune T5 seems tricky due to the limited flexibility and control over the training code. I wrote something on fine-tuning T5 with a customized training loop in TF 2.0: https://github.com/wangcongcong123/ttt. Hope this helps.
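For reference, a bare-bones version of such a custom training loop might look like the sketch below. It reuses the toy data from the original post and assumes a transformers version where TFT5ForConditionalGeneration accepts a labels argument and exposes the cross-entropy loss on its output; the learning rate and step count are arbitrary.

import tensorflow as tf
from transformers import T5Config, TFT5ForConditionalGeneration

batch_size, sequence_length, vocab_size = 8, 25, 100

config = T5Config(vocab_size=vocab_size, d_ff=512)
model = TFT5ForConditionalGeneration(config)
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-4)

# Toy data: in practice these would come from a tokenizer.
input_ids = tf.random.uniform([batch_size, sequence_length], 0, vocab_size, dtype=tf.int32)
labels = tf.random.uniform([batch_size, sequence_length], 0, vocab_size, dtype=tf.int32)

@tf.function
def train_step(input_ids, labels):
    with tf.GradientTape() as tape:
        # Passing labels makes the model compute the seq2seq cross-entropy loss itself
        # (recent versions also derive decoder_input_ids by shifting the labels).
        outputs = model(input_ids, labels=labels, training=True)
        loss = tf.reduce_mean(outputs.loss)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

for step in range(10):
    loss = train_step(input_ids, labels)
    print(f"step {step}: loss = {float(loss):.4f}")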


This should help 🙂 https://github.com/huggingface/transformers/pull/7428


Thanks a lot. Am I right that I have to override train_step() to make TFT5ForConditionalGeneration work?
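In case it helps, here is a minimal sketch of what such a train_step override could look like. It assumes a transformers version where the TF model computes and returns its loss when labels are passed; the subclass name and setup are purely illustrative.

import tensorflow as tf
from transformers import T5Config, TFT5ForConditionalGeneration

class T5WithCustomTrainStep(TFT5ForConditionalGeneration):
    # TFT5ForConditionalGeneration is a tf.keras.Model, so train_step can be overridden.
    def train_step(self, data):
        x, y = data  # x: dict of encoder inputs, y: target token ids
        with tf.GradientTape() as tape:
            # The model computes the cross-entropy loss internally from the labels.
            outputs = self(input_ids=x["input_ids"], labels=y, training=True)
            loss = tf.reduce_mean(outputs.loss)
        grads = tape.gradient(loss, self.trainable_variables)
        self.optimizer.apply_gradients(zip(grads, self.trainable_variables))
        return {"loss": loss}

batch_size, sequence_length, vocab_size = 8, 25, 100
model = T5WithCustomTrainStep(T5Config(vocab_size=vocab_size, d_ff=512))
model.compile(optimizer=tf.keras.optimizers.Adam())  # no Keras loss: train_step handles it

input_ids = tf.random.uniform([batch_size, sequence_length], 0, vocab_size, dtype=tf.int32)
labels = tf.random.uniform([batch_size, sequence_length], 0, vocab_size, dtype=tf.int32)
model.fit({"input_ids": input_ids}, labels, batch_size=batch_size, epochs=1)

The same pattern extends to feeding an attention_mask alongside the input ids or to passing a tf.data.Dataset instead of in-memory tensors.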