My code is as follows:
import tensorflow as tf
from transformers import T5Config, TFT5ForConditionalGeneration

batch_size = 8
sequence_length = 25
vocab_size = 100

configT5 = T5Config(
    vocab_size=vocab_size,
    d_ff=512,
)
model = TFT5ForConditionalGeneration(configT5)
model.compile(
    optimizer=tf.keras.optimizers.Adam(),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(),
)

input = tf.random.uniform([batch_size, sequence_length], 0, vocab_size, dtype=tf.int32)
labels = tf.random.uniform([batch_size, sequence_length], 0, vocab_size, dtype=tf.int32)
input = {'inputs': input, 'decoder_input_ids': input}
model.fit(input, labels)
It generates an error:
logits and labels must have the same first dimension, got logits shape [1600,64] and labels shape [200]
  [[node sparse_categorical_crossentropy_3/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits (defined at C:\Users\FA.PROJECTOR-MSK\Google Диск\Colab Notebooks\PoetryTransformer\experiments\TFT5.py:30) ]] [Op:__inference_train_function_25173]
Function call stack:
train_function
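I don't understand why the loss receives a logits tensor of shape [1600, 64]. According to the T5 documentation, the model should return logits of shape [batch_size, sequence_len, vocab_size], which here would be [8, 25, 100], or [200, 100] once flattened.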
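
One observation that may help narrow this down: the numbers in the error can be checked against the configuration with plain arithmetic. The sketch below assumes the T5Config defaults num_heads=8 and d_kv=64, neither of which is set in my code, and assumes the loss flattens every dimension except the last. If those assumptions hold, the [1600, 64] tensor has the size of a per-head tensor of shape [batch_size, num_heads, sequence_length, d_kv], not of the logits. That is only a guess based on the sizes; it is not confirmed.

```python
batch_size = 8
sequence_length = 25
vocab_size = 100

# Assumed T5Config defaults (not set explicitly in the code above).
num_heads = 8
d_kv = 64

# Shapes reported in the error message.
observed_rows, observed_cols = 1600, 64
observed_label_rows = 200

# Labels of shape [batch_size, sequence_length], flattened to one dimension.
label_rows = batch_size * sequence_length

# Logits of shape [batch_size, sequence_length, vocab_size], flattened to 2-D.
expected_logit_shape = (batch_size * sequence_length, vocab_size)

# A per-head tensor [batch_size, num_heads, sequence_length, d_kv], flattened to 2-D.
per_head_shape = (batch_size * num_heads * sequence_length, d_kv)

print("labels match:", label_rows == observed_label_rows)
print("expected logits:", expected_logit_shape)
print("per-head tensor:", per_head_shape)
print("per-head matches error:", per_head_shape == (observed_rows, observed_cols))
```

The label count matches the error, but the expected logits shape does not, while the assumed per-head shape does.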