How to fine-tune TFMT5ForConditionalGeneration for text classification?

Hi, I have a problem fine-tuning TFMT5ForConditionalGeneration for text classification with TensorFlow 2.6.0 and Transformers 4.11.2.

My task is to classify text sentences into one of five severity levels ('1', '2', '3', '4', '5').


import pandas as pd
import tensorflow as tf
from sklearn.model_selection import train_test_split
from transformers import MT5Tokenizer, TFMT5ForConditionalGeneration

df = pd.read_csv(FILE, header=0, dtype=str, sep='\t', encoding='utf-8')
X_train, X_eval, y_train, y_eval = train_test_split(
    list(df.RPT_CNTS), list(df.RECV_EMG_CD), test_size=TEST_SPLIT)

# Tokenize the sentences and the label strings.
tokenizer = MT5Tokenizer.from_pretrained("google/mt5-small")
train_inputs = tokenizer(X_train, padding='max_length', truncation=True,
                         max_length=100, return_tensors="tf")
train_labels = tokenizer(y_train, padding='max_length', truncation=True, max_length=2)

# Replace token id 1 with -100 so those positions are ignored by the loss.
labels = train_labels.input_ids
labels = [
    [(label if label != 1 else -100) for label in labels_example]
    for labels_example in labels
]
train_inputs['labels'] = tf.convert_to_tensor(labels, dtype=tf.int32)

train_dataset = tf.data.Dataset.from_tensor_slices((
    dict(train_inputs),
    tf.convert_to_tensor(labels, dtype=tf.int32)
)).shuffle(10000).batch(128)

class TFT5Classifier(tf.keras.Model):

    def __init__(self, model_name):
        super().__init__()
        self.t5 = TFMT5ForConditionalGeneration.from_pretrained(model_name)

    def call(self, inputs, attention_mask=None, labels=None, training=False):
        # `inputs` is the feature dict from the dataset (input_ids,
        # attention_mask, labels). Only the logits are returned, so Keras
        # applies the compiled loss to them.
        outputs = self.t5(inputs, attention_mask=attention_mask, labels=labels)
        return outputs.logits

model = TFT5Classifier('google/mt5-small')

optimizer = tf.keras.optimizers.Adam(2e-5)
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
metric = tf.keras.metrics.SparseCategoricalAccuracy('accuracy')
model.compile(optimizer=optimizer, loss=loss, metrics=[metric])
# The dataset is already batched, so batch_size is not passed to fit().
history = model.fit(train_dataset, epochs=1)

However, training does not work; the loss is nan and accuracy stays at zero:


231/231 [==============================] - 131s 568ms/step - loss: nan - accuracy: 0.0000e+00
{'loss': [nan], 'accuracy': [0.0]}

Would you please help me out? Thank you!!


I have a follow-up question about your loss setup. I noticed that you use SparseCategoricalCrossentropy instead of CategoricalCrossentropy. Why is that?
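
For reference, my understanding is that the two differ only in label encoding: the sparse variant takes integer class indices, while CategoricalCrossentropy expects one-hot vectors. A minimal check (the same class gives the same loss either way):

import tensorflow as tf

logits = tf.constant([[2.0, 1.0, 0.1]])  # one example, three classes

# Sparse variant: labels are integer class indices.
sparse = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
print(sparse(tf.constant([0]), logits).numpy())               # ~0.417

# Dense variant: the same class as a one-hot vector, same loss.
dense = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
print(dense(tf.constant([[1.0, 0.0, 0.0]]), logits).numpy())  # ~0.417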

Also, I have a proposed solution, but I am unsure of its correctness: 1. Use the model's own loss to train the model. 2. Use the generate function to produce outputs, decode the results to a single word, and compute accuracy. Of course, this has the problem that generation could fall outside the label vocabulary, which makes me worried about the correctness of the approach. A sketch of step 1 in the original TF setup is below, followed by my PyTorch training loop.
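
A minimal sketch of step 1, reusing the train_dataset built above. It assumes the ignored label positions are set to -100, which the model's built-in loss masks out automatically (note that the snippet above replaces token id 1, which is mT5's eos token; padding positions have id tokenizer.pad_token_id, i.e. 0, so masking those instead may be what was intended):

import tensorflow as tf
from transformers import TFMT5ForConditionalGeneration

model = TFMT5ForConditionalGeneration.from_pretrained("google/mt5-small")
optimizer = tf.keras.optimizers.Adam(2e-5)

for features, _ in train_dataset:
    with tf.GradientTape() as tape:
        # Passing labels makes the model compute its own loss,
        # with -100 positions excluded.
        outputs = model(
            input_ids=features["input_ids"],
            attention_mask=features["attention_mask"],
            labels=features["labels"],
        )
        loss = tf.reduce_mean(outputs.loss)  # reduce in case the loss is per-token
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))

And here is my current PyTorch training loop: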

for batch_idx, (input_ids, attention_masks, label_input_ids, label_attention_masks) in enumerate(train_loader):
    optimizer.zero_grad()
    # The model returns its own (already masked) loss, so no external
    # criterion such as nn.CrossEntropyLoss is needed here.
    loss, logits = model(input_ids, attention_masks, label_input_ids, label_attention_masks)
    acc, _ = accuracy(input_ids, attention_masks, label_input_ids,
                      mismatch, complete_mismatch, report=False)
    loss.backward()
    optimizer.step()


def accuracy(input_ids, attention_masks, label_input_ids, mismatch, complete_mismatch, report=False):
    # Generate predictions conditioned on the source inputs (not on the
    # label ids, which are only decoded as the targets below).
    generated_ids = model.generator(
        input_ids=input_ids,
        attention_mask=attention_masks,
    )

    preds = [tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=True)
             for g in generated_ids]
    target = [tokenizer.decode(t, skip_special_tokens=True, clean_up_tokenization_spaces=True)
              for t in label_input_ids]
    return accuracy_score(target, preds), [classification_report(target, preds, zero_division=0),
                                           mismatch, complete_mismatch, preds]
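
To guard against the out-of-vocabulary worry, one idea is to snap each decoded prediction onto the allowed label set before scoring. This is just a sketch; LABELS and snap_to_label are made-up names for this thread's five severity labels:

LABELS = ["1", "2", "3", "4", "5"]  # the allowed label strings

def snap_to_label(pred):
    """Map a free-form generation onto the allowed label set."""
    pred = pred.strip()
    if pred in LABELS:
        return pred
    # Fall back to the first allowed label appearing in the text,
    # else an explicit invalid bucket that always counts as wrong.
    for label in LABELS:
        if label in pred:
            return label
    return "<invalid>"

preds = [snap_to_label(p) for p in preds]

An alternative that avoids the problem entirely would be to skip generate and compare only the logits of the label tokens at the first decoder step, which guarantees predictions stay inside the label space.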

I would appreciate any corrections to my approach.