How to fine-tune TFMT5ForConditionalGeneration for text classification?

Hi, I have a problem fine-tuning TFMT5ForConditionalGeneration for text classification with TensorFlow 2.6.0 and Transformers 4.11.2.

My task is to classify text sentences into one of five severity levels ('1', '2', '3', '4', '5').


import pandas as pd
import tensorflow as tf
from sklearn.model_selection import train_test_split
from transformers import MT5Tokenizer, TFMT5ForConditionalGeneration

df = pd.read_csv(FILE, header=0, dtype=str, sep='\t', encoding='utf-8')
X_train, X_eval, y_train, y_eval = train_test_split(
    list(df.RPT_CNTS), list(df.RECV_EMG_CD), test_size=TEST_SPLIT)

# Tokenize the sentences and the label strings.
tokenizer = MT5Tokenizer.from_pretrained("google/mt5-small")
train_inputs = tokenizer(X_train, padding='max_length', truncation=True,
                         max_length=100, return_tensors="tf")
train_labels = tokenizer(y_train, padding='max_length', truncation=True, max_length=2)

# Replace token id 1 with -100 so those positions are ignored by the loss.
labels = train_labels.input_ids
labels = [
    [(label if label != 1 else -100) for label in labels_example]
    for labels_example in labels
]
train_inputs['labels'] = tf.convert_to_tensor(labels, dtype=tf.int32)

train_dataset = tf.data.Dataset.from_tensor_slices((
    dict(train_inputs),
    tf.convert_to_tensor(labels, dtype=tf.int32)
)).shuffle(10000).batch(128)

class TFT5Classifier(tf.keras.Model):

    def __init__(self, model_name):
        super().__init__()
        self.t5 = TFMT5ForConditionalGeneration.from_pretrained(model_name)

    def call(self, inputs, attention_mask=None, labels=None, training=False):
        # `inputs` is the feature dict from the dataset (input_ids,
        # attention_mask, labels). Only the logits are returned, so Keras
        # applies the compiled loss to them.
        outputs = self.t5(inputs, attention_mask=attention_mask, labels=labels)
        return outputs.logits

model = TFT5Classifier('google/mt5-small')

optimizer = tf.keras.optimizers.Adam(2e-5)
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
metric = tf.keras.metrics.SparseCategoricalAccuracy('accuracy')
model.compile(optimizer=optimizer, loss=loss, metrics=[metric])
# The dataset is already batched, so batch_size is not passed to fit().
history = model.fit(train_dataset, epochs=1)

However, training does not work; the loss is nan and accuracy stays at zero:


231/231 [==============================] - 131s 568ms/step - loss: nan - accuracy: 0.0000e+00
{'loss': [nan], 'accuracy': [0.0]}

Would you please help me out? Thank you!!


I have a follow-up question about your loss setup. I noticed that you use SparseCategoricalCrossentropy instead of CategoricalCrossentropy. Why is that?
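
For reference, my understanding is that the two differ only in label encoding: the sparse variant takes integer class indices, while CategoricalCrossentropy expects one-hot vectors. A minimal check (the same class gives the same loss either way):

import tensorflow as tf

logits = tf.constant([[2.0, 1.0, 0.1]])  # one example, three classes

# Sparse variant: labels are integer class indices.
sparse = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
print(sparse(tf.constant([0]), logits).numpy())               # ~0.417

# Dense variant: the same class as a one-hot vector, same loss.
dense = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
print(dense(tf.constant([[1.0, 0.0, 0.0]]), logits).numpy())  # ~0.417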

Also, I have a proposed solution, but I am unsure of its correctness: 1. Use the model's own loss to train the model. 2. Use the generate function to produce outputs, decode the results to a single word, and compute accuracy. Of course, this has the problem that generation could fall outside the label vocabulary, which makes me worried about the correctness of the approach. A sketch of step 1 in the original TF setup is below, followed by my PyTorch training loop.
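
A minimal sketch of step 1, reusing the train_dataset built above. It assumes the ignored label positions are set to -100, which the model's built-in loss masks out automatically (note that the snippet above replaces token id 1, which is mT5's eos token; padding positions have id tokenizer.pad_token_id, i.e. 0, so masking those instead may be what was intended):

import tensorflow as tf
from transformers import TFMT5ForConditionalGeneration

model = TFMT5ForConditionalGeneration.from_pretrained("google/mt5-small")
optimizer = tf.keras.optimizers.Adam(2e-5)

for features, _ in train_dataset:
    with tf.GradientTape() as tape:
        # Passing labels makes the model compute its own loss,
        # with -100 positions excluded.
        outputs = model(
            input_ids=features["input_ids"],
            attention_mask=features["attention_mask"],
            labels=features["labels"],
        )
        loss = tf.reduce_mean(outputs.loss)  # reduce in case the loss is per-token
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))

And here is my current PyTorch training loop: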

for batch_idx, (input_ids, attention_masks, label_input_ids, label_attention_masks) in enumerate(train_loader):
    optimizer.zero_grad()
    # The model returns its own (already masked) loss, so no external
    # criterion such as nn.CrossEntropyLoss is needed here.
    loss, logits = model(input_ids, attention_masks, label_input_ids, label_attention_masks)
    acc, _ = accuracy(input_ids, attention_masks, label_input_ids,
                      mismatch, complete_mismatch, report=False)
    loss.backward()
    optimizer.step()


def accuracy(input_ids, attention_masks, label_input_ids, mismatch, complete_mismatch, report=False):
    # Generate predictions conditioned on the source inputs (not on the
    # label ids, which are only decoded as the targets below).
    generated_ids = model.generator(
        input_ids=input_ids,
        attention_mask=attention_masks,
    )

    preds = [tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=True)
             for g in generated_ids]
    target = [tokenizer.decode(t, skip_special_tokens=True, clean_up_tokenization_spaces=True)
              for t in label_input_ids]
    return accuracy_score(target, preds), [classification_report(target, preds, zero_division=0),
                                           mismatch, complete_mismatch, preds]
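
To guard against the out-of-vocabulary worry, one idea is to snap each decoded prediction onto the allowed label set before scoring. This is just a sketch; LABELS and snap_to_label are made-up names for this thread's five severity labels:

LABELS = ["1", "2", "3", "4", "5"]  # the allowed label strings

def snap_to_label(pred):
    """Map a free-form generation onto the allowed label set."""
    pred = pred.strip()
    if pred in LABELS:
        return pred
    # Fall back to the first allowed label appearing in the text,
    # else an explicit invalid bucket that always counts as wrong.
    for label in LABELS:
        if label in pred:
            return label
    return "<invalid>"

preds = [snap_to_label(p) for p in preds]

An alternative that avoids the problem entirely would be to skip generate and compare only the logits of the label tokens at the first decoder step, which guarantees predictions stay inside the label space.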

I would appreciate any corrections to my approach.