I’m trying to fine-tune a model for sentiment analysis using Keras/TensorFlow. I followed the exact code in Google Colab. However, instead of star ratings, I want only sentiment labels: “positive”, “negative”, and “neutral” (1, -1, and 0, respectively). So, during tokenization, I map the star rating to a new “sentiment” field:
def tokenize_function(examples):
    examples['sentiment'] = []
    for x in examples['label']:
        if x > 3:
            examples['sentiment'].append(1)
        elif x < 3:
            examples['sentiment'].append(-1)
        else:
            examples['sentiment'].append(0)
    return tokenizer(examples["text"], padding="max_length", truncation=True)

tokenized_datasets = dataset.map(tokenize_function, batched=True)
However, I don’t know where or how to tell tokenized_datasets to use the new “sentiment” field as the labels. Maybe the DataCollator is used for that? Regardless, I can’t find any documentation on how to do it.
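For illustration, here is a minimal sketch of one possible approach, not a confirmed fix: overwrite the existing "label" column instead of adding a separate "sentiment" field, so that whatever already reads the labels picks up the new values. The helper names `star_to_sentiment` and `add_sentiment` are hypothetical, and the thresholds are copied from the function above. It also assumes the loss expects class indices starting at 0 (as Keras' sparse categorical cross-entropy does), so the three classes are encoded as 0, 1, 2 rather than -1, 0, 1.

```python
def star_to_sentiment(star_label):
    """Map a star-rating label to a class index (hypothetical helper).

    0 = negative, 1 = neutral, 2 = positive. The thresholds mirror the
    original tokenize_function; adjust them if the ratings are 0-based.
    """
    if star_label > 3:
        return 2
    if star_label < 3:
        return 0
    return 1


def add_sentiment(examples):
    # With a batched map, the returned dict's keys become columns;
    # returning "label" replaces the original star ratings.
    return {"label": [star_to_sentiment(x) for x in examples["label"]]}


# Stand-in batch with the same shape a batched map function receives.
batch = {"text": ["great", "bad", "ok"], "label": [5, 1, 3]}
print(add_sentiment(batch))  # {'label': [2, 0, 1]}
```

If this fits the setup, it could be applied with `dataset.map(add_sentiment, batched=True)` before or after tokenization. The model would presumably also need to be loaded with `num_labels=3`. Which column ends up as the training target likely depends on how the dataset is converted for Keras (for example, the `label_cols` argument of `to_tf_dataset`), so that is worth checking against the notebook.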