I’m trying to fine-tune a model to do sentiment analysis using Keras/TensorFlow. I followed the exact code in Google Colab. However, instead of star ratings, I want only three sentiment labels: “positive”, “negative”, and “neutral” (1, -1, and 0, respectively). So, during tokenization, I map the star rating to a new “sentiment” field:
def tokenize_function(examples):
    # Derive a sentiment value from each star rating in the batch
    examples['sentiment'] = []
    for x in examples['label']:
        if x > 3:
            examples['sentiment'].append(1)   # positive
        elif x < 3:
            examples['sentiment'].append(-1)  # negative
        else:
            examples['sentiment'].append(0)   # neutral
    return tokenizer(examples["text"], padding="max_length", truncation=True)

tokenized_datasets = dataset.map(tokenize_function, batched=True)
However, I don’t know where or how to tell tokenized_datasets to use the new “sentiment” field as the labels instead of the original “label” field. Maybe the DataCollator is used for that? Either way, I can’t find any documentation on how to do it.
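One idea I had, as a rough sketch only: since the rest of the Colab code presumably reads the “label” column, I could overwrite that column with the sentiment class instead of hoping the new field gets picked up. This makes several assumptions I haven't confirmed: that the ratings run from 1 to 5, that the loss expects class ids 0, 1, 2 rather than -1, 0, 1, and that a function passed to dataset.map with batched=True may return a dict whose keys replace existing columns. The helper names below are my own, and the snippet runs on a plain dict so it doesn't need the dataset.

```python
# Hypothetical helpers; thresholds are the same as in my tokenize_function.
# Assumed class ids: 0 = negative, 1 = neutral, 2 = positive.
SENTIMENT_TO_CLASS_ID = {-1: 0, 0: 1, 1: 2}


def star_to_sentiment(stars):
    """Map a star rating (assumed 1-5) to -1, 0, or 1."""
    if stars > 3:
        return 1
    if stars < 3:
        return -1
    return 0


def add_sentiment_labels(examples):
    """Batched map-style function: takes a dict of columns, returns new columns."""
    sentiments = [star_to_sentiment(x) for x in examples["label"]]
    return {
        "sentiment": sentiments,
        # Overwrite "label" so downstream code that reads it sees class ids.
        "label": [SENTIMENT_TO_CLASS_ID[s] for s in sentiments],
    }


# Stand-in for one batch of the real dataset.
batch = {"text": ["great", "okay", "awful"], "label": [5, 3, 1]}
new_columns = add_sentiment_labels(batch)
print(new_columns)  # {'sentiment': [1, 0, -1], 'label': [2, 1, 0]}
```

If that is the right direction, I would call this inside tokenize_function (or in a separate dataset.map pass) and set the model's number of labels to 3, but I'm not sure this is the intended way.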