I fine-tuned a BertForSequenceClassification model with the Hugging Face Trainer, and I have a question about the format of the "labels" in my dataset. I read through a lot of posts and threads, but I am still a bit confused.
I have a dataset with 5 labels, so it is a multiclass classification problem: each text has exactly one value from (0, 1, 2, 3, 4). I read everywhere that the labels have to be one-hot encoded, i.e. a tensor of 0s and 1s. However, I fine-tuned a BERT model with the Hugging Face Trainer and left the labels as plain integers from 0 to 4.
Does the Trainer automatically convert these labels, or does it accept plain integers as well? I don't get any errors, I can run predictions without problems, and the results seem reliable too.
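For context, here is a minimal standalone sketch of why integer labels already work for this case (no Trainer needed to see it): for single-label multiclass classification with num_labels > 1 and integer labels, BertForSequenceClassification computes its loss with `nn.CrossEntropyLoss`, which expects class indices, not one-hot vectors. The logits and label values below are made-up example numbers:

```python
import torch
import torch.nn as nn

# Fake logits for a batch of 2 texts and 5 classes, as a classification
# head like BertForSequenceClassification would produce them.
logits = torch.tensor([[2.0, 0.5, 0.1, -1.0, 0.3],
                       [0.1, 0.2, 3.0, 0.0, -0.5]])

# Plain integer class ids in 0..4 -- no one-hot encoding.
labels = torch.tensor([0, 2])

# CrossEntropyLoss takes exactly this format: raw logits plus
# integer class indices. One-hot tensors are not required here.
loss = nn.CrossEntropyLoss()(logits, labels)
print(loss.item())
```

One-hot float labels are what you would use for *multi-label* classification instead, where the model switches to `BCEWithLogitsLoss`.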
I hope you can help me with this, because I didn't find a reliable source anywhere that confirms this behavior.
Thank you very much