I’m finetuning InCoder-1B on a custom dataset whose columns are [input_ids, attention_mask]. [token_type_ids] was not supported, so I removed it.
For training, I’m running data-parallel on 2 GPUs with gradient checkpointing to save memory, implemented in PyTorch. I’m facing an issue during training, and I’m not sure how it came about or which of these aspects it is related to.
However, the only columns in my data are ['input_ids', 'attention_mask', 'token_type_ids'] (after removing ['text']).
There is no corresponding labels column, so I didn’t rename anything.
Instead, I tried renaming "token_type_ids" to "labels", i.e. tokenized_datasets = tokenized_datasets.rename_column("token_type_ids", "labels"), but that raised an error as well.
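
For reference, here is a minimal sketch of what I understand the usual causal-LM setup to be: instead of renaming token_type_ids, the labels column is a copy of input_ids. The helper name add_labels and the toy batch values are made up for illustration; this is an assumption about what the model expects, not something I have confirmed for InCoder.

```python
def add_labels(batch):
    """Add a labels column that mirrors input_ids (assumed causal-LM convention)."""
    # Copy each sequence so labels do not share list objects with input_ids.
    batch["labels"] = [list(ids) for ids in batch["input_ids"]]
    return batch


# Toy batch standing in for the tokenized dataset (values are made up).
example = {
    "input_ids": [[5, 17, 42], [8, 3]],
    "attention_mask": [[1, 1, 1], [1, 1]],
    "token_type_ids": [[0, 0, 0], [0, 0]],
}

# Drop the unsupported column, then derive labels from input_ids.
example.pop("token_type_ids")
example = add_labels(example)
print(sorted(example))
```

With the datasets library, the same function could presumably be applied as tokenized_datasets.map(add_labels, batched=True) after tokenized_datasets.remove_columns(["token_type_ids"]). Alternatively, transformers' DataCollatorForLanguageModeling with mlm=False builds labels from input_ids at batch time, so no labels column is needed in the dataset.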