I’m fine-tuning InCoder-1B on a custom dataset whose columns are
[input_ids, attention_mask].
[token_type_ids] was not supported, so I removed it.
For training, I’m running data-parallel on 2 GPUs with gradient checkpointing to save memory, implemented in PyTorch. I’m facing an issue in training, and I’m not sure how it came about or which of these aspects it could be related to.
From your message, it looks like your
batch does not contain any labels. Therefore your
outputs probably don’t have a real loss.
That is correct! The fine-tune with PyTorch tutorial said to postprocess the
tokenized_datasets as follows:
tokenized_datasets = tokenized_datasets.rename_column("label", "labels")
However, the only columns in my data are
['input_ids', 'attention_mask', "token_type_ids"] (after removing
Thus, there is no corresponding "label" column, so I didn’t rename anything.
Instead, I tried renaming
"token_type_ids" to "labels", i.e.
tokenized_datasets = tokenized_datasets.rename_column("token_type_ids", "labels")
but that raised an error as well.
Any advice on how I should go about this?
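One possible direction, offered as a sketch rather than a confirmed fix: token_type_ids are segment markers, not prediction targets, so renaming them to labels would not give the model anything meaningful to learn from. For causal language modeling, the labels are usually a copy of input_ids, assuming the model follows the common Hugging Face convention of shifting the labels internally. The helper name make_labels and the toy token ids below are hypothetical; the value -100 is the default ignore_index of PyTorch's CrossEntropyLoss, used here to mask padding.

```python
# Hypothetical helper (not from the tutorial): build a "labels" column
# for causal-LM fine-tuning by copying input_ids.
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def make_labels(batch):
    """Return a dict with a "labels" column derived from "input_ids".

    Positions where attention_mask is 0 (padding) are replaced with
    IGNORE_INDEX so they would not contribute to the loss.
    """
    labels = []
    for ids, mask in zip(batch["input_ids"], batch["attention_mask"]):
        labels.append(
            [tok if m == 1 else IGNORE_INDEX for tok, m in zip(ids, mask)]
        )
    return {"labels": labels}


# Toy batch with made-up token ids; the second row ends with two padding tokens.
example = {
    "input_ids": [[5, 6, 7, 8], [9, 10, 0, 0]],
    "attention_mask": [[1, 1, 1, 1], [1, 1, 0, 0]],
}
with_labels = make_labels(example)
print(with_labels["labels"])  # [[5, 6, 7, 8], [9, 10, -100, -100]]
```

If tokenized_datasets is a Hugging Face datasets object, something like tokenized_datasets.map(make_labels, batched=True) should add the column, since map merges the returned keys into the existing ones. Alternatively, transformers provides DataCollatorForLanguageModeling, which with mlm=False builds labels from input_ids at batch time; whether that fits this setup depends on the tokenizer having a padding token.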