I faced the same problem. The corresponding error message I get is:

ValueError: You should supply an encoding or a list of encodings to this method that includes input_ids, but you provided ['attention_mask']

Deep-diving into the library code of transformers and torch, I found the following solution.

The function being executed, which you have commented on, is the following (taken from transformers/trainer_utils.py:686):
def _remove_columns(self, feature: dict) -> dict:
    if not isinstance(feature, dict):
        return feature
    if not self.message_logged and self.logger and self.model_name:
        ignored_columns = list(set(feature.keys()) - set(self.signature_columns))
        if len(ignored_columns) > 0:
            dset_description = "" if self.description is None else f"in the {self.description} set"
            self.logger.info(
                f"The following columns {dset_description} don't have a corresponding argument in "
                f"`{self.model_name}.forward` and have been ignored: {', '.join(ignored_columns)}."
                f" If {', '.join(ignored_columns)} are not expected by `{self.model_name}.forward`, "
                " you can safely ignore this message."
            )
            self.message_logged = True
    return {k: v for k, v in feature.items() if k in self.signature_columns}
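To see what this filtering does in my case, here is a small self-contained sketch. The `CustomModel` class is a hypothetical stand-in for my model; deriving the accepted column names from the forward signature via `inspect.signature` is roughly what the Trainer does internally, and the dict comprehension is the last line of `_remove_columns` above:

```python
import inspect

# Hypothetical stand-in for a custom model with mismatched parameter names.
class CustomModel:
    def forward(self, tokens, attention_mask):
        ...

# Derive the accepted column names from the forward signature,
# roughly the way the Trainer builds self.signature_columns.
signature_columns = list(inspect.signature(CustomModel.forward).parameters.keys())
signature_columns.remove("self")
print(signature_columns)  # ['tokens', 'attention_mask']

# A datapoint as produced by a typical tokenizer.
feature = {"input_ids": [101, 2023, 102], "attention_mask": [1, 1, 1]}

# The filtering from the last line of _remove_columns:
filtered = {k: v for k, v in feature.items() if k in signature_columns}
print(filtered)  # {'attention_mask': [1, 1, 1]} -- 'input_ids' is gone
```

Because `input_ids` is not among the signature columns, it is silently dropped, and only `attention_mask` reaches the tokenizer/model, which is exactly what the ValueError complains about.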
As the log message states, we run into this problem because we are trying to train a model whose forward function does not use the expected parameter names. For me this happened because I am training a custom model with a custom forward function:

def forward(self, tokens, attention_mask): ...

So the _remove_columns function removed every entry in the fetched datapoint that does not correspond to 'tokens' or 'attention_mask' — including 'input_ids' — thereby raising the ValueError mentioned above. Changing the forward function to

def forward(self, input_ids, attention_mask): ...

solved the problem for me.
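Re-running the same signature check with the renamed parameter shows why the fix works: the datapoint now survives the filtering intact. This is a minimal sketch using a plain class instead of a real torch.nn.Module, since only the signature matters here:

```python
import inspect

# Same hypothetical model as before, but with the renamed parameter.
class CustomModel:
    def forward(self, input_ids, attention_mask):
        ...

signature_columns = list(inspect.signature(CustomModel.forward).parameters.keys())
signature_columns.remove("self")

feature = {"input_ids": [101, 2023, 102], "attention_mask": [1, 1, 1]}
filtered = {k: v for k, v in feature.items() if k in signature_columns}
print(filtered)  # both keys survive, so the model actually receives input_ids
```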