Greetings fam,
Just curious if anyone can provide insight into a `KeyError: 'input_ids'` I get when I try to train my pretrained `BertForMaskedLM` model (by calling `trainer_BERT.train()`) with the Hugging Face `Trainer` on my `Dataset` object. I'm not sure whether it has to do with how I created the dataset or how I'm calling my model for training, so any insights are appreciated!!
A detailed view of my code and the KeyError is available at the link below.
Thank you,
Mick
```
KeyError                                  Traceback (most recent call last)
in <module>
----> 1 trainer_BERT.train()
      2 trainer.save_model("./models/royalBERT")

~/anaconda3/lib/python3.7/site-packages/transformers/trainer.py in train(self, model_path, trial)
    755 self.control = self.callback_handler.on_epoch_begin(self.args, self.state, self.control)
    756
--> 757 for step, inputs in enumerate(epoch_iterator):
    758
    759 # Skip past any already trained steps if resuming training

~/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py in __next__(self)
    361
    362 def __next__(self):
--> 363 data = self._next_data()
    364 self._num_yielded += 1
    365 if self._dataset_kind == _DatasetKind.Iterable and \

~/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py in _next_data(self)
    401 def _next_data(self):
    402 index = self._next_index() # may raise StopIteration
--> 403 data = self._dataset_fetcher.fetch(index) # may raise StopIteration
    404 if self._pin_memory:
    405 data = _utils.pin_memory.pin_memory(data)

~/anaconda3/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py in fetch(self, possibly_batched_index)
     45 else:
     46 data = self.dataset[possibly_batched_index]
---> 47 return self.collate_fn(data)

~/anaconda3/lib/python3.7/site-packages/transformers/data/data_collator.py in __call__(self, examples)
    193 ) -> Dict[str, torch.Tensor]:
    194 if isinstance(examples[0], (dict, BatchEncoding)):
--> 195 examples = [e["input_ids"] for e in examples]
    196 batch = self._tensorize_batch(examples)
    197 if self.mlm:

~/anaconda3/lib/python3.7/site-packages/transformers/data/data_collator.py in <listcomp>(.0)
    193 ) -> Dict[str, torch.Tensor]:
    194 if isinstance(examples[0], (dict, BatchEncoding)):
--> 195 examples = [e["input_ids"] for e in examples]
    196 batch = self._tensorize_batch(examples)
    197 if self.mlm:

KeyError: 'input_ids'
```
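Reading the last frames: the failing line is `examples = [e["input_ids"] for e in examples]` in the data collator, which runs whenever the items your `Dataset` returns are dicts. So every item coming out of `__getitem__` apparently has to be a dict with an `input_ids` key, and the error suggests at least one item uses some other key (for example raw `text`). Below is a minimal, dependency-free sketch of that expectation, assuming a map-style dataset; `ToyMLMDataset`, `collate_input_ids` and `check_dataset_items` are hypothetical names, the integer lists are placeholder token ids, and `collate_input_ids` mimics only the single failing line, not the real collator.

```python
from typing import Dict, List, Sequence


class ToyMLMDataset:
    """Hypothetical map-style dataset: every item is a dict keyed by "input_ids".

    In a real setup the lists would come from a tokenizer; here they are
    placeholder integer ids.
    """

    def __init__(self, encoded: Sequence[Sequence[int]]):
        self.encoded = [list(ids) for ids in encoded]

    def __len__(self) -> int:
        return len(self.encoded)

    def __getitem__(self, i: int) -> Dict[str, List[int]]:
        # The key name is what matters: the collator indexes e["input_ids"].
        return {"input_ids": self.encoded[i]}


def collate_input_ids(examples):
    """Mimics only the failing line from the traceback (data_collator.py, line 195)."""
    if isinstance(examples[0], dict):
        examples = [e["input_ids"] for e in examples]
    return examples


def check_dataset_items(dataset, n: int = 3) -> None:
    """Hypothetical pre-flight check: flag dict items that lack an "input_ids" key."""
    for i in range(min(n, len(dataset))):
        item = dataset[i]
        if isinstance(item, dict) and "input_ids" not in item:
            raise KeyError(
                f"item {i} has keys {sorted(item)}, but the collator expects 'input_ids'"
            )


if __name__ == "__main__":
    good = ToyMLMDataset([[101, 7592, 102], [101, 2088, 102]])
    check_dataset_items(good)
    print(collate_input_ids([good[0], good[1]]))  # [[101, 7592, 102], [101, 2088, 102]]

    bad = [{"text": "hello"}]  # e.g. a dataset that yields raw text instead of ids
    try:
        collate_input_ids(bad)
    except KeyError as err:
        print("KeyError:", err)  # KeyError: 'input_ids'
```

In the actual training code, a quick equivalent check before `trainer_BERT.train()` would be printing the keys of the first item of the training dataset; if `input_ids` isn't among them, the dataset construction (rather than the model call) is the likely culprit.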