I am following along with the Sequence Classification on IMDB Dataset example here:
https://huggingface.co/transformers/master/custom_datasets.html?utm_campaign=Hugging%2BFace&utm_medium=web&utm_source=Hugging_Face_1
However, I am using a custom dataset. Rather than iterating through files with read_imdb_split, I load my data from a CSV, take the values of the sequences and labels, and convert them to lists to pass into the tokenizer() method. From there I wrap the encodings and labels in the subclassed Dataset object shown in the documentation.
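Roughly, the loading code looks like this (the CSV path, column names, and model checkpoint are placeholders rather than my exact values; the Dataset subclass is the one from the tutorial):

import pandas as pd
import torch
from transformers import DistilBertTokenizerFast

df = pd.read_csv('train.csv')                     # placeholder path
train_texts = df['text'].values.tolist()          # sequences as a plain Python list
train_labels = df['label'].values.tolist()        # labels as a plain Python list

tokenizer = DistilBertTokenizerFast.from_pretrained('distilbert-base-uncased')
train_encodings = tokenizer(train_texts, truncation=True, padding=True)

# Dataset subclass as shown in the custom_datasets documentation
class IMDbDataset(torch.utils.data.Dataset):
    def __init__(self, encodings, labels):
        self.encodings = encodings
        self.labels = labels

    def __getitem__(self, idx):
        item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
        item['labels'] = torch.tensor(self.labels[idx])
        return item

    def __len__(self):
        return len(self.labels)

train_dataset = IMDbDataset(train_encodings, train_labels)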
Next, I create TrainingArguments and Trainer as shown in the tutorial, and the issue arises when calling trainer.train().
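The training setup is essentially the tutorial's (the argument values below are illustrative, not my exact settings):

from transformers import DistilBertForSequenceClassification, Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir='./results',              # example values only
    num_train_epochs=3,
    per_device_train_batch_size=16,
    logging_dir='./logs',
)

model = DistilBertForSequenceClassification.from_pretrained('distilbert-base-uncased')

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
)

trainer.train()

Calling trainer.train() produces the following stack trace: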
AttributeError                            Traceback (most recent call last)
in <module>
----> 1 trainer.train()

~\anaconda3\lib\site-packages\transformers\trainer.py in train(self, model_path)
    512                         self._past = None
    513
--> 514             for step, inputs in enumerate(epoch_iterator):
    515
    516                 # Skip past any already trained steps if resuming training

~\anaconda3\lib\site-packages\tqdm\notebook.py in __iter__(self, *args, **kwargs)
    215     def __iter__(self, *args, **kwargs):
    216         try:
--> 217             for obj in super(tqdm_notebook, self).__iter__(*args, **kwargs):
    218                 # return super(tqdm...) will not catch exception
    219                 yield obj

~\anaconda3\lib\site-packages\tqdm\std.py in __iter__(self)
   1127
   1128         try:
-> 1129             for obj in iterable:
   1130                 yield obj
   1131                 # Update and possibly print the progressbar.

~\anaconda3\lib\site-packages\torch\utils\data\dataloader.py in __next__(self)
    343
    344     def __next__(self):
--> 345         data = self._next_data()
    346         self._num_yielded += 1
    347         if self._dataset_kind == _DatasetKind.Iterable and \

~\anaconda3\lib\site-packages\torch\utils\data\dataloader.py in _next_data(self)
    383     def _next_data(self):
    384         index = self._next_index()  # may raise StopIteration
--> 385         data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
    386         if self._pin_memory:
    387             data = _utils.pin_memory.pin_memory(data)

~\anaconda3\lib\site-packages\torch\utils\data\_utils\fetch.py in fetch(self, possibly_batched_index)
     42     def fetch(self, possibly_batched_index):
     43         if self.auto_collation:
---> 44             data = [self.dataset[idx] for idx in possibly_batched_index]
     45         else:
     46             data = self.dataset[possibly_batched_index]

~\anaconda3\lib\site-packages\torch\utils\data\_utils\fetch.py in <listcomp>(.0)
     42     def fetch(self, possibly_batched_index):
     43         if self.auto_collation:
---> 44             data = [self.dataset[idx] for idx in possibly_batched_index]
     45         else:
     46             data = self.dataset[possibly_batched_index]

in __getitem__(self, idx)
      5
      6     def __getitem__(self, idx):
----> 7         item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
      8         item['labels'] = torch.tensor(self.labels[idx])
      9         return item

AttributeError: 'list' object has no attribute 'items'
I figure this is a result of loading my data differently and converting it to lists, but I pass the Dataset objects, not lists, to the Trainer, so I am unclear what is causing the error.