How to do inference with fined-tuned huggingface models?

I have fine-tuned a Huggingface model using the IMDB dataset, and I was able to use the trainer to make predictions on the test set by doing trainer.predict(test_ds_encoded). However, when doing the same thing with the inference set that has a dummy label feature (all -1s instead of 0s and 1s), the trainer threw an error:

/usr/local/src/pytorch/aten/src/ATen/native/cuda/Loss.cu:257: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [0,0,0] Assertion `t >= 0 && t < n_classes` failed.
/usr/local/src/pytorch/aten/src/ATen/native/cuda/Loss.cu:257: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [1,0,0] Assertion `t >= 0 && t < n_classes` failed.
/usr/local/src/pytorch/aten/src/ATen/native/cuda/Loss.cu:257: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [2,0,0] Assertion `t >= 0 && t < n_classes` failed.
/usr/local/src/pytorch/aten/src/ATen/native/cuda/Loss.cu:257: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [3,0,0] Assertion `t >= 0 && t < n_classes` failed.
/usr/local/src/pytorch/aten/src/ATen/native/cuda/Loss.cu:257: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [4,0,0] Assertion `t >= 0 && t < n_classes` failed.
/usr/local/src/pytorch/aten/src/ATen/native/cuda/Loss.cu:257: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [5,0,0] Assertion `t >= 0 && t < n_classes` failed.
/usr/local/src/pytorch/aten/src/ATen/native/cuda/Loss.cu:257: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [6,0,0] Assertion `t >= 0 && t < n_classes` failed.
/usr/local/src/pytorch/aten/src/ATen/native/cuda/Loss.cu:257: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [7,0,0] Assertion `t >= 0 && t < n_classes` failed.
/usr/local/src/pytorch/aten/src/ATen/native/cuda/Loss.cu:257: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [8,0,0] Assertion `t >= 0 && t < n_classes` failed.
/usr/local/src/pytorch/aten/src/ATen/native/cuda/Loss.cu:257: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [9,0,0] Assertion `t >= 0 && t < n_classes` failed.
/usr/local/src/pytorch/aten/src/ATen/native/cuda/Loss.cu:257: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [10,0,0] Assertion `t >= 0 && t < n_classes` failed.
/usr/local/src/pytorch/aten/src/ATen/native/cuda/Loss.cu:257: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [11,0,0] Assertion `t >= 0 && t < n_classes` failed.
/usr/local/src/pytorch/aten/src/ATen/native/cuda/Loss.cu:257: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [12,0,0] Assertion `t >= 0 && t < n_classes` failed.
/usr/local/src/pytorch/aten/src/ATen/native/cuda/Loss.cu:257: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [13,0,0] Assertion `t >= 0 && t < n_classes` failed.
/usr/local/src/pytorch/aten/src/ATen/native/cuda/Loss.cu:257: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [14,0,0] Assertion `t >= 0 && t < n_classes` failed.
/usr/local/src/pytorch/aten/src/ATen/native/cuda/Loss.cu:257: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [15,0,0] Assertion `t >= 0 && t < n_classes` failed.
/usr/local/src/pytorch/aten/src/ATen/native/cuda/Loss.cu:257: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [16,0,0] Assertion `t >= 0 && t < n_classes` failed.
/usr/local/src/pytorch/aten/src/ATen/native/cuda/Loss.cu:257: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [17,0,0] Assertion `t >= 0 && t < n_classes` failed.
/usr/local/src/pytorch/aten/src/ATen/native/cuda/Loss.cu:257: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [18,0,0] Assertion `t >= 0 && t < n_classes` failed.
/usr/local/src/pytorch/aten/src/ATen/native/cuda/Loss.cu:257: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [19,0,0] Assertion `t >= 0 && t < n_classes` failed.
/usr/local/src/pytorch/aten/src/ATen/native/cuda/Loss.cu:257: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [20,0,0] Assertion `t >= 0 && t < n_classes` failed.
/usr/local/src/pytorch/aten/src/ATen/native/cuda/Loss.cu:257: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [21,0,0] Assertion `t >= 0 && t < n_classes` failed.
/usr/local/src/pytorch/aten/src/ATen/native/cuda/Loss.cu:257: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [22,0,0] Assertion `t >= 0 && t < n_classes` failed.
/usr/local/src/pytorch/aten/src/ATen/native/cuda/Loss.cu:257: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [23,0,0] Assertion `t >= 0 && t < n_classes` failed.
/usr/local/src/pytorch/aten/src/ATen/native/cuda/Loss.cu:257: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [24,0,0] Assertion `t >= 0 && t < n_classes` failed.
/usr/local/src/pytorch/aten/src/ATen/native/cuda/Loss.cu:257: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [25,0,0] Assertion `t >= 0 && t < n_classes` failed.
/usr/local/src/pytorch/aten/src/ATen/native/cuda/Loss.cu:257: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [26,0,0] Assertion `t >= 0 && t < n_classes` failed.
/usr/local/src/pytorch/aten/src/ATen/native/cuda/Loss.cu:257: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [27,0,0] Assertion `t >= 0 && t < n_classes` failed.
/usr/local/src/pytorch/aten/src/ATen/native/cuda/Loss.cu:257: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [28,0,0] Assertion `t >= 0 && t < n_classes` failed.
/usr/local/src/pytorch/aten/src/ATen/native/cuda/Loss.cu:257: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [29,0,0] Assertion `t >= 0 && t < n_classes` failed.
/usr/local/src/pytorch/aten/src/ATen/native/cuda/Loss.cu:257: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [30,0,0] Assertion `t >= 0 && t < n_classes` failed.
/usr/local/src/pytorch/aten/src/ATen/native/cuda/Loss.cu:257: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [31,0,0] Assertion `t >= 0 && t < n_classes` failed.
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
/tmp/ipykernel_23/4156768683.py in <module>
----> 1 trainer.predict(inference_ds_encoded)

/opt/conda/lib/python3.7/site-packages/transformers/trainer.py in predict(self, test_dataset, ignore_keys, metric_key_prefix)
  2694         eval_loop = self.prediction_loop if self.args.use_legacy_prediction_loop else self.evaluation_loop
  2695         output = eval_loop(
-> 2696             test_dataloader, description="Prediction", ignore_keys=ignore_keys, metric_key_prefix=metric_key_prefix
  2697         )
  2698         total_batch_size = self.args.eval_batch_size * self.args.world_size

/opt/conda/lib/python3.7/site-packages/transformers/trainer.py in evaluation_loop(self, dataloader, description, prediction_loss_only, ignore_keys, metric_key_prefix)
  2819                 )
  2820             if logits is not None:
-> 2821                 logits = self._pad_across_processes(logits)
  2822                 logits = self._nested_gather(logits)
  2823                 if self.preprocess_logits_for_metrics is not None:

/opt/conda/lib/python3.7/site-packages/transformers/trainer.py in _pad_across_processes(self, tensor, pad_index)
  2953             return tensor
  2954         # Gather all sizes
-> 2955         size = torch.tensor(tensor.shape, device=tensor.device)[None]
  2956         sizes = self._nested_gather(size).cpu()
  2957 

RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

I then removed the label feature: trainer.predict(inference_ds_encoded.remove_columns('label')), but still got an error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
/tmp/ipykernel_23/899960315.py in <module>
----> 1 trainer.predict(inference_ds_encoded.remove_columns('label'))

/opt/conda/lib/python3.7/site-packages/transformers/trainer.py in predict(self, test_dataset, ignore_keys, metric_key_prefix)
   2694         eval_loop = self.prediction_loop if self.args.use_legacy_prediction_loop else self.evaluation_loop
   2695         output = eval_loop(
-> 2696             test_dataloader, description="Prediction", ignore_keys=ignore_keys, metric_key_prefix=metric_key_prefix
   2697         )
   2698         total_batch_size = self.args.eval_batch_size * self.args.world_size

/opt/conda/lib/python3.7/site-packages/transformers/trainer.py in evaluation_loop(self, dataloader, description, prediction_loss_only, ignore_keys, metric_key_prefix)
   2796 
   2797             # Prediction step
-> 2798             loss, logits, labels = self.prediction_step(model, inputs, prediction_loss_only, ignore_keys=ignore_keys)
   2799             inputs_decode = inputs["input_ids"] if args.include_inputs_for_metrics else None
   2800 

/opt/conda/lib/python3.7/site-packages/transformers/trainer.py in prediction_step(self, model, inputs, prediction_loss_only, ignore_keys)
   2999         """
   3000         has_labels = all(inputs.get(k) is not None for k in self.label_names)
-> 3001         inputs = self._prepare_inputs(inputs)
   3002         if ignore_keys is None:
   3003             if hasattr(self.model, "config"):

/opt/conda/lib/python3.7/site-packages/transformers/trainer.py in _prepare_inputs(self, inputs)
   2261         handling potential state.
   2262         """
-> 2263         inputs = self._prepare_input(inputs)
   2264         if len(inputs) == 0:
   2265             raise ValueError(

/opt/conda/lib/python3.7/site-packages/transformers/trainer.py in _prepare_input(self, data)
   2243         """
   2244         if isinstance(data, Mapping):
-> 2245             return type(data)({k: self._prepare_input(v) for k, v in data.items()})
   2246         elif isinstance(data, (tuple, list)):
   2247             return type(data)(self._prepare_input(v) for v in data)

/opt/conda/lib/python3.7/site-packages/transformers/trainer.py in <dictcomp>(.0)
   2243         """
   2244         if isinstance(data, Mapping):
-> 2245             return type(data)({k: self._prepare_input(v) for k, v in data.items()})
   2246         elif isinstance(data, (tuple, list)):
   2247             return type(data)(self._prepare_input(v) for v in data)

/opt/conda/lib/python3.7/site-packages/transformers/trainer.py in _prepare_input(self, data)
   2253                 # may need special handling to match the dtypes of the model
   2254                 kwargs.update(dict(dtype=self.args.hf_deepspeed_config.dtype()))
-> 2255             return data.to(**kwargs)
   2256         return data
   2257 

RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

I also tried using the trained model object to make predictions, and I got a different error:

text = ["I like the film it's really exciting!", "I hate the movie, it's so boring!!"]
encoding = tokenizer(text)
outputs = model(**encoding)
predictions = outputs.logits.argmax(-1)

Error:

AttributeError                            Traceback (most recent call last)
/tmp/ipykernel_23/94414684.py in <module>
      1 text = ["I like the film it's really exciting!", "I hate the movie, it's so boring!!"]
      2 encoding = tokenizer(text)
----> 3 outputs = model(**encoding)
      4 predictions = outputs.logits.argmax(-1)

/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1108         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1109                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1110             return forward_call(*input, **kwargs)
   1111         # Do not call functions when jit is used
   1112         full_backward_hooks, non_full_backward_hooks = [], []

/opt/conda/lib/python3.7/site-packages/transformers/models/distilbert/modeling_distilbert.py in forward(self, input_ids, attention_mask, head_mask, inputs_embeds, labels, output_attentions, output_hidden_states, return_dict)
    752             output_attentions=output_attentions,
    753             output_hidden_states=output_hidden_states,
--> 754             return_dict=return_dict,
    755         )
    756         hidden_state = distilbert_output[0]  # (bs, seq_len, dim)

/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1108         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1109                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1110             return forward_call(*input, **kwargs)
   1111         # Do not call functions when jit is used
   1112         full_backward_hooks, non_full_backward_hooks = [], []

/opt/conda/lib/python3.7/site-packages/transformers/models/distilbert/modeling_distilbert.py in forward(self, input_ids, attention_mask, head_mask, inputs_embeds, output_attentions, output_hidden_states, return_dict)
    549             raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time")
    550         elif input_ids is not None:
--> 551             input_shape = input_ids.size()
    552         elif inputs_embeds is not None:
    553             input_shape = inputs_embeds.size()[:-1]

AttributeError: 'list' object has no attribute 'size'

My code can be found on Kaggle here: imdb_text_classification_with_transformers | Kaggle. Thank you!

Bumping up my post…Thank you in advance for any help!

Can someone please help? Thanks

Hey there.
Did you get any solution for this issue? I am facing a similar issue with Glue/ Cola dataset, when trying to execute the “notebooks/examples/text_classification.ipynb at main · huggingface/notebooks · GitHub” notebook.