Error when following the Transformers language modeling tutorial step by step

I am following this tutorial and copy-pasting all the code blocks into Colab. However, when it gets to the training segment, the error below pops up:

/usr/local/lib/python3.7/dist-packages/transformers/optimization.py:310: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
  FutureWarning,
***** Running training *****
  Num examples = 8466
  Num Epochs = 3
  Instantaneous batch size per device = 8
  Total train batch size (w. parallel, distributed & accumulation) = 8
  Gradient Accumulation steps = 1
  Total optimization steps = 3177
 [ 254/3177 00:47 < 09:10, 5.31 it/s, Epoch 0.24/3]
Epoch	Training Loss	Validation Loss
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/transformers/tokenization_utils_base.py in convert_to_tensors(self, tensor_type, prepend_batch_axis)
    718                 if not is_tensor(value):
--> 719                     tensor = as_tensor(value)
    720 

ValueError: expected sequence of length 128 at dim 1 (got 127)

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/transformers/tokenization_utils_base.py in convert_to_tensors(self, tensor_type, prepend_batch_axis)
    734                     )
    735                 raise ValueError(
--> 736                     "Unable to create tensor, you should probably activate truncation and/or padding with"
    737                     " 'padding=True' 'truncation=True' to have batched tensors with the same length. Perhaps your"
    738                     f" features (`{key}` in this case) have excessive nesting (inputs type `list` where type `int` is"

ValueError: Unable to create tensor, you should probably activate truncation and/or padding with 'padding=True' 'truncation=True' to have batched tensors with the same length. Perhaps your features (`labels` in this case) have excessive nesting (inputs type `list` where type `int` is expected).

I have already initialized the data_collator and run all the code blocks in the tutorial, so why am I still receiving an error about truncation?

Ok, apparently distilgpt2 doesn't support dynamic padding through the data_collator: its tokenizer defines no pad token, so I don't think it can pad at all out of the box. That's why the last chunk of length 127 can't be batched with the 128-length chunks.
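In case anyone wants to keep padding instead, I believe giving the tokenizer a pad token would let the collator pad the batches. A sketch, assuming you let the collator create the labels rather than adding a labels column yourself (not what I ended up doing, so treat it as untested):

from transformers import AutoTokenizer, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
# distilgpt2's tokenizer ships without a pad token, which is why dynamic
# padding fails out of the box; reusing the EOS token works around that.
tokenizer.pad_token = tokenizer.eos_token

# With mlm=False the collator clones input_ids into labels and masks the
# padded positions with -100, so the dataset should only hold input_ids and
# attention_mask, not a precomputed labels column.
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

I didn't go that route, though. Instead I opted to drop the remainder, using the group_texts function from the tutorial: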

block_size = 128

def group_texts(examples):
    # Concatenate all texts.
    concatenated_examples = {k: sum(examples[k], []) for k in examples.keys()}
    total_length = len(concatenated_examples[list(examples.keys())[0]])
    # We drop the small remainder; we could pad instead if the model supported
    # padding. You can customize this part to your needs.
    total_length = (total_length // block_size) * block_size
    # Split into chunks of block_size.
    result = {
        k: [t[i : i + block_size] for i in range(0, total_length, block_size)]
        for k, t in concatenated_examples.items()
    }
    result["labels"] = result["input_ids"].copy()
    return result
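For completeness, the chunking only kicks in once you map group_texts over the tokenized dataset. The variable names below (tokenized_datasets, lm_datasets) are the ones the tutorial uses, so adjust them to match your own code:

# group_texts concatenates whole batches of examples, hence batched=True.
lm_datasets = tokenized_datasets.map(
    group_texts,
    batched=True,
    batch_size=1000,
    num_proc=4,
)

After this, every example is exactly block_size tokens long, so nothing ever needs padding and the length-127 remainder that triggered the ValueError is gone.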