About training data pre-processing


I have a dataset where each example has a different length, so after tokenization the tokenized sequences also have different lengths. When I feed the tokenized data into the GPT-2 model for training, an error occurs.

Do all the examples need to be the same length before I can train the GPT-2 model?
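For context, one common way to handle this is to pad every sequence in a batch to the length of the longest one and supply an attention mask so the model ignores the padding. Below is a minimal sketch of that idea in plain Python; `pad_batch` is a hypothetical helper, and the token ids are made up. Reusing GPT-2's EOS token id (50256) as the pad id is a common convention, since GPT-2 has no dedicated pad token.

```python
# Sketch: pad variable-length token-id sequences to a common length
# so they can be stacked into one batch tensor for training.
# PAD_ID reuses GPT-2's eos_token_id (50256) as a pad id; padding
# positions are masked out via the attention mask.

PAD_ID = 50256  # GPT-2 eos_token_id, reused here as a pad id

def pad_batch(batch):
    """Pad each sequence to the longest length in the batch.
    Returns padded ids and an attention mask (1 = real token, 0 = pad)."""
    max_len = max(len(seq) for seq in batch)
    input_ids, attention_mask = [], []
    for seq in batch:
        pad_len = max_len - len(seq)
        input_ids.append(seq + [PAD_ID] * pad_len)
        attention_mask.append([1] * len(seq) + [0] * pad_len)
    return input_ids, attention_mask

# Example: three tokenized samples of different lengths (ids are illustrative)
batch = [[15496, 995], [40, 1842, 34242, 17], [9288]]
ids, mask = pad_batch(batch)
```

After padding, every row has the same length, so the batch can be converted to a tensor. In practice, a library data collator (for example, one that pads dynamically per batch) does this for you.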