How do you tokenize one long string?

Let’s say my training dataset is just one super long string. What is the correct way to tokenize this?

I have this so far:

trainenc = tokenizer(
    train_dataset['text'],
    return_tensors='pt',
    max_length=128,
    truncation=True,
    padding=True,
    return_overflowing_tokens=True,
)
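
where tokenizer is just a standard pretrained fast tokenizer, something along these lines ('gpt2' is only a placeholder, my real checkpoint is different):

from transformers import AutoTokenizer

# placeholder checkpoint; I actually load a different model's tokenizer
tokenizer = AutoTokenizer.from_pretrained('gpt2')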

Which of these arguments should I keep? And afterwards, how do I split my long list of tokens up into batches, where each element of the batch is short enough to fit inside the model?
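
For context, this is roughly the direction I'm picturing, though I'm not sure it's the intended way (just a rough sketch; block_size = 128 is an arbitrary stand-in for the model's maximum context length, and I drop the last short chunk for simplicity):

import torch

# Tokenize the entire string once, with no truncation or padding
# (assuming train_dataset['text'] is a plain Python string here)
ids = tokenizer(train_dataset['text'], truncation=False)['input_ids']

# Chop the flat list of token ids into fixed-size chunks
block_size = 128  # stand-in for the model's max sequence length
chunks = [ids[i:i + block_size] for i in range(0, len(ids), block_size)]

# Keep only full-length chunks (alternatively, pad the last one) and stack into a batch
full_chunks = [c for c in chunks if len(c) == block_size]
batch = torch.tensor(full_chunks)  # shape: (num_chunks, block_size)

Is that the right general idea, or should I be relying on return_overflowing_tokens instead?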

Thanks