How can I use a tokenized Dataset for text generation?

My goal is to use a dataset I tokenized as input to a model, let the model generate the output text in batches, and then decode the result to get the outputs as strings. I have written some code, but it throws an AttributeError when I pass the dataset as input, and I can't figure out how to do this correctly. I have seen that you can pass a tokenized list as input instead. Is there no way to do this with datasets directly, or am I missing something?

Here is what I do:

I have a dataset that was initialized from a dataframe and has the following outline:

    features: ['question', 'answer'],
    num_rows: 500

where question and answer are both strings.

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
from datasets import Dataset

checkpoint = "bigscience/mt0-xxl"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

tokenized_ds = tokenize(tokenizer, dataset)
outputs = model.generate(tokenized_ds)

The tokenize function is defined as follows:

def tokenize(tok, ds):
    def tokenize_fn(sample):
        result = tok(sample['question'], padding=True, return_tensors='pt')
        return result

    tokenized = ds.map(
        tokenize_fn, batched=True, remove_columns=['question', 'answer']
    )

    return tokenized

I get the following error:
AttributeError: 'Dataset' object has no attribute 'dtype'

How can I use my dataset as input for my model?
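
For reference, here is a minimal sketch of the batch-generate-and-decode loop described above. It assumes the error arises because `model.generate` expects tensors (for example `input_ids` and `attention_mask`) rather than a `Dataset` object. The helper name `generate_answers` and its `column`, `batch_size` and `max_new_tokens` parameters are hypothetical and chosen only for illustration. The sketch tokenizes each batch on the fly instead of going through `Dataset.map`, so every batch is padded only to its own longest question.

```python
def generate_answers(model, tokenizer, dataset, column="question",
                     batch_size=8, max_new_tokens=64):
    """Generate one output string per row of `dataset[column]`.

    `generate_answers` is a hypothetical helper, not part of transformers
    or datasets. `model` and `tokenizer` are expected to behave like the
    objects loaded in the question.
    """
    answers = []
    for start in range(0, len(dataset), batch_size):
        # Slicing a Dataset returns a dict mapping column names to lists.
        questions = dataset[start:start + batch_size][column]
        # Pad within the batch and ask for PyTorch tensors.
        encoded = tokenizer(questions, padding=True, return_tensors="pt")
        # Unpack input_ids and attention_mask as keyword arguments.
        output_ids = model.generate(**encoded, max_new_tokens=max_new_tokens)
        # Turn the generated token ids back into plain strings.
        answers.extend(
            tokenizer.batch_decode(output_ids, skip_special_tokens=True)
        )
    return answers


# Usage with the objects from the question (not run here, since the
# checkpoint is very large):
# answers = generate_answers(model, tokenizer, dataset)
```

If the model sits on a GPU, the encoded tensors would also need to be moved to the model's device before calling `generate`.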