Correct format for a translation dataset to fine-tune pretrained models

I'm trying to fine-tune a Helsinki-NLP model on my own dataset, which is a two-column CSV file. I turned it into a dictionary that looks like this:
{'id': [0, 1, 2, …], 'translation': {'en': 'some text', 'ar': 'نص'}}
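
For context, this is roughly how I build that dictionary from the CSV (a sketch; the file name and column names are placeholders, and each CSV row becomes one {'en': ..., 'ar': ...} entry):

import pandas as pd

# Placeholder path and column names; my real CSV has one English and one Arabic column.
df = pd.read_csv("my_data.csv")

data = {
    "id": list(range(len(df))),
    "translation": [
        {"en": row["en"], "ar": row["ar"]} for _, row in df.iterrows()
    ],
}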
I used the function mentioned in the tutorial:

source_lang = "en"
target_lang = "ar"

def preprocess_function(examples):
    # With batched mapping, examples["translation"] is a list of {"en": ..., "ar": ...} dicts
    inputs = [example[source_lang] for example in examples["translation"]]
    targets = [example[target_lang] for example in examples["translation"]]
    model_inputs = tokenizer(inputs, text_target=targets, max_length=128, truncation=True)
    return model_inputs
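
For reference, I load the tokenizer and apply the function over the dataset roughly like this (the checkpoint and variable names are approximate; `data` is the dictionary shown above):

from datasets import Dataset
from transformers import AutoTokenizer

# Checkpoint name is approximate; I'm using one of the Helsinki-NLP en-ar models.
tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-ar")

dataset = Dataset.from_dict(data)
tokenized_dataset = dataset.map(preprocess_function, batched=True)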

I tokenized the dataset with that, then followed the tutorial all the way to the training step, but I kept getting this error:

'Indexing with integers (to access backend Encoding for a given batch index) is not available when using Python based tokenizers'

It's a problem with the dataset format, right? What's the correct format, and how should I process my data into it?