Creating a QA dataset for DistilBERT

I am trying to create a custom QA dataset to fine-tune DistilBERT, but I can't get past a type error. I've compared my dataset to the SQuAD format, and the only difference I can see is that my answer text is wrapped in quotes. Those quotes are not in the original CSV; they appear after I load the dataset. How do I create a QA dataset in CSV format so I can use it for fine-tuning?

I load the data using:

    from datasets import load_dataset
    ds = load_dataset('csv', data_files='path/to/local/my_dataset.csv')

This is the error I get when I run:

    tokenized_ds = ds.map(preprocess_function, batched=True, remove_columns=ds["train"].column_names)

Cell In[41], line 19, in preprocess_function(examples)
     17 for i, offset in enumerate(offset_mapping):
     18     answer = answers[i]
---> 19     start_char = answer["answer_start"][0]
     20     end_char = answer["answer_start"][0] + len(answer["text"][0])
     21     sequence_ids = inputs.sequence_ids(i)

TypeError: string indices must be integers

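Figured this out. When you build a tabular dataset in Excel and export it to CSV, Excel wraps some cells in double quotes so the file can still be parsed (for example, cells that contain commas). After loading, the answers value is then a plain string rather than a dict, which is why answer["answer_start"] fails with "string indices must be integers". Not a good thing when it comes to importing data.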
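For anyone who hits the same error, here is a minimal sketch of one possible workaround: parse the string back into a dict before preprocessing. It assumes the answers cell holds a Python-style dict literal such as {'text': [...], 'answer_start': [...]}. The sample row and the parse_answers helper are made up for illustration and are not part of the datasets library.

```python
import ast
import csv
import io

# Hypothetical CSV content. The "answers" cell contains commas, so it has to be
# quoted in the file, and a CSV reader hands it back as one plain string.
CSV_TEXT = '''id,context,question,answers
1,Paris is the capital of France.,What is the capital of France?,"{'text': ['Paris'], 'answer_start': [0]}"
'''


def parse_answers(raw):
    """Turn the string form of an answers cell into a SQuAD-style dict.

    Hypothetical helper: assumes the cell is a Python dict literal.
    """
    if isinstance(raw, dict):
        return raw  # already structured, nothing to do
    parsed = ast.literal_eval(raw)  # safe evaluation of literals only
    return {
        "text": [str(t) for t in parsed["text"]],
        "answer_start": [int(s) for s in parsed["answer_start"]],
    }


rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
raw = rows[0]["answers"]          # a str, so raw["answer_start"] would raise TypeError
answer = parse_answers(raw)       # now a dict with list values

# The same indexing as in the failing preprocess_function now works.
start_char = answer["answer_start"][0]
end_char = start_char + len(answer["text"][0])
print(rows[0]["context"][start_char:end_char])
```

With a dataset loaded through load_dataset, the same helper could probably be applied per example with something like ds.map(lambda ex: {"answers": parse_answers(ex["answers"])}) before tokenizing. I have not tested that against my original file. Storing the data as JSON or JSON Lines and loading it with load_dataset('json', ...) would avoid the string round trip entirely.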