Problem with a customised SQuAD dataset on Hugging Face

Hi,

I have created a custom dataset as a CSV file in Google Sheets, and the format is correct. But when I uploaded it to Hugging Face, the `answers` feature is shown as a string, even though I copied the layout exactly from the original SQuAD dataset. In the original SQuAD dataset the `answers` feature is shown as a sequence, not a string like mine.

Has anyone come across this issue before?

Thanks
Aurelie

Perhaps this?

Thank you, I'll have a look. I just uploaded the file to GitHub, where I could see the quotation marks ("") that turn the values into strings, so right now I am removing them by hand. I will then download the file again from GitHub as CSV and see if it works; if not, I will try the other post.

As a last resort, you can specify the data format directly, but it would be better if it could be recognized automatically.
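For what it's worth, here is a rough sketch of what specifying the format directly could look like. It assumes each `answers` cell in the CSV holds text such as `{'text': ['Paris'], 'answer_start': [10]}` (I have not seen the actual file), and the helper names `parse_answers` and `build_dataset` are made up for illustration. A CSV cell can only hold text, so the cell is parsed into a real dict first, and then the SQuAD-style features are declared explicitly. `Dataset.from_list` needs a reasonably recent version of `datasets`.

```python
import ast


def parse_answers(cell):
    """Turn a CSV cell like "{'text': ['Paris'], 'answer_start': [10]}"
    into a real dict of lists (the SQuAD layout)."""
    value = ast.literal_eval(cell)  # only evaluates Python literals
    return {
        "text": [str(t) for t in value["text"]],
        "answer_start": [int(i) for i in value["answer_start"]],
    }


def build_dataset(rows):
    """Build a Dataset with explicit SQuAD-style features.

    `rows` is a list of dicts with the keys id, title, context, question
    and answers, where answers has already been parsed with parse_answers.
    The import is local so parse_answers works without `datasets` installed.
    """
    from datasets import Dataset, Features, Sequence, Value

    features = Features({
        "id": Value("string"),
        "title": Value("string"),
        "context": Value("string"),
        "question": Value("string"),
        "answers": Sequence({
            "text": Value("string"),
            "answer_start": Value("int32"),
        }),
    })
    return Dataset.from_list(rows, features=features)
```

If the cells are formatted differently (for example without the surrounding braces), `parse_answers` would need adjusting.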

Thank you, very good. I guess I will use that, as I was unable to get the new CSV file working. It's a bit odd that the column is recognised as strings and not as a sequence, but that may be down to Google Sheets, and I wasn't sure how to make the values a sequence there.

Also, I used a for loop to get the indexes and then added them manually, because at the moment I don't have enough RAM on Google Colab. I might buy more, but since the dataset is small I want to see if I can train it without more RAM. As for updating the CSV file automatically with a for loop, that isn't possible right now due to the lack of RAM: it crashes halfway through.
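In case it helps anyone with the same RAM problem, below is a minimal sketch of computing the answer indexes one row at a time, so the whole file is never held in memory. The column names (`id`, `context`, `question`, `answer_text`) and the function name `csv_to_jsonl` are assumptions for illustration, not the real layout of my sheet. It writes JSON Lines rather than CSV, since JSON can store the nested `answers` structure directly instead of as a quoted string.

```python
import csv
import json


def csv_to_jsonl(csv_path, jsonl_path):
    """Convert a flat CSV to SQuAD-style JSON Lines, one row at a time.

    Returns the number of rows written. Rows whose answer text does not
    occur verbatim in the context are skipped.
    """
    written = 0
    with open(csv_path, newline="", encoding="utf-8") as src, \
         open(jsonl_path, "w", encoding="utf-8") as dst:
        for row in csv.DictReader(src):  # reads lazily, row by row
            start = row["context"].find(row["answer_text"])
            if start == -1:
                continue  # answer not found in context; skip this row
            record = {
                "id": row["id"],
                "context": row["context"],
                "question": row["question"],
                "answers": {
                    "text": [row["answer_text"]],
                    "answer_start": [start],
                },
            }
            dst.write(json.dumps(record, ensure_ascii=False) + "\n")
            written += 1
    return written
```

Note that `str.find` returns the first occurrence only, so if the answer text appears more than once in the context the index may need checking by hand.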
