Hi everyone, I’m very new to working with this kind of data, and I don’t know whether the data we’ve been using for our own model can be converted to a Hugging Face dataset, given how it’s structured.
If you run these Python lines:
import pandas as pd
from datasets import Dataset
df = pd.DataFrame({'a': [1, 2, 3, 4, 'v', 6, 7]})
dataset = Dataset.from_pandas(df)
you get an error because the values in the dataframe column are not all the same type: there is one string mixed in with the ints.
I’m working with data composed of tables from spreadsheets, where each column is represented as a JSON object of the form {"Name": <string>, "Values": <array of strings and numbers>}, where Name is the header of the table column (in one cell) and Values represents the column of cells below it. The whole table is stored as an array of these.
In the dataset I’m trying to create, this table would be a feature, part of a larger set of features (along with other things such as the ID and a descriptive string). But can I create a feature for this, when I have to include a type, even for things like Array2D, when one of these objects can hold both strings and numbers in its Values list?
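To make the problem concrete, here is a minimal sketch of one workaround I can imagine: turning every entry of Values into a string before building the dataset, so that each column has a single type. The example table and the helper name stringify_column are made up for illustration, and I have not confirmed that this is the recommended approach.

```python
import json

# Hypothetical example table in the format described above:
# an array of {"Name": ..., "Values": [...]} column objects.
table = [
    {"Name": "Item", "Values": ["apple", "pear", 3]},
    {"Name": "Price", "Values": [1.5, "n/a", 2]},
]

def stringify_column(column):
    """Return a copy of a column whose Values are all strings."""
    return {
        "Name": column["Name"],
        # Strings are kept as-is; numbers are rendered with json.dumps.
        "Values": [
            v if isinstance(v, str) else json.dumps(v)
            for v in column["Values"]
        ],
    }

normalized_table = [stringify_column(col) for col in table]
```

The obvious downside is that the distinction between the number 3 and the string "3" is lost, which may or may not matter for the model.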
Even more complicated, there is another feature we use, which can be either a number, a string, one of these tables (array of those column objects), or an array of tables (so an array of arrays of column objects). Is it possible to have that be a feature when it has so many possibilities for its type? A union type would be really nice, but I don’t think it’s possible, or at least I couldn’t find it in the documentation.
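The only fallback I can think of for this feature is to store it as a single JSON string and decode it again when reading, so the column type is always a string. This is just a sketch under that assumption; encode_union and decode_union are hypothetical helper names, not part of any library.

```python
import json

def encode_union(value):
    """Serialize any of the possible shapes into one JSON string."""
    return json.dumps(value)

def decode_union(text):
    """Recover the original number, string, table, or array of tables."""
    return json.loads(text)

# One made-up column object, reused to build the table shapes.
column = {"Name": "Price", "Values": [1.5, "n/a", 2]}

examples = [
    42,                    # a number
    "some text",           # a string
    [column],              # a table (array of column objects)
    [[column], [column]],  # an array of tables
]

encoded = [encode_union(v) for v in examples]
decoded = [decode_union(s) for s in encoded]
```

Every entry of encoded is a plain string, so the feature would have one type, at the cost of having to parse it again on every access.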
Thank you!