Hi,
this line is the problem:
X_data = Dataset(X_text_tokenized)
because X_text_tokenized is a list of dictionaries, not a dictionary of lists. You can fix this with the following code:
def list_of_dicts_to_dict_of_lists(d):
    # Take the keys from the first dictionary; every entry is assumed to share them.
    keys = d[0].keys()
    # Look each value up by key so the result does not depend on key order.
    return {k: [dic[k] for dic in d] for k in keys}
X_data = Dataset(list_of_dicts_to_dict_of_lists(X_text_tokenized))
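To make the difference between the two shapes concrete, here is a small sketch on made-up data. The token IDs and the key names "input_ids" and "attention_mask" are only assumptions for illustration, so your tokenizer's output may look different. The helper is repeated so that the snippet runs on its own.

```python
def list_of_dicts_to_dict_of_lists(d):
    # Keys come from the first entry; all entries are assumed to share them.
    keys = d[0].keys()
    return {k: [dic[k] for dic in d] for k in keys}


# Hypothetical tokenizer output: one dictionary per text (list of dicts).
tokenized = [
    {"input_ids": [101, 7592, 102], "attention_mask": [1, 1, 1]},
    {"input_ids": [101, 2088, 102], "attention_mask": [1, 1, 1]},
]

# After conversion: one list per key (dict of lists).
converted = list_of_dicts_to_dict_of_lists(tokenized)
print(converted["input_ids"])
print(converted["attention_mask"])
```

Each key now maps to a list with one entry per text, which is the shape the Dataset call above expects.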