I came across a TypeError: Couldn’t cast array of type struct<…> to null when using Dataset.from_generator(generator_func, gen_kwargs={'file_path': '…'}).
The file at file_path holds one JSON sample per line. Each sample has a key, 'lines', whose value is a list. Some samples have several dicts in this list, while others have only an empty list. The elements of this list are what the error message points to (they are the ones with the struct type).
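To make the data layout concrete, here is a minimal sketch of the kind of file and generator I mean. The field names other than 'lines' ('id', 'text', 'page') and the file name are made up for illustration, and my real generator_func may differ in details.

```python
import json
import os
import tempfile


def generator_func(file_path):
    """Yield one dict per non-empty line of a JSON-lines file."""
    with open(file_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


# Hypothetical samples: one with an empty 'lines' list, one with a list of dicts.
samples = [
    {"id": 0, "lines": []},
    {"id": 1, "lines": [{"text": "foo", "page": 1}, {"text": "bar", "page": 2}]},
]

# Write the samples to a temporary JSON-lines file (one JSON object per line).
tmp_dir = tempfile.mkdtemp()
file_path = os.path.join(tmp_dir, "samples.jsonl")
with open(file_path, "w", encoding="utf-8") as f:
    for sample in samples:
        f.write(json.dumps(sample) + "\n")

# Consume the generator directly to check what it yields.
rows = list(generator_func(file_path))
```

The same generator_func and file_path are then what I pass to Dataset.from_generator(generator_func, gen_kwargs={'file_path': file_path}).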
At first, I thought this might be caused by some malformed samples, so I split my data file (170+ thousand lines) into several smaller files (about 50 thousand lines each). Each smaller file still contains both samples with an empty list and samples with a list of dicts. I then used Dataset.from_generator on each of them. Strangely, the error disappeared.
Could anyone explain this? Is it related to memory or something?