Hey,
I made a short notebook to show how local data files can be loaded into Datasets and
then combined with other local data files into one Dataset
object.
Check out the Google Colab here.
So, if I have a very different dataset, but I have the sentence and the mp3 (the audio quality doesn't matter, nor whether most of the file is silence or background noise),
I only need to create that JSON file and use it as a base dataset?
Also, it seems you can save the JSON files with the correct column names, in which case the rename after loading is not needed.
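For anyone following along, here is a minimal sketch of writing such a JSON file with the final column names already in place. The column names `path` and `sentence`, the sample records, and the helper names `write_metadata` and `load_as_dataset` are my assumptions for illustration, not something taken from the notebook.

```python
import json
from pathlib import Path


def write_metadata(records, out_path):
    """Write one JSON object per line (JSON Lines), keeping the keys as column names."""
    out_path = Path(out_path)
    with out_path.open("w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return out_path


def load_as_dataset(metadata_file):
    """Load the metadata file with the `json` loader of the datasets library."""
    # Imported here so the writer above also works without the library installed.
    from datasets import load_dataset

    return load_dataset("json", data_files=str(metadata_file), split="train")


# Hypothetical records: "path" and "sentence" are assumed column names.
records = [
    {"path": "clips/sample_0001.mp3", "sentence": "hello world"},
    {"path": "clips/sample_0002.mp3", "sentence": "good morning"},
]
```

Calling `load_as_dataset(write_metadata(records, "my_dataset.json"))` should then give a dataset whose columns already carry those names, so no rename step is needed afterwards.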
I found how to load from pandas, but now I get this error while concatenating:
ValueError: Datasets should ALL come from memory, or should ALL come from disk.
However datasets [1] come from memory and datasets [0] come from disk.
I had the same issue before, so I just saved the dataset to disk and reloaded it, like this:
dataset.save_to_disk("train_dataset")
dataset = datasets.load_from_disk("train_dataset")
Then I can concatenate the two datasets.
I’m doing something like this:
common_voice_train = common_voice_train.map(lambda x: x, keep_in_memory=True)
common_voice_train = common_voice_train.map(lambda x: x, keep_in_memory=False)
I found the cause: my JSON file contains a list, and my JSON data sits inside that list. Does anyone know how I can read it with load_dataset?
Try creating a list of file paths first,
and only then pass it to load_dataset.
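A rough sketch of that idea, assuming the records are JSON objects sitting inside nested lists: flatten each file into JSON Lines first, then collect the file paths into a list and hand that list to load_dataset. The helper names `iter_records`, `flatten_to_jsonl` and `load_flattened` are made up for this example, and the nested layout is my guess at what the file looks like.

```python
import glob
import json


def iter_records(node):
    """Yield every JSON object found inside arbitrarily nested lists."""
    if isinstance(node, dict):
        yield node
    elif isinstance(node, list):
        for item in node:
            yield from iter_records(item)


def flatten_to_jsonl(src_path, dst_path):
    """Rewrite a nested JSON file as JSON Lines and return the record count."""
    with open(src_path, encoding="utf-8") as src:
        nested = json.load(src)
    count = 0
    with open(dst_path, "w", encoding="utf-8") as dst:
        for record in iter_records(nested):
            dst.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count


def load_flattened(pattern):
    """Build a list of file paths first, then pass that list to load_dataset."""
    # Imported here so the flattening helpers work without the library installed.
    from datasets import load_dataset

    data_files = sorted(glob.glob(pattern))
    return load_dataset("json", data_files=data_files, split="train")
```

If the records instead live under a single top-level key, the `field` argument of the JSON loader may be enough and the flattening step could be skipped.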
@danurahul I think this issue may happen if your JSON couldn’t be read properly by Arrow (ArrowInvalid error).
Can you try with a single file to begin with, so you can debug properly?
We recently pushed a feature that allows concatenating any datasets without getting this error!
Currently this feature is only available on master, but we’ll do a new release soon.
With one single JSON file it works fine, but as I increase the number of files it gives an error.
Sure, I will check that out.
One of your files must have format issues or different fields than the other JSON files.
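To track down the offending file, something like the sketch below might help. It uses only the standard library and assumes each file is JSON Lines (one object per line), which may not match your layout; `fields_of` and `find_problem_files` are hypothetical helper names.

```python
import json


def fields_of(path):
    """Return the set of keys used across all records of a JSON Lines file."""
    keys = set()
    with open(path, encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            if not line.strip():
                continue  # ignore blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError as err:
                raise ValueError(f"line {line_number}: {err}") from err
            if not isinstance(record, dict):
                raise ValueError(f"line {line_number}: expected a JSON object")
            keys.update(record)
    return keys


def find_problem_files(paths):
    """Map each problematic path to a short description of what is wrong."""
    problems = {}
    reference = None  # field set of the first file that parses cleanly
    for path in paths:
        try:
            keys = fields_of(path)
        except ValueError as err:
            problems[path] = f"format issue at {err}"
            continue
        if reference is None:
            reference = keys
        elif keys != reference:
            problems[path] = f"fields differ: {sorted(keys ^ reference)}"
    return problems
```

Running `find_problem_files` over the same list of paths you pass to load_dataset should point at any file that fails to parse or whose fields differ from the first valid file.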