I set up a simple test in Colab with the following code:
from datasets import load_dataset, concatenate_datasets, DatasetDict, logging, disable_progress_bars
%pip show datasets  # Confirming I am using datasets 2.18
logging.set_verbosity_warning()
# Load the MNLI and SNLI train datasets
mnli_train = load_dataset("glue", "mnli", split="train")
snli_train = load_dataset("snli", split="train")
# Concatenate the MNLI and SNLI datasets
combined_train = concatenate_datasets([mnli_train, snli_train])
mnli_matched = load_dataset("glue", "mnli_matched")
combined_dataset = DatasetDict({
    'train': combined_train,  # This is the new combined train split
    # Include other splits from mnli_matched if needed, like:
    'validation': mnli_matched['validation'],
    'test': mnli_matched['test']
})
logging.set_verbosity_info()
disable_progress_bars()
print(combined_dataset)
print("saving")
combined_dataset.save_to_disk("./test", num_proc=2)
With logging.set_verbosity_info() enabled, I get the following error:
Exception in thread Thread-34 (_handle_results):
Traceback (most recent call last):
  File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.10/dist-packages/multiprocess/pool.py", line 579, in _handle_results
    task = get()
  File "/usr/local/lib/python3.10/dist-packages/multiprocess/connection.py", line 254, in recv
    return _ForkingPickler.loads(buf.getbuffer())
  File "/usr/local/lib/python3.10/dist-packages/dill/_dill.py", line 303, in loads
    return load(file, ignore, **kwds)
  File "/usr/local/lib/python3.10/dist-packages/dill/_dill.py", line 289, in load
    return Unpickler(file, ignore=ignore, **kwds).load()
  File "/usr/local/lib/python3.10/dist-packages/dill/_dill.py", line 444, in load
    obj = StockUnpickler.load(self)
TypeError: CastError.__init__() missing 2 required keyword-only arguments: 'table_column_names' and 'requested_column_names'
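The final TypeError appears to hide the real failure. With num_proc=2 the save runs in a multiprocess pool (as the traceback shows), and an exception raised in a worker is pickled and sent back to the parent process. Python's default exception pickling only replays the positional args, so an exception whose __init__ requires keyword-only arguments (as CastError does here, going by the error message) cannot be rebuilt on the parent side, and the pool's result-handler thread dies instead. The sketch below reproduces that mechanism with plain pickle and a hypothetical stand-in class, KeywordOnlyError; it is not the datasets code, and PicklableKeywordOnlyError only illustrates one way such a class could be made picklable.

```python
import pickle
from functools import partial


class KeywordOnlyError(ValueError):
    """Hypothetical stand-in for an exception whose __init__ needs keyword-only arguments."""

    def __init__(self, *args, table_column_names, requested_column_names):
        super().__init__(*args)
        self.table_column_names = table_column_names
        self.requested_column_names = requested_column_names


class PicklableKeywordOnlyError(KeywordOnlyError):
    """Same exception, but __reduce__ re-supplies the keyword-only arguments on unpickling."""

    def __reduce__(self):
        # The default BaseException.__reduce__ returns (cls, self.args, self.__dict__),
        # so unpickling calls cls(*self.args) without the keyword-only arguments.
        # Wrapping the class in functools.partial carries them along instead.
        rebuild = partial(
            type(self),
            table_column_names=self.table_column_names,
            requested_column_names=self.requested_column_names,
        )
        return (rebuild, self.args)


def roundtrip(exc):
    """Pickle and unpickle an exception, as a process pool does when returning a worker error."""
    return pickle.loads(pickle.dumps(exc))


if __name__ == "__main__":
    broken = KeywordOnlyError("cast failed", table_column_names=["a"], requested_column_names=["b"])
    try:
        roundtrip(broken)
    except TypeError as exc:
        # Same kind of TypeError as in the traceback: the original error is lost.
        print("unpickling failed:", exc)

    fixed = roundtrip(
        PicklableKeywordOnlyError("cast failed", table_column_names=["a"], requested_column_names=["b"])
    )
    print(fixed.args, fixed.table_column_names, fixed.requested_column_names)
```

If this is what is happening, it may be worth rerunning save_to_disk without num_proc: the data is then written in the main process, so the underlying CastError should be raised directly with its own message instead of being lost on the way back from a worker.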
(Also, on a side note: I was hoping set_verbosity_info() would show the saving progress. Does anyone know how I can see the progress? At the moment, progress bars don't work when running under Papermill.)