I have tried using AutoTrain inside Hugging Face and in Google Colab, but both times I just get this ValueError:
ERROR | 2023-12-22 05:01:02 | autotrain.trainers.common:wrapper:90 - train has failed due to an exception: Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/autotrain/trainers/common.py", line 87, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/autotrain/trainers/clm/__main__.py", line 446, in train
trainer.train()
File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 1537, in train
return inner_training_loop(
File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 1553, in _inner_training_loop
train_dataloader = self.get_train_dataloader()
File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 800, in get_train_dataloader
dataloader_params["sampler"] = self._get_train_sampler()
File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 770, in _get_train_sampler
return RandomSampler(self.train_dataset)
File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/sampler.py", line 107, in __init__
raise ValueError("num_samples should be a positive integer "
ValueError: num_samples should be a positive integer value, but got num_samples=0
ERROR | 2023-12-22 05:01:02 | autotrain.trainers.common:wrapper:91 - num_samples should be a positive integer value, but got num_samples=0
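From what I can tell, the exception itself just means the train dataset is empty by the time the Trainer builds its sampler. This minimal sketch (assuming only that torch is installed) reproduces the same error outside AutoTrain:

from torch.utils.data import RandomSampler

empty_dataset = []  # stands in for a processed train split with zero rows
sampler = RandomSampler(empty_dataset)
# raises: ValueError: num_samples should be a positive integer value, but got num_samples=0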
I have uploaded the dataset to Hugging Face as Maxx0/DatasetProfy. I think something is wrong with it, but I don't know what.
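To rule out the dataset itself, one thing I can do is load it with the datasets library and check the row count and columns. This is just a quick sketch; the split name "train" and whatever column holds the text are assumptions about how the repo was uploaded:

from datasets import load_dataset

# Assumes the repo Maxx0/DatasetProfy is accessible and has a "train" split.
ds = load_dataset("Maxx0/DatasetProfy", split="train")
print(ds)               # should report num_rows > 0
print(ds.column_names)  # check which column actually holds the training text
print(ds[0])            # inspect one example for empty or None values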
🚀 INFO | 2023-12-26 04:11:22 | __main__:process_input_data:41 - loading dataset from disk
🚀 INFO | 2023-12-26 04:11:22 | __main__:process_input_data:82 - Train data: Dataset({
features: ['autotrain_text', '__index_level_0__'],
num_rows: 268
})
🚀 INFO | 2023-12-26 04:11:22 | __main__:process_input_data:83 - Valid data: None
Loading checkpoint shards: 100%|██████████████████| 3/3 [00:12<00:00, 4.19s/it]
🚀 INFO | 2023-12-26 04:11:36 | __main__:train:271 - Using block size 1024
Running tokenizer on train dataset: 100%|█| 268/268 [00:00<00:00, 41504.76 examp
Grouping texts in chunks of 1024 (num_proc=4): 100%|█| 268/268 [00:01<00:00, 198
🚀 INFO | 2023-12-26 04:11:41 | __main__:train:333 - creating trainer
Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
❌ ERROR | 2023-12-26 04:11:47 | autotrain.trainers.common:wrapper:90 - train has failed due to an exception: Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/autotrain/trainers/common.py", line 87, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/autotrain/trainers/clm/__main__.py", line 469, in train
trainer.train()
File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 1537, in train
return inner_training_loop(
File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 1553, in _inner_training_loop
train_dataloader = self.get_train_dataloader()
File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 800, in get_train_dataloader
dataloader_params["sampler"] = self._get_train_sampler()
File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 770, in _get_train_sampler
return RandomSampler(self.train_dataset)
File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/sampler.py", line 141, in __init__
raise ValueError("num_samples should be a positive integer "
ValueError: num_samples should be a positive integer value, but got num_samples=0
❌ ERROR | 2023-12-26 04:11:47 | autotrain.trainers.common:wrapper:91 - num_samples should be a positive integer value, but got num_samples=0
This makes no sense, because num_samples should default to the length of the entire dataset, which in my case is 268:
@property
def num_samples(self) -> int:
    # dataset size might change at runtime
    if self._num_samples is None:
        return len(self.data_source)
    return self._num_samples
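My suspicion (unconfirmed) is the "Grouping texts in chunks of 1024" step: if the 268 rows tokenize to fewer than 1024 tokens in total, a group_texts-style function that drops the trailing remainder produces zero blocks, so the train dataset the Trainer sees really does have length 0 even though the raw dataset had 268 rows. A rough sketch of that behaviour, modelled on the usual group_texts recipe rather than AutoTrain's exact code:

def group_texts(examples, block_size=1024):
    # Concatenate all tokenized rows, then cut into fixed-size blocks.
    concatenated = sum(examples["input_ids"], [])
    # Drop the remainder that doesn't fill a whole block.
    total_length = (len(concatenated) // block_size) * block_size
    return {
        "input_ids": [concatenated[i : i + block_size] for i in range(0, total_length, block_size)]
    }

# 268 short rows of ~3 tokens each is ~800 tokens total, less than one 1024-token block:
rows = {"input_ids": [[1, 2, 3]] * 268}
print(len(group_texts(rows)["input_ids"]))  # 0 -> num_samples=0 downstream

If that is what's happening, a smaller block size (or longer training texts) should keep the grouped dataset non-empty.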
Well, this just doesn't work, plain and simple. On HF there's a "TypeError: object of type 'NoneType' has no len()" exception, and it's unclear why that's happening. When running on RunPod with a reduced block size of 4, it still generates a CUDA out-of-memory error. What's the point of this when you can't train a 4 GB, 7B model with 268 lines of training data on an H100 with 80 GB of RAM and 4-bit quant? Obviously better documentation is needed from those with the knowledge. Hang on, let me dig through the hundreds of posts…