Prefetch factor issue

Hey folks,

I am training a BERT model. It was working fine until two days ago, but now I am suddenly getting this error:

Detected kernel version 4.14.336, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.

ValueError Traceback (most recent call last)
Cell In[22], line 23
1 training_args = TrainingArguments(
2 output_dir="output",
3 learning_rate=2e-5,
(…)
10 load_best_model_at_end=True
11 )
13 trainer = Trainer(
14 model=model,
15 args=training_args,
(…)
20 compute_metrics=compute_metrics
21 )
---> 23 trainer.train()

File /opt/conda/lib/python3.8/site-packages/transformers/trainer.py:1624, in Trainer.train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
1622 hf_hub_utils.enable_progress_bars()
1623 else:
-> 1624 return inner_training_loop(
1625 args=args,
1626 resume_from_checkpoint=resume_from_checkpoint,
1627 trial=trial,
1628 ignore_keys_for_eval=ignore_keys_for_eval,
1629 )

File /opt/conda/lib/python3.8/site-packages/transformers/trainer.py:1653, in Trainer._inner_training_loop(self, batch_size, args, resume_from_checkpoint, trial, ignore_keys_for_eval)
1651 logger.debug(f"Currently training with a batch size of: {self._train_batch_size}")
1652 # Data loader and number of training steps
-> 1653 train_dataloader = self.get_train_dataloader()
1654 if self.is_fsdp_xla_v2_enabled:
1655 train_dataloader = tpu_spmd_dataloader(train_dataloader)

File /opt/conda/lib/python3.8/site-packages/transformers/trainer.py:852, in Trainer.get_train_dataloader(self)
849 dataloader_params["worker_init_fn"] = seed_worker
850 dataloader_params["prefetch_factor"] = self.args.dataloader_prefetch_factor
-> 852 return self.accelerator.prepare(DataLoader(train_dataset, **dataloader_params))

File /opt/conda/lib/python3.8/site-packages/torch/utils/data/dataloader.py:183, in DataLoader.__init__(self, dataset, batch_size, shuffle, sampler, batch_sampler, num_workers, collate_fn, pin_memory, drop_last, timeout, worker_init_fn, multiprocessing_context, generator, prefetch_factor, persistent_workers)
180 raise ValueError('timeout option should be non-negative')
182 if num_workers == 0 and prefetch_factor != 2:
-> 183 raise ValueError('prefetch_factor option could only be specified in multiprocessing.'
184 'let num_workers > 0 to enable multiprocessing.')
185 assert prefetch_factor > 0
187 if persistent_workers and num_workers == 0:

ValueError: prefetch_factor option could only be specified in multiprocessing.let num_workers > 0 to enable multiprocessing.

Can someone help me with this?

I am installing the packages below:

! pip install langdetect
! pip install transformers[torch]
! pip install accelerate -U
! pip install transformers
! pip install datasets
! pip install seqeval
! pip install evaluate
! conda install -n base -c conda-forge -y ipywidgets
! pip install transformers --upgrade
%pip install 'snowflake-connector-python[pandas]'
!pip install keyring==23.10.0

Please check the dependencies in your virtual environment and restart the system; these look like dependency conflicts.

@jeevisha30 I am facing the same issue, were you able to find a solution? In my case it is happening with the BEiT (BERT Pre-Training of Image Transformers) model.

What version of torch are you using? <=1.13.1?

I am seeing a regression with transformers 4.38.* and torch <= 1.13.1.
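If you want to confirm whether you have that combination, a quick check (nothing here is specific to the Trainer, it just prints the installed versions):

import torch
import transformers

# The problematic combination appears to be transformers 4.38.* with torch <= 1.13.1.
print("torch:", torch.__version__)
print("transformers:", transformers.__version__)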

dataloader_prefetch_factor was added to TrainingArguments two months ago with a default value of None: transformers/src/transformers/training_args.py at e9476832942a19cf99354776ef112babc83c139a · huggingface/transformers · GitHub

But old versions of torch do not accept None and will raise an error if num_workers == 0 and prefetch_factor != 2: pytorch/torch/utils/data/dataloader.py at 49444c3e546bf240bed24a101e747422d1f8a0ee · pytorch/pytorch · GitHub
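That check is easy to trigger outside the Trainer too. A minimal sketch, assuming torch 1.13.x (the dataset is just a throwaway list):

import torch
from torch.utils.data import DataLoader

print(torch.__version__)  # e.g. 1.13.1

# On torch <= 1.13, prefetch_factor defaults to 2 and the constructor rejects
# any other value when num_workers == 0, so the None that transformers 4.38
# now passes through hits exactly the ValueError from the traceback above.
DataLoader(list(range(8)), num_workers=0, prefetch_factor=None)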

I am using torch version 1.13.1+cu116
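In case it helps while the versions are mismatched: upgrading torch to 2.x or pinning transformers below 4.38 should avoid this, and it also seems to go away if you let the Trainer use worker processes together with an explicit integer prefetch factor. A sketch of that last option (dataloader_num_workers and dataloader_prefetch_factor are real TrainingArguments fields in 4.38; the values 2/2 are just examples):

from transformers import TrainingArguments

# Workaround sketch for torch <= 1.13 with transformers 4.38.*:
# with num_workers > 0 and an integer prefetch_factor, the old DataLoader
# check (num_workers == 0 and prefetch_factor != 2) is never hit with None.
training_args = TrainingArguments(
    output_dir="output",
    learning_rate=2e-5,
    dataloader_num_workers=2,      # > 0 enables multiprocessing in the DataLoader
    dataloader_prefetch_factor=2,  # old torch only accepts an int here
)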
