@Maimonator Can you tell me how to set the `sharded_ddp` parameter? I tried to use DeepSpeed, but I ran into an error, the same one described in the issue below (RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one...). Could you share your DeepSpeed config JSON?
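For reference, a minimal DeepSpeed ZeRO stage-2 config might look like the sketch below. This is an illustrative assumption, not the config actually used in this thread; the file name `ds_config.json` and every value in it are placeholders to adjust.

```python
# A minimal sketch, assuming DeepSpeed ZeRO stage 2 -- values are illustrative
# and should be tuned to the actual run, not copied verbatim.
import json

ds_config = {
    "fp16": {"enabled": True},             # mirrors the --fp16 training flag
    "zero_optimization": {
        "stage": 2,                        # shard optimizer state and gradients
        "overlap_comm": True,              # overlap communication with the backward pass
        "reduce_scatter": True,
        "allgather_partitions": True,
    },
    "train_micro_batch_size_per_gpu": 16,  # keep in sync with --per_device_train_batch_size
    "gradient_accumulation_steps": 1,
}

with open("ds_config.json", "w") as f:
    json.dump(ds_config, f, indent=2)

# The file is then passed through the Trainer's DeepSpeed integration, e.g.:
#   deepspeed run_common_voice.py --deepspeed ds_config.json <other args>
# Sharded DDP (fairscale) is a separate option, enabled with
#   --sharded_ddp simple   (or TrainingArguments(sharded_ddp="simple"))
```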
## Environment info
- `transformers` version: 4.5.0.dev0 (also tried 4.4.0; same error)
- Platform: Ubuntu (running on a virtual machine)
- Python version: 3.8
- PyTorch version (GPU?): 1.6.0
- Using GPU in script?: yes, running [this script](https://github.com/huggingface/transformers/blob/master/examples/research_projects/wav2vec2/run_common_voice.py)
- Using distributed or parallel set-up in script?: Distributed
### Who can help
@patrickvonplaten (as per the message in the Slack group)
## Information
Model I am using (Bert, XLNet ...): Wav2Vec2 (`facebook/wav2vec2-large-xlsr-53`)
The problem arises when using:
- [x] the official example scripts: (give details below)
- [x] my own modified scripts: (give details below)
I tried both the official command and a modified script (the run command changes based on the language).
The task I am working on is:
- [x] the Common Voice dataset (ta)
## To reproduce
Steps to reproduce the behavior:
1. Run the Common Voice script [from here](https://github.com/huggingface/transformers/blob/master/examples/research_projects/wav2vec2/run_common_voice.py)
2. For the multi-GPU setup I used this command (`--dataset_config_name` specifies the language code):

   ```bash
   python -m torch.distributed.launch \
       --nproc_per_node 4 run_common_voice.py \
       --model_name_or_path="facebook/wav2vec2-large-xlsr-53" \
       --dataset_config_name="tr" \
       --output_dir=./wav2vec2-large-xlsr-turkish-demo \
       --overwrite_output_dir \
       --num_train_epochs="5" \
       --per_device_train_batch_size="16" \
       --learning_rate="3e-4" \
       --warmup_steps="500" \
       --evaluation_strategy="steps" \
       --save_steps="400" \
       --eval_steps="400" \
       --logging_steps="400" \
       --save_total_limit="3" \
       --freeze_feature_extractor \
       --feat_proj_dropout="0.0" \
       --layerdrop="0.1" \
       --gradient_checkpointing \
       --fp16 \
       --group_by_length \
       --do_train --do_eval
   ```
## Error
```
RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by (1) passing the keyword argument 'find_unused_parameters=True' to 'torch.nn.parallel.DistributedDataParallel'; (2) making sure all 'forward' function outputs participate in calculating loss. If you already have done the above two steps, then the distributed data parallel module wasn't able to locate the output tensors in the return value of your module's 'forward' function. Please include the loss function and the structure of the return value of 'forward' of your module when reporting this issue (e.g. list, dict, iterable).
```
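If it helps while debugging: the Trainer exposes DDP's `find_unused_parameters` flag through `TrainingArguments.ddp_find_unused_parameters`. The sketch below is a possible starting point, not a confirmed fix; note that with `--layerdrop 0.1` wav2vec2 randomly skips transformer layers, so some parameters genuinely receive no gradient on a given step, which is what DDP is complaining about here.

```python
# A minimal sketch: forwarding find_unused_parameters to DistributedDataParallel
# via the Trainer. ddp_find_unused_parameters is a real TrainingArguments field;
# the other arguments are just placeholders for the actual run.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./wav2vec2-large-xlsr-turkish-demo",
    per_device_train_batch_size=16,
    fp16=True,
    ddp_find_unused_parameters=True,  # lets DDP tolerate parameters skipped by layerdrop
)
```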
## Expected behavior
The model should train without errors.