Hello,
I didn't see these errors when I ran seq2seq distillation last year; however, the script below, run from transformers/examples/research_projects/seq2seq-distillation, gives me a couple of issues.
python distillation.py \
--teacher google/t5-large-ssm-nq --data_dir $NQOPEN_DIR \
--tokenizer_name t5-large \
--student_decoder_layers 6 --student_encoder_layers 6 \
--freeze_encoder --freeze_embeds \
--learning_rate=3e-4 \
--do_train \
--gpus 4 \
--do_predict \
--fp16 --fp16_opt_level=O1 \
--val_check_interval 0.1 --n_val 500 --eval_beams 2 --length_penalty=0.5 \
--max_target_length=60 --val_max_target_length=60 --test_max_target_length=100 \
--model_name_or_path IGNORED \
--alpha_hid=3. \
--train_batch_size=2 --eval_batch_size=2 --gradient_accumulation_steps=2 \
--sortish_sampler \
--num_train_epochs=6 \
--warmup_steps 500 \
--output_dir distilled_t5_sft \
--logger_name wandb \
"$@"
Issues:
- I get the following warning at the start of training:
Epoch 0: 0%| | 2/12396 [00:00<1:20:46, 2.56it/s, loss=nan, v_num=xyme]/home/sumithrab/miniconda3/envs/distill/lib/python3.8/site-packages/torch/optim/lr_scheduler.py:131: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
warnings.warn("Detected call of `lr_scheduler.step()` before `optimizer.step()`. "
- Wandb does not show learning curves. I get the following warning:
Epoch 0: 1%| | 99/12396 [00:27<56:02, 3.66it/s, loss=5.55e+04, v_num=xyme]wandb: WARNING Step must only increase in log calls. Step 98 < 100; dropping {'loss': 56946.98828125, 'ce_loss': 24.933889389038086, 'mlm_loss': 9.203145980834961, 'hid_loss_enc': 951.351318359375, 'hid_loss_dec': 18023.71484375, 'tpb': 42, 'bs': 2, 'src_pad_tok': 2, 'src_pad_frac': 0.05882352963089943}.
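In case it helps pinpoint the problem: my understanding of the warning is that wandb drops any `log` call whose step is not strictly greater than the last one it saw, so curves vanish when two logging paths emit out-of-order steps. A sketch of that monotonic-step rule with a hypothetical wrapper (`MonotonicLogger` and the metric names are illustrative, not part of the script):

```python
class MonotonicLogger:
    """Forwards metrics to a backend only when the step strictly increases,
    mirroring wandb's "Step must only increase in log calls" rule."""

    def __init__(self, backend_log):
        self._log = backend_log   # e.g. wandb.log in the real script
        self._last_step = -1

    def log(self, metrics, step):
        if step <= self._last_step:
            return False          # wandb would drop this entry
        self._last_step = step
        self._log(metrics, step)
        return True
```

With this model, the warning above means something already logged at step 100, so the later call at step 98 (carrying the loss dict) was discarded.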
Any ideas you may have would be very helpful.