Hello,
I tried to run a translation script with the t5-large model, but the run cancels itself, as if someone had pressed Ctrl+C, and there is no traceback.
Here is the script:
! python ./transformers/examples/pytorch/translation/run_translation.py \
    --model_name_or_path t5-large \
    --do_train \
    --do_eval \
    --source_lang fr \
    --target_lang en \
    --source_prefix "translate French to English: " \
    --dataset_name wmt14 \
    --dataset_config_name fr-en \
    --output_dir ./tmp/T5-large-fr-en \
    --per_device_train_batch_size=4 \
    --per_device_eval_batch_size=4 \
    --overwrite_output_dir \
    --predict_with_generate \
    --push_to_hub=True
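In case it helps to reproduce the problem faster, here is a hedged variant of the same command that limits the dataset size with --max_train_samples and --max_eval_samples (these are data arguments of the seq2seq example scripts; this is a sketch I have not run yet, and the sample counts are arbitrary):
! python ./transformers/examples/pytorch/translation/run_translation.py \
    --model_name_or_path t5-large \
    --do_train \
    --do_eval \
    --source_lang fr \
    --target_lang en \
    --source_prefix "translate French to English: " \
    --dataset_name wmt14 \
    --dataset_config_name fr-en \
    --output_dir ./tmp/T5-large-fr-en \
    --per_device_train_batch_size=4 \
    --per_device_eval_batch_size=4 \
    --max_train_samples 5000 \
    --max_eval_samples 500 \
    --overwrite_output_dir \
    --predict_with_generate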
Here are the last lines of the output:
{'loss': 1.6703, 'learning_rate': 4.9998367482177885e-05, 'epoch': 0.0}
0% 1000/30627537 [1:00:01<26918:05:16, 3.16s/it][INFO|trainer.py:2926] 2023-06-09 19:06:01,861 >> Saving model checkpoint to ./tmp/tst-translation/checkpoint-1000
[INFO|configuration_utils.py:458] 2023-06-09 19:06:01,863 >> Configuration saved in ./tmp/tst-translation/checkpoint-1000/config.json
[INFO|configuration_utils.py:364] 2023-06-09 19:06:01,863 >> Configuration saved in ./tmp/tst-translation/checkpoint-1000/generation_config.json
[INFO|modeling_utils.py:1853] 2023-06-09 19:06:09,160 >> Model weights saved in ./tmp/tst-translation/checkpoint-1000/pytorch_model.bin
[INFO|tokenization_utils_base.py:2194] 2023-06-09 19:06:09,162 >> tokenizer config file saved in ./tmp/tst-translation/checkpoint-1000/tokenizer_config.json
[INFO|tokenization_utils_base.py:2201] 2023-06-09 19:06:09,162 >> Special tokens file saved in ./tmp/tst-translation/checkpoint-1000/special_tokens_map.json
[INFO|tokenization_t5_fast.py:186] 2023-06-09 19:06:09,234 >> Copy vocab file to ./tmp/tst-translation/checkpoint-1000/spiece.model
[INFO|tokenization_utils_base.py:2194] 2023-06-09 19:06:54,528 >> tokenizer config file saved in ./tmp/tst-translation/tokenizer_config.json
[INFO|tokenization_utils_base.py:2201] 2023-06-09 19:06:54,528 >> Special tokens file saved in ./tmp/tst-translation/special_tokens_map.json
[INFO|tokenization_t5_fast.py:186] 2023-06-09 19:06:54,599 >> Copy vocab file to ./tmp/tst-translation/spiece.model
0% 1228/30627537 [1:17:07<29988:35:31, 3.53s/it]^C
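To check whether the runtime itself is killing the process (for example under memory pressure), this is a minimal sketch of what I would run in a separate Colab cell while training is in progress, assuming the Colab VM exposes the usual Linux utilities (free, nvidia-smi, dmesg):
# Show current RAM and GPU memory usage, then look for kernel OOM-killer messages
! free -h
! nvidia-smi
! dmesg | grep -i -E "out of memory|killed process" | tail -n 20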
Here is the system information:
Transformers version: 4.31.0.dev0
Platform: Google Colab
Environment configuration: TPU, GPU (V100)
Python version: Python 3.10.12
The same issue is also reported on GitHub: Training auto cancelling · Issue #24150 · huggingface/transformers.