Sure, @sgugger, here's the complete output (pasted in full at the bottom of this comment).
As for "try briefly with another model": what exactly do I need to change? Is changing --model_type enough, or do I also have to change config.json, the tokenizer, and so on? A rough sketch of my understanding is below.
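My understanding (possibly wrong) is that when no --config_name or --model_name_or_path is given, run_mlm.py builds a fresh default config from --model_type and reuses whatever tokenizer I point it at. A minimal sketch of what I think swapping the model type would look like; "bert" is just an example, and the path is my local EsperBERTo tokenizer:

```python
from transformers import CONFIG_MAPPING, AutoTokenizer

# What I believe run_mlm.py does for --model_type when training from scratch:
# instantiate a default config for the chosen architecture...
config = CONFIG_MAPPING["bert"]()  # e.g. --model_type bert instead of roberta

# ...and load the tokenizer passed via --tokenizer_name unchanged.
tokenizer = AutoTokenizer.from_pretrained(
    "/home/cleong/projects/personal/colin-summer-2021/EsperBERTo/"
)

# Presumably these still need to agree if only --model_type changes.
print(config.vocab_size, len(tokenizer), config.max_position_embeddings)
```

If that's roughly right, is matching vocab_size (and maybe max_position_embeddings) enough, or does the tokenizer itself need changes too?

Full output of the run: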
$ python run_mlm.py --model_type roberta --tokenizer_name /home/cleong/projects/personal/colin-summer-2021/EsperBERTo/ --train_file /home/cleong/projects/personal/colin-summer-2021/data/oscar.eo.txt --max_seq_length 512 --do_train --output_dir ./output/test-mlm
/home/cleong/miniconda3/envs/languagemodel/lib/python3.9/site-packages/torch/cuda/__init__.py:52: UserWarning: CUDA initialization: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero. (Triggered internally at /pytorch/c10/cuda/CUDAFunctions.cpp:109.)
return torch._C._cuda_getDeviceCount() > 0
06/07/2021 09:06:58 - WARNING - __main__ - Process rank: -1, device: cpu, n_gpu: 0distributed training: False, 16-bits training: False
06/07/2021 09:06:58 - INFO - __main__ - Training/evaluation parameters TrainingArguments(output_dir=./output/test-mlm, overwrite_output_dir=False, do_train=True, do_eval=False, do_predict=False, evaluation_strategy=IntervalStrategy.NO, prediction_loss_only=False, per_device_train_batch_size=8, per_device_eval_batch_size=8, gradient_accumulation_steps=1, eval_accumulation_steps=None, learning_rate=5e-05, weight_decay=0.0, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, max_grad_norm=1.0, num_train_epochs=3.0, max_steps=-1, lr_scheduler_type=SchedulerType.LINEAR, warmup_ratio=0.0, warmup_steps=0, logging_dir=runs/Jun07_09-06-58_act3admin-Precision-7730, logging_strategy=IntervalStrategy.STEPS, logging_first_step=False, logging_steps=500, save_strategy=IntervalStrategy.STEPS, save_steps=500, save_total_limit=None, no_cuda=False, seed=42, fp16=False, fp16_opt_level=O1, fp16_backend=auto, fp16_full_eval=False, local_rank=-1, tpu_num_cores=None, tpu_metrics_debug=False, debug=[], dataloader_drop_last=False, eval_steps=500, dataloader_num_workers=0, past_index=-1, run_name=./output/test-mlm, disable_tqdm=False, remove_unused_columns=True, label_names=None, load_best_model_at_end=False, metric_for_best_model=None, greater_is_better=None, ignore_data_skip=False, sharded_ddp=[], deepspeed=None, label_smoothing_factor=0.0, adafactor=False, group_by_length=False, length_column_name=length, report_to=[], ddp_find_unused_parameters=None, dataloader_pin_memory=True, skip_memory_metrics=True, use_legacy_prediction_loop=False, push_to_hub=False, resume_from_checkpoint=None, log_on_each_node=True, _n_gpu=0, mp_parameters=)
06/07/2021 09:06:58 - WARNING - datasets.builder - Using custom data configuration default-77be700d26e27b24
06/07/2021 09:06:58 - WARNING - datasets.builder - Reusing dataset text (/home/cleong/.cache/huggingface/datasets/text/default-77be700d26e27b24/0.0.0/e16f44aa1b321ece1f87b07977cc5d70be93d69b20486d6dacd62e12cf25c9a5)
06/07/2021 09:06:58 - WARNING - __main__ - You are instantiating a new config instance from scratch.
[INFO|configuration_utils.py:515] 2021-06-07 09:06:58,903 >> loading configuration file /home/cleong/projects/personal/colin-summer-2021/EsperBERTo/config.json
[INFO|configuration_utils.py:553] 2021-06-07 09:06:58,904 >> Model config RobertaConfig {
"architectures": [
"RobertaForMaskedLM"
],
"attention_probs_dropout_prob": 0.1,
"bos_token_id": 0,
"eos_token_id": 2,
"gradient_checkpointing": false,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 768,
"initializer_range": 0.02,
"intermediate_size": 3072,
"layer_norm_eps": 1e-05,
"max_position_embeddings": 207,
"model_type": "roberta",
"num_attention_heads": 6,
"num_hidden_layers": 3,
"pad_token_id": 1,
"position_embedding_type": "absolute",
"transformers_version": "4.7.0.dev0",
"type_vocab_size": 1,
"use_cache": true,
"vocab_size": 52000
}
[INFO|tokenization_utils_base.py:1651] 2021-06-07 09:06:58,904 >> Didn't find file /home/cleong/projects/personal/colin-summer-2021/EsperBERTo/tokenizer.json. We won't load it.
[INFO|tokenization_utils_base.py:1651] 2021-06-07 09:06:58,905 >> Didn't find file /home/cleong/projects/personal/colin-summer-2021/EsperBERTo/added_tokens.json. We won't load it.
[INFO|tokenization_utils_base.py:1651] 2021-06-07 09:06:58,905 >> Didn't find file /home/cleong/projects/personal/colin-summer-2021/EsperBERTo/special_tokens_map.json. We won't load it.
[INFO|tokenization_utils_base.py:1651] 2021-06-07 09:06:58,905 >> Didn't find file /home/cleong/projects/personal/colin-summer-2021/EsperBERTo/tokenizer_config.json. We won't load it.
[INFO|tokenization_utils_base.py:1715] 2021-06-07 09:06:58,905 >> loading file /home/cleong/projects/personal/colin-summer-2021/EsperBERTo/vocab.json
[INFO|tokenization_utils_base.py:1715] 2021-06-07 09:06:58,905 >> loading file /home/cleong/projects/personal/colin-summer-2021/EsperBERTo/merges.txt
[INFO|tokenization_utils_base.py:1715] 2021-06-07 09:06:58,905 >> loading file None
[INFO|tokenization_utils_base.py:1715] 2021-06-07 09:06:58,905 >> loading file None
[INFO|tokenization_utils_base.py:1715] 2021-06-07 09:06:58,906 >> loading file None
[INFO|tokenization_utils_base.py:1715] 2021-06-07 09:06:58,906 >> loading file None
06/07/2021 09:06:59 - INFO - __main__ - Training new model from scratch
06/07/2021 09:07:01 - WARNING - datasets.arrow_dataset - Loading cached processed dataset at /home/cleong/.cache/huggingface/datasets/text/default-77be700d26e27b24/0.0.0/e16f44aa1b321ece1f87b07977cc5d70be93d69b20486d6dacd62e12cf25c9a5/cache-994c5abeed4d6e58.arrow
06/07/2021 09:07:01 - WARNING - datasets.arrow_dataset - Loading cached processed dataset at /home/cleong/.cache/huggingface/datasets/text/default-77be700d26e27b24/0.0.0/e16f44aa1b321ece1f87b07977cc5d70be93d69b20486d6dacd62e12cf25c9a5/cache-2cb7998233554805.arrow
[INFO|trainer.py:514] 2021-06-07 09:07:01,719 >> The following columns in the training set don't have a corresponding argument in `RobertaForMaskedLM.forward` and have been ignored: special_tokens_mask.
[INFO|trainer.py:1147] 2021-06-07 09:07:01,724 >> ***** Running training *****
[INFO|trainer.py:1148] 2021-06-07 09:07:01,724 >> Num examples = 143129
[INFO|trainer.py:1149] 2021-06-07 09:07:01,724 >> Num Epochs = 3
[INFO|trainer.py:1150] 2021-06-07 09:07:01,724 >> Instantaneous batch size per device = 8
[INFO|trainer.py:1151] 2021-06-07 09:07:01,724 >> Total train batch size (w. parallel, distributed & accumulation) = 8
[INFO|trainer.py:1152] 2021-06-07 09:07:01,724 >> Gradient Accumulation steps = 1
[INFO|trainer.py:1153] 2021-06-07 09:07:01,724 >> Total optimization steps = 53676
0%| | 0/53676 [00:00<?, ?it/s]Traceback (most recent call last):
File "/home/cleong/projects/personal/colin-summer-2021/run_mlm.py", line 500, in <module>
main()
File "/home/cleong/projects/personal/colin-summer-2021/run_mlm.py", line 451, in main
train_result = trainer.train(resume_from_checkpoint=checkpoint)
File "/home/cleong/miniconda3/envs/languagemodel/lib/python3.9/site-packages/transformers/trainer.py", line 1263, in train
tr_loss += self.training_step(model, inputs)
File "/home/cleong/miniconda3/envs/languagemodel/lib/python3.9/site-packages/transformers/trainer.py", line 1741, in training_step
loss = self.compute_loss(model, inputs)
File "/home/cleong/miniconda3/envs/languagemodel/lib/python3.9/site-packages/transformers/trainer.py", line 1773, in compute_loss
outputs = model(**inputs)
File "/home/cleong/miniconda3/envs/languagemodel/lib/python3.9/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/cleong/miniconda3/envs/languagemodel/lib/python3.9/site-packages/transformers/models/roberta/modeling_roberta.py", line 1049, in forward
outputs = self.roberta(
File "/home/cleong/miniconda3/envs/languagemodel/lib/python3.9/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/cleong/miniconda3/envs/languagemodel/lib/python3.9/site-packages/transformers/models/roberta/modeling_roberta.py", line 808, in forward
embedding_output = self.embeddings(
File "/home/cleong/miniconda3/envs/languagemodel/lib/python3.9/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/cleong/miniconda3/envs/languagemodel/lib/python3.9/site-packages/transformers/models/roberta/modeling_roberta.py", line 122, in forward
position_embeddings = self.position_embeddings(position_ids)
File "/home/cleong/miniconda3/envs/languagemodel/lib/python3.9/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/cleong/miniconda3/envs/languagemodel/lib/python3.9/site-packages/torch/nn/modules/sparse.py", line 156, in forward
return F.embedding(
File "/home/cleong/miniconda3/envs/languagemodel/lib/python3.9/site-packages/torch/nn/functional.py", line 1916, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
IndexError: index out of range in self
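For what it's worth, I wonder whether this IndexError comes from the --max_seq_length 512 vs. "max_position_embeddings": 207 mismatch visible in the config above. Here's a tiny standalone snippet that reproduces that kind of failure (sizes copied from my run, but this is only an illustration, not the actual RoBERTa code):

```python
import torch
import torch.nn as nn

# Position-embedding table sized like max_position_embeddings=207 from the config above.
position_embeddings = nn.Embedding(207, 768)

# Position ids for a 512-token sequence, as with --max_seq_length 512.
position_ids = torch.arange(512).unsqueeze(0)

# Any id >= 207 is out of range for the 207-row table, so this raises
# "IndexError: index out of range in self", matching the traceback above.
out = position_embeddings(position_ids)
```

If that's the cause, should I lower --max_seq_length, or regenerate the config with a larger max_position_embeddings?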