Stuck on "Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding."

System Info

  • transformers version: 4.31.0.dev0
  • Platform: Linux-5.14.21-150400.24.55-default-x86_64-with-glibc2.31
  • Python version: 3.10.10
  • Huggingface_hub version: 0.15.1
  • Safetensors version: 0.3.1
  • PyTorch version (GPU?): 2.0.1+cu117 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: Yes
  • Using distributed or parallel set-up in script?: Yes

I intend to use run_mlm.py to train RoBERTa from scratch. For training, I’m using data I created myself, and I entered the following command:

CUDA_VISIBLE_DEVICES=0,1,2 python run_mlm.py \
    --model_type roberta \
    --config_overrides="num_hidden_layers=6,max_position_embeddings=514" \
    --tokenizer_name MyModel \
    --train_file ./data/corpus_dedup.txt \
    --max_seq_length 512 \
    --line_by_line True \
    --per_device_train_batch_size 64 \
    --do_train \
    --overwrite_output_dir True \
    --gradient_accumulation_steps 4 \
    --num_train_epochs 40 \
    --fp16 True \
    --output_dir MyModel \
    --save_total_limit 1

When I try to train with a 3-GPU configuration, the run gets stuck for dozens of hours before training starts, with the following message:

You're using a RobertaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.

Additionally, when I train with only 2 GPUs (CUDA_VISIBLE_DEVICES=0,1, followed by the same parameters), training runs normally.

I tried adding TOKENIZERS_PARALLELISM=0 CUDA_VISIBLE_DEVICE.... before the command line, but the issue remains, and nvidia-smi returns the following:

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02              Driver Version: 530.30.02    CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A100 80GB PCIe           Off| 00000000:52:00.0 Off |                    0 |
| N/A   37C    P0               71W / 300W|   1885MiB / 81920MiB |    100%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA A100 80GB PCIe           Off| 00000000:CE:00.0 Off |                    0 |
| N/A   39C    P0               69W / 300W|   1863MiB / 81920MiB |    100%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   2  NVIDIA A100 80GB PCIe           Off| 00000000:D1:00.0 Off |                    0 |
| N/A   43C    P0               71W / 300W|   1863MiB / 81920MiB |    100%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A     62822      C   python                                     1882MiB |
|    1   N/A  N/A     62822      C   python                                     1860MiB |
|    2   N/A  N/A     62822      C   python                                     1860MiB |
+---------------------------------------------------------------------------------------+

It’s very odd: all three GPUs report 100% utilization, yet each has less than 2 GiB of memory in use and no training progress is made. So I stepped through the main() function with the debugger and got the following:

> /cfs/home/u021274/higo/run_mlm.py(234)main()
-> parser = HfArgumentParser((ModelArguments, DataTrainingArguments, TrainingArguments))
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(235)main()
-> if len(sys.argv) == 2 and sys.argv[1].endswith(".json"):
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(240)main()
-> model_args, data_args, training_args = parser.parse_args_into_dataclasses()
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(244)main()
-> send_example_telemetry("run_mlm", model_args, data_args)
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(247)main()
-> logging.basicConfig(
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(248)main()
-> format="%(asctime)s - %(levelname)s - %(name)s - %(message)s",
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(249)main()
-> datefmt="%m/%d/%Y %H:%M:%S",
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(250)main()
-> handlers=[logging.StreamHandler(sys.stdout)],
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(247)main()
-> logging.basicConfig(
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(253)main()
-> if training_args.should_log:
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(255)main()
-> transformers.utils.logging.set_verbosity_info()
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(257)main()
-> log_level = training_args.get_process_log_level()
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(258)main()
-> logger.setLevel(log_level)
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(259)main()
-> datasets.utils.logging.set_verbosity(log_level)
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(260)main()
-> transformers.utils.logging.set_verbosity(log_level)
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(261)main()
-> transformers.utils.logging.enable_default_handler()
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(262)main()
-> transformers.utils.logging.enable_explicit_format()
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(265)main()
-> logger.warning(
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(266)main()
-> f"Process rank: {training_args.local_rank}, device: {training_args.device}, n_gpu: {training_args.n_gpu}"
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(267)main()
-> + f"distributed training: {bool(training_args.local_rank != -1)}, 16-bits training: {training_args.fp16}"
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(266)main()
-> f"Process rank: {training_args.local_rank}, device: {training_args.device}, n_gpu: {training_args.n_gpu}"
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(265)main()
-> logger.warning(
(Pdb) n
06/26/2023 19:45:08 - WARNING - __main__ - Process rank: 0, device: cuda:0, n_gpu: 3distributed training: True, 16-bits training: True
> /cfs/home/u021274/higo/run_mlm.py(270)main()
-> logger.info(f"Training/evaluation parameters {training_args}")
(Pdb) n
06/26/2023 19:45:09 - INFO - __main__ - Training/evaluation parameters TrainingArguments(
_n_gpu=3,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_backend=None,
ddp_broadcast_buffers=None,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=None,
disable_tqdm=False,
do_eval=False,
do_predict=False,
do_train=True,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=None,
evaluation_strategy=no,
fp16=True,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_config={'fsdp_min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False},
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
gradient_accumulation_steps=4,
gradient_checkpointing=False,
greater_is_better=None,
group_by_length=False,
half_precision_backend=auto,
hub_model_id=None,
hub_private_repo=False,
hub_strategy=every_save,
hub_token=<HUB_TOKEN>,
ignore_data_skip=False,
include_inputs_for_metrics=False,
jit_mode_eval=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=5e-05,
length_column_name=length,
load_best_model_at_end=False,
local_rank=0,
log_level=passive,
log_level_replica=warning,
log_on_each_node=True,
logging_dir=MyModel/runs/Jun26_19-44-10_g07,
logging_first_step=False,
logging_nan_inf_filter=True,
logging_steps=500,
logging_strategy=steps,
lr_scheduler_type=linear,
max_grad_norm=1.0,
max_steps=-1,
metric_for_best_model=None,
mp_parameters=,
no_cuda=False,
num_train_epochs=40.0,
optim=adamw_hf,
optim_args=None,
output_dir=MyModel,
overwrite_output_dir=True,
past_index=-1,
per_device_eval_batch_size=8,
per_device_train_batch_size=64,
prediction_loss_only=False,
push_to_hub=False,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=<PUSH_TO_HUB_TOKEN>,
ray_scope=last,
remove_unused_columns=True,
report_to=['wandb'],
resume_from_checkpoint=None,
run_name=MyModel,
save_on_each_node=False,
save_safetensors=False,
save_steps=500,
save_strategy=steps,
save_total_limit=1,
seed=42,
sharded_ddp=[],
skip_memory_metrics=True,
tf32=None,
torch_compile=False,
torch_compile_backend=None,
torch_compile_mode=None,
torchdynamo=None,
tpu_metrics_debug=False,
tpu_num_cores=None,
use_ipex=False,
use_legacy_prediction_loop=False,
use_mps_device=False,
warmup_ratio=0.0,
warmup_steps=0,
weight_decay=0.0,
xpu_backend=None,
)
> /cfs/home/u021274/higo/run_mlm.py(273)main()
-> last_checkpoint = None
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(274)main()
-> if os.path.isdir(training_args.output_dir) and training_args.do_train and not training_args.overwrite_output_dir:
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(288)main()
-> set_seed(training_args.seed)
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(299)main()
-> if data_args.dataset_name is not None:
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(326)main()
-> data_files = {}
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(327)main()
-> if data_args.train_file is not None:
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(328)main()
-> data_files["train"] = data_args.train_file
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(329)main()
-> extension = data_args.train_file.split(".")[-1]
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(330)main()
-> if data_args.validation_file is not None:
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(333)main()
-> if extension == "txt":
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(334)main()
-> extension = "text"
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(335)main()
-> raw_datasets = load_dataset(
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(336)main()
-> extension,
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(337)main()
-> data_files=data_files,
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(338)main()
-> cache_dir=model_args.cache_dir,
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(339)main()
-> use_auth_token=True if model_args.use_auth_token else None,
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(335)main()
-> raw_datasets = load_dataset(
(Pdb) n
06/26/2023 19:45:33 - INFO - datasets.builder - Using custom data configuration default-2df3a67ae9ac7743
06/26/2023 19:45:33 - INFO - datasets.info - Loading Dataset Infos from /cfs/home/u021274/higo/myenv/lib64/python3.10/site-packages/datasets/packaged_modules/text
06/26/2023 19:45:33 - INFO - datasets.builder - Overwrite dataset info from restored data version if exists.
06/26/2023 19:45:33 - INFO - datasets.info - Loading Dataset info from /cfs/home/u021274/.cache/huggingface/datasets/text/default-2df3a67ae9ac7743/0.0.0/cb1e9bd71a82ad27976be3b12b407850fe2837d80c22c5e03a28949843a8ace2
06/26/2023 19:45:34 - WARNING - datasets.builder - Found cached dataset text (/cfs/home/u021274/.cache/huggingface/datasets/text/default-2df3a67ae9ac7743/0.0.0/cb1e9bd71a82ad27976be3b12b407850fe2837d80c22c5e03a28949843a8ace2)
06/26/2023 19:45:34 - INFO - datasets.info - Loading Dataset info from /cfs/home/u021274/.cache/huggingface/datasets/text/default-2df3a67ae9ac7743/0.0.0/cb1e9bd71a82ad27976be3b12b407850fe2837d80c22c5e03a28949843a8ace2
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 16.00it/s]
> /cfs/home/u021274/higo/run_mlm.py(343)main()
-> if "validation" not in raw_datasets.keys():
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(344)main()
-> raw_datasets["validation"] = load_dataset(
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(345)main()
-> extension,
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(346)main()
-> data_files=data_files,
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(347)main()
-> split=f"train[:{data_args.validation_split_percentage}%]",
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(348)main()
-> cache_dir=model_args.cache_dir,
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(349)main()
-> use_auth_token=True if model_args.use_auth_token else None,
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(344)main()
-> raw_datasets["validation"] = load_dataset(
(Pdb) n
06/26/2023 19:45:52 - INFO - datasets.builder - Using custom data configuration default-2df3a67ae9ac7743
06/26/2023 19:45:52 - INFO - datasets.info - Loading Dataset Infos from /cfs/home/u021274/higo/myenv/lib64/python3.10/site-packages/datasets/packaged_modules/text
06/26/2023 19:45:52 - INFO - datasets.builder - Overwrite dataset info from restored data version if exists.
06/26/2023 19:45:52 - INFO - datasets.info - Loading Dataset info from /cfs/home/u021274/.cache/huggingface/datasets/text/default-2df3a67ae9ac7743/0.0.0/cb1e9bd71a82ad27976be3b12b407850fe2837d80c22c5e03a28949843a8ace2
06/26/2023 19:45:52 - WARNING - datasets.builder - Found cached dataset text (/cfs/home/u021274/.cache/huggingface/datasets/text/default-2df3a67ae9ac7743/0.0.0/cb1e9bd71a82ad27976be3b12b407850fe2837d80c22c5e03a28949843a8ace2)
06/26/2023 19:45:52 - INFO - datasets.info - Loading Dataset info from /cfs/home/u021274/.cache/huggingface/datasets/text/default-2df3a67ae9ac7743/0.0.0/cb1e9bd71a82ad27976be3b12b407850fe2837d80c22c5e03a28949843a8ace2
> /cfs/home/u021274/higo/run_mlm.py(351)main()
-> raw_datasets["train"] = load_dataset(
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(352)main()
-> extension,
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(353)main()
-> data_files=data_files,
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(354)main()
-> split=f"train[{data_args.validation_split_percentage}%:]",
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(355)main()
-> cache_dir=model_args.cache_dir,
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(356)main()
-> use_auth_token=True if model_args.use_auth_token else None,
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(351)main()
-> raw_datasets["train"] = load_dataset(
(Pdb) n
06/26/2023 19:46:02 - INFO - datasets.builder - Using custom data configuration default-2df3a67ae9ac7743
06/26/2023 19:46:02 - INFO - datasets.info - Loading Dataset Infos from /cfs/home/u021274/higo/myenv/lib64/python3.10/site-packages/datasets/packaged_modules/text
06/26/2023 19:46:02 - INFO - datasets.builder - Overwrite dataset info from restored data version if exists.
06/26/2023 19:46:02 - INFO - datasets.info - Loading Dataset info from /cfs/home/u021274/.cache/huggingface/datasets/text/default-2df3a67ae9ac7743/0.0.0/cb1e9bd71a82ad27976be3b12b407850fe2837d80c22c5e03a28949843a8ace2
06/26/2023 19:46:02 - WARNING - datasets.builder - Found cached dataset text (/cfs/home/u021274/.cache/huggingface/datasets/text/default-2df3a67ae9ac7743/0.0.0/cb1e9bd71a82ad27976be3b12b407850fe2837d80c22c5e03a28949843a8ace2)
06/26/2023 19:46:02 - INFO - datasets.info - Loading Dataset info from /cfs/home/u021274/.cache/huggingface/datasets/text/default-2df3a67ae9ac7743/0.0.0/cb1e9bd71a82ad27976be3b12b407850fe2837d80c22c5e03a28949843a8ace2
> /cfs/home/u021274/higo/run_mlm.py(368)main()
-> "cache_dir": model_args.cache_dir,
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(369)main()
-> "revision": model_args.model_revision,
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(370)main()
-> "use_auth_token": True if model_args.use_auth_token else None,
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(367)main()
-> config_kwargs = {
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(372)main()
-> if model_args.config_name:
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(374)main()
-> elif model_args.model_name_or_path:
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(377)main()
-> config = CONFIG_MAPPING[model_args.model_type]()
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(378)main()
-> logger.warning("You are instantiating a new config instance from scratch.")
(Pdb) n
06/26/2023 19:46:14 - WARNING - __main__ - You are instantiating a new config instance from scratch.
> /cfs/home/u021274/higo/run_mlm.py(379)main()
-> if model_args.config_overrides is not None:
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(380)main()
-> logger.info(f"Overriding config: {model_args.config_overrides}")
(Pdb) n
06/26/2023 19:46:17 - INFO - __main__ - Overriding config: num_hidden_layers=6,max_position_embeddings=514
> /cfs/home/u021274/higo/run_mlm.py(381)main()
-> config.update_from_string(model_args.config_overrides)
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(382)main()
-> logger.info(f"New config: {config}")
(Pdb) n
06/26/2023 19:46:19 - INFO - __main__ - New config: RobertaConfig {
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "classifier_dropout": null,
  "eos_token_id": 2,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 514,
  "model_type": "roberta",
  "num_attention_heads": 12,
  "num_hidden_layers": 6,
  "pad_token_id": 1,
  "position_embedding_type": "absolute",
  "transformers_version": "4.31.0.dev0",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 50265
}

> /cfs/home/u021274/higo/run_mlm.py(385)main()
-> "cache_dir": model_args.cache_dir,
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(386)main()
-> "use_fast": model_args.use_fast_tokenizer,
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(387)main()
-> "revision": model_args.model_revision,
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(388)main()
-> "use_auth_token": True if model_args.use_auth_token else None,
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(384)main()
-> tokenizer_kwargs = {
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(390)main()
-> if model_args.tokenizer_name:
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(391)main()
-> tokenizer = AutoTokenizer.from_pretrained(model_args.tokenizer_name, **tokenizer_kwargs)
(Pdb) n
[INFO|tokenization_auto.py:503] 2023-06-26 19:47:10,919 >> Could not locate the tokenizer configuration file, will try to use the model config instead.
[INFO|configuration_utils.py:710] 2023-06-26 19:47:10,922 >> loading configuration file MyModel/config.json
[INFO|configuration_utils.py:768] 2023-06-26 19:47:10,932 >> Model config RobertaConfig {
  "_name_or_path": "MyModel",
  "architectures": [
    "RobertaForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "classifier_dropout": null,
  "eos_token_id": 2,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-05,
  "max_position_embeddings": 514,
  "model_type": "roberta",
  "num_attention_heads": 12,
  "num_hidden_layers": 6,
  "pad_token_id": 1,
  "position_embedding_type": "absolute",
  "transformers_version": "4.31.0.dev0",
  "type_vocab_size": 1,
  "use_cache": true,
  "vocab_size": 50265
}

[INFO|tokenization_utils_base.py:1842] 2023-06-26 19:47:10,946 >> loading file vocab.json
[INFO|tokenization_utils_base.py:1842] 2023-06-26 19:47:10,946 >> loading file merges.txt
[INFO|tokenization_utils_base.py:1842] 2023-06-26 19:47:10,946 >> loading file tokenizer.json
[INFO|tokenization_utils_base.py:1842] 2023-06-26 19:47:10,946 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:1842] 2023-06-26 19:47:10,946 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:1842] 2023-06-26 19:47:10,946 >> loading file tokenizer_config.json
[INFO|configuration_utils.py:710] 2023-06-26 19:47:10,947 >> loading configuration file MyModel/config.json
[INFO|configuration_utils.py:768] 2023-06-26 19:47:10,950 >> Model config RobertaConfig {
  "_name_or_path": "MyModel",
  "architectures": [
    "RobertaForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "classifier_dropout": null,
  "eos_token_id": 2,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-05,
  "max_position_embeddings": 514,
  "model_type": "roberta",
  "num_attention_heads": 12,
  "num_hidden_layers": 6,
  "pad_token_id": 1,
  "position_embedding_type": "absolute",
  "transformers_version": "4.31.0.dev0",
  "type_vocab_size": 1,
  "use_cache": true,
  "vocab_size": 50265
}

[INFO|configuration_utils.py:710] 2023-06-26 19:47:11,024 >> loading configuration file MyModel/config.json
[INFO|configuration_utils.py:768] 2023-06-26 19:47:11,027 >> Model config RobertaConfig {
  "_name_or_path": "MyModel",
  "architectures": [
    "RobertaForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "classifier_dropout": null,
  "eos_token_id": 2,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-05,
  "max_position_embeddings": 514,
  "model_type": "roberta",
  "num_attention_heads": 12,
  "num_hidden_layers": 6,
  "pad_token_id": 1,
  "position_embedding_type": "absolute",
  "transformers_version": "4.31.0.dev0",
  "type_vocab_size": 1,
  "use_cache": true,
  "vocab_size": 50265
}

> /cfs/home/u021274/higo/run_mlm.py(400)main()
-> if model_args.model_name_or_path:
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(411)main()
-> logger.info("Training new model from scratch")
(Pdb) n
06/26/2023 19:47:14 - INFO - __main__ - Training new model from scratch
> /cfs/home/u021274/higo/run_mlm.py(412)main()
-> model = AutoModelForMaskedLM.from_config(config)
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(416)main()
-> embedding_size = model.get_input_embeddings().weight.shape[0]
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(417)main()
-> if len(tokenizer) > embedding_size:
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(422)main()
-> if training_args.do_train:
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(423)main()
-> column_names = list(raw_datasets["train"].features)
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(426)main()
-> text_column_name = "text" if "text" in column_names else column_names[0]
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(428)main()
-> if data_args.max_seq_length is None:
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(438)main()
-> if data_args.max_seq_length > tokenizer.model_max_length:
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(443)main()
-> max_seq_length = min(data_args.max_seq_length, tokenizer.model_max_length)
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(445)main()
-> if data_args.line_by_line:
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(447)main()
-> padding = "max_length" if data_args.pad_to_max_length else False
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(449)main()
-> def tokenize_function(examples):
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(464)main()
-> with training_args.main_process_first(desc="dataset map tokenization"):
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(465)main()
-> if not data_args.streaming:
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(466)main()
-> tokenized_datasets = raw_datasets.map(
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(467)main()
-> tokenize_function,
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(468)main()
-> batched=True,
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(469)main()
-> num_proc=data_args.preprocessing_num_workers,
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(470)main()
-> remove_columns=[text_column_name],
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(471)main()
-> load_from_cache_file=not data_args.overwrite_cache,
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(472)main()
-> desc="Running tokenizer on dataset line_by_line",
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(466)main()
-> tokenized_datasets = raw_datasets.map(
(Pdb) n
06/26/2023 19:47:51 - WARNING - datasets.arrow_dataset - Loading cached processed dataset at /cfs/home/u021274/.cache/huggingface/datasets/text/default-2df3a67ae9ac7743/0.0.0/cb1e9bd71a82ad27976be3b12b407850fe2837d80c22c5e03a28949843a8ace2/cache-c8ae7ecb92d28874.arrow
06/26/2023 19:47:51 - WARNING - datasets.arrow_dataset - Loading cached processed dataset at /cfs/home/u021274/.cache/huggingface/datasets/text/default-2df3a67ae9ac7743/0.0.0/cb1e9bd71a82ad27976be3b12b407850fe2837d80c22c5e03a28949843a8ace2/cache-20fc928d1e2a7f3b.arrow
> /cfs/home/u021274/higo/run_mlm.py(464)main()
-> with training_args.main_process_first(desc="dataset map tokenization"):
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(542)main()
-> if training_args.do_train:
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(543)main()
-> if "train" not in tokenized_datasets:
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(545)main()
-> train_dataset = tokenized_datasets["train"]
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(546)main()
-> if data_args.max_train_samples is not None:
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(550)main()
-> if training_args.do_eval:
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(580)main()
-> pad_to_multiple_of_8 = data_args.line_by_line and training_args.fp16 and not data_args.pad_to_max_length
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(581)main()
-> data_collator = DataCollatorForLanguageModeling(
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(582)main()
-> tokenizer=tokenizer,
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(583)main()
-> mlm_probability=data_args.mlm_probability,
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(584)main()
-> pad_to_multiple_of=8 if pad_to_multiple_of_8 else None,
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(581)main()
-> data_collator = DataCollatorForLanguageModeling(
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(588)main()
-> trainer = Trainer(
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(589)main()
-> model=model,
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(590)main()
-> args=training_args,
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(591)main()
-> train_dataset=train_dataset if training_args.do_train else None,
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(592)main()
-> eval_dataset=eval_dataset if training_args.do_eval else None,
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(593)main()
-> tokenizer=tokenizer,
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(594)main()
-> data_collator=data_collator,
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(595)main()
-> compute_metrics=compute_metrics if training_args.do_eval and not is_torch_tpu_available() else None,
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(597)main()
-> if training_args.do_eval and not is_torch_tpu_available()
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(596)main()
-> preprocess_logits_for_metrics=preprocess_logits_for_metrics
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(598)main()
-> else None,
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(588)main()
-> trainer = Trainer(
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(602)main()
-> if training_args.do_train:
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(603)main()
-> checkpoint = None
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(604)main()
-> if training_args.resume_from_checkpoint is not None:
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(606)main()
-> elif last_checkpoint is not None:
(Pdb) n
> /cfs/home/u021274/higo/run_mlm.py(608)main()
-> train_result = trainer.train(resume_from_checkpoint=checkpoint)
(Pdb) n
[INFO|trainer.py:769] 2023-06-26 19:48:46,054 >> The following columns in the training set don't have a corresponding argument in `RobertaForMaskedLM.forward` and have been ignored: special_tokens_mask. If special_tokens_mask are not expected by `RobertaForMaskedLM.forward`,  you can safely ignore this message.
/cfs/home/u021274/higo/myenv/lib64/python3.10/site-packages/transformers/optimization.py:411: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
  warnings.warn(
[INFO|trainer.py:1680] 2023-06-26 19:48:46,071 >> ***** Running training *****
[INFO|trainer.py:1681] 2023-06-26 19:48:46,071 >>   Num examples = 2,353,535
[INFO|trainer.py:1682] 2023-06-26 19:48:46,071 >>   Num Epochs = 40
[INFO|trainer.py:1683] 2023-06-26 19:48:46,071 >>   Instantaneous batch size per device = 192
[INFO|trainer.py:1684] 2023-06-26 19:48:46,071 >>   Total train batch size (w. parallel, distributed & accumulation) = 768
[INFO|trainer.py:1685] 2023-06-26 19:48:46,071 >>   Gradient Accumulation steps = 4
[INFO|trainer.py:1686] 2023-06-26 19:48:46,071 >>   Total optimization steps = 122,560
[INFO|trainer.py:1687] 2023-06-26 19:48:46,074 >>   Number of trainable parameters = 82,170,969
[INFO|integrations.py:727] 2023-06-26 19:48:46,077 >> Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"
wandb: Currently logged in as: <USER>. Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.15.4
wandb: Run data is saved locally in /cfs/home/u021274/higo/wandb/run-20230626_194847-vr14588a
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run fragrant-universe-48
wandb: ⭐️ View project at <URL>
wandb: 🚀 View run at <URL>
  0%|                                                                                                                                                                           | 0/122560 [00:00<?, ?it/s][WARNING|logging.py:280] 2023-06-26 19:49:01,837 >> You're using a RobertaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
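
As a sanity check on the trainer log above, the reported batch sizes and step count can be reproduced from the command-line arguments. This is a minimal sketch, assuming that a single process drives all three visible GPUs (nvidia-smi lists the same PID on each card), that the last partial batch is kept (dataloader_drop_last=False in the log), and that the number of update steps per epoch is the number of batches floor-divided by the accumulation steps. The variable names are mine, not the Trainer's.

```python
import math

# Values copied from the command line and the trainer log above.
per_device_train_batch_size = 64
n_gpu = 3
gradient_accumulation_steps = 4
num_examples = 2_353_535
num_train_epochs = 40

# Assumption: one process feeds all visible GPUs, so every forward pass
# consumes per_device_train_batch_size * n_gpu samples.
batch_per_step = per_device_train_batch_size * n_gpu
total_train_batch_size = batch_per_step * gradient_accumulation_steps

# Assumption: the last partial batch is kept, and update steps per epoch
# are batches floor-divided by the accumulation steps.
batches_per_epoch = math.ceil(num_examples / batch_per_step)
updates_per_epoch = batches_per_epoch // gradient_accumulation_steps
total_optimization_steps = updates_per_epoch * num_train_epochs

print(batch_per_step, total_train_batch_size, total_optimization_steps)
```

Under these assumptions the three printed values match the log (192, 768 and 122,560). The "per device" figure of 192 rather than 64 is consistent with the run not being split into one process per GPU, although that alone does not explain the hang.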

The problem seems to be inside the train() function, but I can’t find the cause. Can someone help me figure out what’s happening?
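
One way to narrow down where train() is stuck, without stepping through it by hand, is to have Python dump the stack of every thread at a fixed interval while the call is running. The sketch below uses only the standard-library faulthandler module. The helper run_with_watchdog, the stand-in function slow_stand_in and the log file name are hypothetical and only for illustration; in run_mlm.py the wrapped call would be trainer.train(...).

```python
import faulthandler
import os
import tempfile
import time


def run_with_watchdog(fn, timeout_s, log_path):
    """Run fn(), writing all thread stacks to log_path every timeout_s seconds until it returns."""
    with open(log_path, "w") as log_file:
        # The file must stay open until the watchdog is cancelled.
        faulthandler.dump_traceback_later(timeout_s, repeat=True, file=log_file)
        try:
            return fn()
        finally:
            faulthandler.cancel_dump_traceback_later()


def slow_stand_in():
    # Hypothetical stand-in for a call that takes longer than the timeout.
    time.sleep(0.5)
    return "finished"


log_path = os.path.join(tempfile.mkdtemp(), "hang_traceback.log")
result = run_with_watchdog(slow_stand_in, timeout_s=0.1, log_path=log_path)

with open(log_path) as f:
    dump = f.read()

print(result)
print("slow_stand_in" in dump)
```

For the real script, something like run_with_watchdog(lambda: trainer.train(resume_from_checkpoint=checkpoint), 600, "hang_traceback.log") would be the equivalent call. The dump only shows Python frames, so if the process is blocked inside native code it will show the Python line that entered it rather than the native location.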