Hello all,
I’ve written a chatbot that works fine in a Trainer / PyTorch based setup on one GPU with different models.
I tested with distilbert-base-uncased, bert-large-uncased, roberta-base, roberta-large, microsoft/deberta-large.
After making the necessary modifications to run the program with Accelerate on 8 TPU cores, it works fine for distilbert-base-uncased. With roberta-base the program runs in slow motion, and for all other (bigger?) models it terminates with the following error message (a simplified sketch of my setup follows the traceback):
Launching a training on 8 TPU cores.
loading configuration file https://huggingface.co/bert-large-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/1cf090f220f9674b67b3434decfe4d40a6532d7849653eac435ff94d31a4904c.1d03e5e4fa2db2532c517b2cd98290d8444b237619bd3d2039850a6d5e86473d
Model config BertConfig {
"architectures": [
"BertForMaskedLM"
],
"attention_probs_dropout_prob": 0.1,
"classifier_dropout": null,
"gradient_checkpointing": false,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 1024,
"id2label": {
"0": "LABEL_0",
"1": "LABEL_1",
"2": "LABEL_2",
...
"LABEL_99": 99
},
"layer_norm_eps": 1e-12,
"max_position_embeddings": 512,
"model_type": "bert",
"num_attention_heads": 16,
"num_hidden_layers": 24,
"pad_token_id": 0,
"position_embedding_type": "absolute",
"transformers_version": "4.10.3",
"type_vocab_size": 2,
"use_cache": true,
"vocab_size": 30522
}
loading weights file https://huggingface.co/bert-large-uncased/resolve/main/pytorch_model.bin from cache at /root/.cache/huggingface/transformers/1d959166dd7e047e57ea1b2d9b7b9669938a7e90c5e37a03961ad9f15eaea17f.fea64cd906e3766b04c92397f9ad3ff45271749cbe49829a079dd84e34c1697d
---------------------------------------------------------------------------
ProcessExitedException Traceback (most recent call last)
<ipython-input-54-a91f3c0bb4fd> in <module>()
1 from accelerate import notebook_launcher
2
----> 3 notebook_launcher(training_function)
3 frames
/usr/local/lib/python3.7/dist-packages/accelerate/notebook_launcher.py in notebook_launcher(function, args, num_processes, use_fp16, use_port)
67 launcher = PrepareForLaunch(function, distributed_type="TPU")
68 print(f"Launching a training on {num_processes} TPU cores.")
---> 69 xmp.spawn(launcher, args=args, nprocs=num_processes, start_method="fork")
70 else:
71 # No need for a distributed launch otherwise as it's either CPU or one GPU.
/usr/local/lib/python3.7/dist-packages/torch_xla/distributed/xla_multiprocessing.py in spawn(fn, args, nprocs, join, daemon, start_method)
392 join=join,
393 daemon=daemon,
--> 394 start_method=start_method)
395
396
/usr/local/lib/python3.7/dist-packages/torch/multiprocessing/spawn.py in start_processes(fn, args, nprocs, join, daemon, start_method)
186
187 # Loop on join until it returns True or raises an exception.
--> 188 while not context.join():
189 pass
190
/usr/local/lib/python3.7/dist-packages/torch/multiprocessing/spawn.py in join(self, timeout)
134 error_pid=failed_process.pid,
135 exit_code=exitcode,
--> 136 signal_name=name
137 )
138 else:
ProcessExitedException: process 0 terminated with signal SIGKILL
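For context, my launch code follows the Accelerate notebook_launcher pattern. This is only a simplified sketch of what training_function roughly does; the model class, num_labels, learning rate and the dataset variables are placeholders, not my exact code:

```python
import torch
from accelerate import Accelerator, notebook_launcher
from transformers import AutoModelForSequenceClassification

def training_function():
    accelerator = Accelerator()  # picks up the TPU process spawned by notebook_launcher

    # model class and num_labels are placeholders, not my exact code
    model = AutoModelForSequenceClassification.from_pretrained(
        "bert-large-uncased", num_labels=100
    )
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

    # train_texts / train_labels stand in for my dataset;
    # make_dataloader is sketched after the next paragraph
    train_dataloader = make_dataloader(train_texts, train_labels)

    # prepare() moves everything to the XLA device and shards the dataloader
    model, optimizer, train_dataloader = accelerator.prepare(
        model, optimizer, train_dataloader
    )

    model.train()
    for batch in train_dataloader:
        outputs = model(**batch)
        accelerator.backward(outputs.loss)
        optimizer.step()
        optimizer.zero_grad()

notebook_launcher(training_function)  # this is the call that fails for the larger models
```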
I tested different batch sizes down to 1 and reduced max_length down to 32. No effect.
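To be concrete, these are the two knobs I mean. A minimal sketch of the loader, with placeholder dataset handling, shown at the smallest values I tried:

```python
import torch
from torch.utils.data import DataLoader
from transformers import AutoTokenizer

def make_dataloader(texts, labels, batch_size=1, max_length=32):
    # batch_size and max_length shown at the smallest values I tried
    tokenizer = AutoTokenizer.from_pretrained("bert-large-uncased")
    enc = tokenizer(
        list(texts),
        padding="max_length",
        truncation=True,
        max_length=max_length,
        return_tensors="pt",
    )
    dataset = [
        {
            "input_ids": enc["input_ids"][i],
            "attention_mask": enc["attention_mask"][i],
            "labels": torch.tensor(labels[i]),
        }
        for i in range(len(labels))
    ]
    return DataLoader(dataset, batch_size=batch_size, shuffle=True)
```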
This case seems to be similar to TPU memory issues.
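If it helps with diagnosing this, I could print the XLA debug metrics from inside training_function with something like the following (just a sketch, assuming torch_xla.debug.metrics is available in the Colab runtime):

```python
import torch_xla.core.xla_model as xm
import torch_xla.debug.metrics as met

def report_xla_state(step):
    # print only from the main TPU process to keep the output readable
    if xm.is_master_ordinal():
        print(f"step {step}, device {xm.xla_device()}")
        print(met.metrics_report())  # compile counts, device transfers, etc.
```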
Are there modifications or settings I can make, or is Accelerate on TPU currently not compatible with bigger models?