Hi,
I’m training roberta-base with the HF Trainer, but training hangs right at the start. Here’s my code:
train_dataset[0]
{'input_ids': tensor([ 0, 100, 657, ..., 1, 1, 1]),
'attention_mask': tensor([1, 1, 1, ..., 0, 0, 0]),
'labels': tensor(0)}
val_dataset[0]
{'input_ids': tensor([ 0, 11094, 14, ..., 1, 1, 1]),
'attention_mask': tensor([1, 1, 1, ..., 0, 0, 0]),
'labels': tensor(0)}
## simple test
model(train_dataset[:2]['input_ids'], attention_mask = train_dataset[:2]['attention_mask'], labels=train_dataset[:2]['labels'])
SequenceClassifierOutput(loss=tensor(0.6995, grad_fn=<NllLossBackward>), logits=tensor([[ 0.0438, -0.1893],
[ 0.0530, -0.1786]], grad_fn=<AddmmBackward>), hidden_states=None, attentions=None)
train_args = transformers.TrainingArguments(
    output_dir='test_1',
    overwrite_output_dir=True,
    evaluation_strategy="epoch",
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    learning_rate=3e-5,
    weight_decay=0.01,
    num_train_epochs=2,
    load_best_model_at_end=True,
)
trainer = transformers.Trainer(
    model=model,
    args=train_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    tokenizer=tok,
)
trainer.train()
I checked GPU memory consumption, and it stays stuck at the following (memory is allocated on both GPUs, but utilization is 0%):
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.06 Driver Version: 450.51.06 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla V100-SXM2... On | 00000000:62:00.0 Off | 0 |
| N/A 49C P0 60W / 300W | 1756MiB / 32510MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 Tesla V100-SXM2... On | 00000000:8A:00.0 Off | 0 |
| N/A 50C P0 61W / 300W | 1376MiB / 32510MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
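A minimal sketch of how the hang might be narrowed down. It assumes the hang is related to the Trainer using both visible GPUs (with more than one GPU in a single process it wraps the model in `torch.nn.DataParallel`); that is a guess, not something confirmed above. The helper name `prepare_hang_debugging` and its parameters are hypothetical. It only uses the standard library: it restricts the process to one GPU and schedules a stack dump so that, if training still hangs, the traceback shows where.

```python
import faulthandler
import os


def prepare_hang_debugging(gpu_id="0", timeout_s=120, file=None):
    """Hypothetical helper; call it BEFORE importing torch or transformers."""
    # Expose a single GPU to this process, so the model is not replicated
    # across devices. Must be set before CUDA is initialized.
    os.environ["CUDA_VISIBLE_DEVICES"] = gpu_id

    # If the process is still running after timeout_s seconds, dump the
    # Python stack of every thread (to stderr unless a file is given).
    if file is None:
        faulthandler.dump_traceback_later(timeout_s, repeat=False)
    else:
        faulthandler.dump_traceback_later(timeout_s, repeat=False, file=file)


# Intended usage (assumption: run at the very top of the training script):
#   prepare_hang_debugging()
#   import torch, transformers
#   ... build model, datasets and trainer as above ...
#   trainer.train()
#   faulthandler.cancel_dump_traceback_later()  # cancel if training progresses
```

If training runs normally with a single GPU, the multi-GPU setup is the likely culprit; if it still hangs, the dumped traceback should point at the call where it is stuck.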
Please let me know how to proceed.