Trainer.train() is stuck

Hi,
I'm training roberta-base with the HF Trainer, but training hangs right at the start. Here's my code:

train_dataset[0]
{'input_ids': tensor([  0, 100, 657,  ...,   1,   1,   1]),
 'attention_mask': tensor([1, 1, 1,  ..., 0, 0, 0]),
 'labels': tensor(0)}

val_dataset[0]
{'input_ids': tensor([    0, 11094,    14,  ...,     1,     1,     1]),
 'attention_mask': tensor([1, 1, 1,  ..., 0, 0, 0]),
 'labels': tensor(0)}

# Simple test: a forward pass on two examples works outside the Trainer
batch = train_dataset[:2]
model(batch['input_ids'], attention_mask=batch['attention_mask'], labels=batch['labels'])
SequenceClassifierOutput(loss=tensor(0.6995, grad_fn=<NllLossBackward>), logits=tensor([[ 0.0438, -0.1893],
        [ 0.0530, -0.1786]], grad_fn=<AddmmBackward>), hidden_states=None, attentions=None)

train_args = transformers.TrainingArguments(
    output_dir='test_1',
    overwrite_output_dir=True,
    evaluation_strategy="epoch",
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    learning_rate=3e-5,
    weight_decay=0.01,
    num_train_epochs=2,
    load_best_model_at_end=True,
)

trainer = transformers.Trainer(
    model=model,
    args=train_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    tokenizer=tok,
)

trainer.train()

I checked nvidia-smi while it was hanging. Memory usage on both GPUs stays stuck at the values below and GPU utilization stays at 0%:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.06    Driver Version: 450.51.06    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000000:62:00.0 Off |                    0 |
| N/A   49C    P0    60W / 300W |   1756MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2...  On   | 00000000:8A:00.0 Off |                    0 |
| N/A   50C    P0    61W / 300W |   1376MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
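
In case it helps narrow this down, here is a minimal sketch of how I could find out where the call is hanging, using Python's standard `faulthandler` module to dump the stack traces of all threads if the call is still running after a timeout. `run_with_hang_report` is a hypothetical helper name of my own, not part of transformers, and the timeout value is an arbitrary assumption.

```python
import faulthandler
import sys


def run_with_hang_report(fn, timeout_s=60.0, file=None):
    """Call fn() and return its result.

    If fn() is still running after timeout_s seconds, the stack traces of
    all threads are written to `file` (and again every timeout_s seconds),
    which shows the line the process is stuck on.
    `file` must be a real file object with a fileno(); defaults to sys.stderr.
    """
    if file is None:
        file = sys.stderr
    # Arm a watchdog that dumps tracebacks after the timeout, repeatedly.
    faulthandler.dump_traceback_later(timeout_s, repeat=True, file=file)
    try:
        return fn()
    finally:
        # Disarm the watchdog once fn() returns or raises.
        faulthandler.cancel_dump_traceback_later()
```

With the setup above, the call would look like `run_with_hang_report(trainer.train, timeout_s=120)`. In a notebook, where `sys.stderr` may not be backed by a real file descriptor, passing a file opened with `open("hang.log", "w")` should work instead.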

Please let me know how to proceed.