Hi guys,
I am new to deep learning and wanted to train a binary (sentiment) classifier with SimpleTransformers. As a dataset I took Sentiment140 (1.6 million tweets: 800k positive, 800k negative). The training itself works, but depending on the size of the dataset Google Colab crashes. If I split the 1.6 million tweets into 1.28 million training and 0.32 million test examples, Colab crashes right after:
[2020-12-28 16:55:15,023] {classification_model.py:1147} INFO - Converting to features started. Cache is not used.
100% 1278719/1278719 [09:25<00:00, 2260.76it/s]
(1) Is this normal?
If I reduce this to 800k training and 160k test examples, Colab usually does not crash, but a single epoch is estimated at about 4 hours. (This size works most of the time, although sometimes 800k training examples also crashes as described above. And since an epoch takes 4 hours, I have never actually let the training run through, so I don't even know whether it would finish.)
I do not know how comparable the two really are, but in TensorFlow I trained a CNN/BiLSTM network on the entire dataset and there an epoch took only 5 minutes. (2) Do 4 hours make sense, or have I made a gross error somewhere?
[2020-12-28 17:45:10,844] {classification_model.py:1147} INFO - Converting to features started. Cache is not used.
100% 800000/800000 [05:44<00:00, 2638.77it/s]
Epoch 1 of 1: 0% 0/1 [00:00<?, ?it/s]
Epochs 0/1. Running Loss. 0.6640: 0% 375/100000 [01:04<3:50:03, 7.19it/s]
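For reference, the data preparation looks roughly like this (a sketch; the file name, the column names and the 0/4 label mapping are my assumptions about the standard Sentiment140 CSV):

import pandas as pd
from sklearn.model_selection import train_test_split

# Sentiment140 CSV has no header; columns are: target, id, date, flag, user, text
# (target is 0 for negative and 4 for positive tweets)
df = pd.read_csv('training.1600000.processed.noemoticon.csv',
                 encoding='latin-1', header=None,
                 names=['target', 'id', 'date', 'flag', 'user', 'text'])

# SimpleTransformers expects the columns 'text' and 'labels'
df['labels'] = (df['target'] == 4).astype(int)
df = df[['text', 'labels']]

# 80/20 split -> 1.28M training / 0.32M test examples
train_df, eval_df = train_test_split(df, test_size=0.2, random_state=42,
                                     stratify=df['labels'])

The model setup is: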
import torch
from simpletransformers.classification import ClassificationModel

torch.cuda.is_available()  # -> True, so the Colab GPU is visible

SILENT = False  # show progress output

model_type, model_name = 'roberta', 'roberta-base'

model_args = {
    'output_dir': 'outputs/',
    'cache_dir': 'cache/',
    'max_seq_length': 144,
    'num_train_epochs': 1,  # 50
    'learning_rate': 1e-3,
    'adam_epsilon': 1e-8,
    'early_stopping_delta': 1e-3,
    'early_stopping_patience': 5,
    'overwrite_output_dir': True,
    'manual_seed': True,
    'silent': SILENT,
}

model = ClassificationModel(model_type=model_type,
                            model_name=model_name,
                            args=model_args,
                            use_cuda=True,
                            num_labels=2)
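For completeness, the training is then started with the standard SimpleTransformers calls (train_df / eval_df are the DataFrames from the snippet above):

# fine-tune RoBERTa on the training split
model.train_model(train_df)

# evaluate on the held-out split afterwards
result, model_outputs, wrong_predictions = model.eval_model(eval_df)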
I also tried adding 'eval_accumulation_steps': 20 to my model_args, but it still crashed before training started.
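Concretely, that attempt was just:

# added to model_args before creating the model
# (did not help; Colab still crashed before training started)
model_args['eval_accumulation_steps'] = 20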
Thanks in advance!