I am trying to use RobertaForSequenceClassification
as my backbone and torch.nn.parallel.DistributedDataParallel
to train data-parallel across GPUs.
My questions are as follows.
- Is the Matthews correlation metric used for both training and evaluation, or only for evaluation? Is the loss function for the CoLA dataset nn.CrossEntropyLoss or Matthews correlation?
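My current guess is that Matthews correlation is only an evaluation metric (it isn't differentiable), and training would use nn.CrossEntropyLoss, but I'm not sure. For reference, here is my understanding of the metric as a minimal plain-Python sketch (toy predictions, not real CoLA outputs):

```python
import math

def matthews_corrcoef(preds, labels):
    """Matthews correlation coefficient for binary predictions (0/1)."""
    tp = sum(1 for p, l in zip(preds, labels) if p == 1 and l == 1)
    tn = sum(1 for p, l in zip(preds, labels) if p == 0 and l == 0)
    fp = sum(1 for p, l in zip(preds, labels) if p == 1 and l == 0)
    fn = sum(1 for p, l in zip(preds, labels) if p == 0 and l == 1)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: MCC is 0 when any confusion-matrix margin is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0

print(matthews_corrcoef([1, 1, 0, 1, 0], [1, 0, 0, 1, 0]))  # → 0.6666666666666666
```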
- What should I input to the model? Is the code below OK?
train_dataset.set_format(type='torch', columns=['input_ids','labels','attention_mask'])
val_dataset.set_format(type='torch', columns=['input_ids','labels','attention_mask'])
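To check my own understanding of what gets fed to the model after set_format, I tried this toy collate check (toy tensors, not real CoLA rows; `model` would be the RobertaForSequenceClassification instance, which I haven't included here):

```python
import torch
from torch.utils.data import DataLoader

# After set_format, each dataset row is a dict of tensors; the DataLoader's
# default collate stacks a list of such dicts into one dict of batched tensors.
toy_dataset = [
    {"input_ids": torch.tensor([0, 31414, 2]),
     "attention_mask": torch.tensor([1, 1, 1]),
     "labels": torch.tensor(1)}
    for _ in range(4)
]
loader = DataLoader(toy_dataset, batch_size=2)
batch = next(iter(loader))
print(batch["input_ids"].shape)  # torch.Size([2, 3])
# outputs = model(**batch)  # when 'labels' is present, the model also returns a loss
```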
- In the RobertaForSequenceClassification source code
transformers/modeling_roberta.py at 198c335d219a5eb4d3f124fdd1ce1a9cd9f78a9b · huggingface/transformers · GitHub
is the attention_mask passed unchanged to every layer_module in the RobertaEncoder loop?
If you could give me some PyTorch & Hugging Face code that doesn't use the Hugging Face Trainer, that would be great!
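For reference, here is my rough attempt at a Trainer-free loop so far (a tiny nn.Linear stands in for RoBERTa so it runs standalone; swapping in RobertaForSequenceClassification, a real DataLoader, and the DistributedDataParallel wrapping is the part I'm unsure about):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for RobertaForSequenceClassification: anything mapping features
# to 2 logits. Real code would load the pretrained model and then wrap it:
# model = torch.nn.parallel.DistributedDataParallel(model)  # after init_process_group
model = nn.Linear(8, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()  # training loss; Matthews correlation only at eval time

inputs = torch.randn(16, 8)            # stand-in for encoded input batches
labels = torch.randint(0, 2, (16,))    # CoLA labels are 0/1

model.train()
for epoch in range(3):
    optimizer.zero_grad()
    logits = model(inputs)
    loss = loss_fn(logits, labels)
    loss.backward()
    optimizer.step()
    print(f"epoch {epoch}: loss {loss.item():.4f}")
```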