I am trying to use RobertaForSequenceClassification as my backbone and torch.nn.parallel.DistributedDataParallel for data-parallel training.
My questions are as follows.
Is the Matthews correlation metric used for both training and evaluation, or only for evaluation? Is the loss function for the CoLA dataset nn.CrossEntropyLoss or Matthews correlation?
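For context, the usual setup for CoLA (and the other GLUE classification tasks) is to train with a differentiable loss such as cross-entropy, and to use Matthews correlation only as an evaluation metric, since MCC is computed from discrete predictions and cannot be backpropagated. A minimal sketch with made-up logits and labels, implementing MCC by hand from the confusion matrix:

```python
import torch
import torch.nn as nn

# Toy logits/labels; training uses cross-entropy, evaluation uses MCC.
logits = torch.tensor([[2.0, -1.0], [0.5, 1.5], [-1.0, 2.0], [3.0, 0.0]])
labels = torch.tensor([0, 1, 1, 0])

# Differentiable loss used for backprop during training.
loss = nn.CrossEntropyLoss()(logits, labels)

def mcc(preds, labels):
    # Matthews correlation coefficient from the binary confusion matrix.
    tp = ((preds == 1) & (labels == 1)).sum().item()
    tn = ((preds == 0) & (labels == 0)).sum().item()
    fp = ((preds == 1) & (labels == 0)).sum().item()
    fn = ((preds == 0) & (labels == 1)).sum().item()
    denom = ((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)) ** 0.5
    return (tp * tn - fp * fn) / denom if denom else 0.0

# MCC is computed on hard argmax predictions, not on the loss.
preds = logits.argmax(dim=-1)
print(round(mcc(preds, labels), 4))  # perfect predictions here, so MCC = 1.0
```

In an evaluation loop you would accumulate predictions and labels across batches and compute MCC once at the end (e.g. via sklearn.metrics.matthews_corrcoef or the evaluate library).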
What should I input to the model? Is the code below OK?
train_dataset.set_format(type='torch', columns=['input_ids', 'labels', 'attention_mask'])
val_dataset.set_format(type='torch', columns=['input_ids', 'labels', 'attention_mask'])
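Those columns are the right ones: the model's forward accepts input_ids, attention_mask, and labels, and when labels are passed it returns a cross-entropy loss in outputs.loss. A minimal sketch with a tiny randomly initialized config (so it runs without downloading pretrained weights; in practice you would use from_pretrained("roberta-base")):

```python
import torch
from transformers import RobertaConfig, RobertaForSequenceClassification

# Tiny random config purely for illustration; real training would load
# pretrained weights with RobertaForSequenceClassification.from_pretrained.
config = RobertaConfig(vocab_size=100, hidden_size=32, num_hidden_layers=2,
                       num_attention_heads=2, intermediate_size=64, num_labels=2)
model = RobertaForSequenceClassification(config)

# Fake batch shaped like what set_format(type='torch', ...) would yield.
input_ids = torch.randint(2, 100, (4, 10))
attention_mask = torch.ones(4, 10, dtype=torch.long)
labels = torch.randint(0, 2, (4,))

# Passing labels makes the model compute CrossEntropyLoss internally.
outputs = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
print(outputs.loss.item(), outputs.logits.shape)  # logits: (batch, num_labels)
```

In a training loop you would call outputs.loss.backward() and step the optimizer.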
- In the RobertaForSequenceClassification source code
transformers/modeling_roberta.py at 198c335d219a5eb4d3f124fdd1ce1a9cd9f78a9b · huggingface/transformers · GitHub
is the attention_mask the same for every layer_module in the RobertaEncoder loop?
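To the best of my understanding of that file, yes: the 2D padding mask is expanded once into an additive 4D mask before the encoder loop (keep positions become 0.0, masked positions a large negative number), and the same expanded tensor is then passed unchanged to every layer. A sketch of that expansion, mirroring what transformers does internally:

```python
import torch

# 2D padding mask for one sequence of length 4, last token is padding.
attention_mask = torch.tensor([[1, 1, 1, 0]])

# Expand to (batch, 1, 1, seq_len) and convert to additive form:
# 0.0 where attention is allowed, a large negative value where masked,
# so it can simply be added to the attention scores in every layer.
extended = attention_mask[:, None, None, :].to(torch.float32)
extended = (1.0 - extended) * torch.finfo(torch.float32).min

print(extended.shape)  # torch.Size([1, 1, 1, 4])
```

Because the mask is only added to attention scores and never written back, each layer_module in the loop sees the identical tensor.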
If you could share some PyTorch & Hugging Face code that does not use the Hugging Face Trainer, that would be great!
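In case it helps frame an answer, here is the skeleton I have in mind: a plain PyTorch DDP training loop with a DistributedSampler. This is a single-process demo with a toy linear model standing in for the RoBERTa backbone (rank/world_size hardcoded so it runs as-is); with torchrun you would instead read RANK/WORLD_SIZE from the environment and use one process per GPU:

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset, DistributedSampler

# Single-process demo setup; under torchrun these come from the environment.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

# Toy model standing in for RobertaForSequenceClassification.
model = DDP(nn.Linear(8, 2))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Fake dataset; DistributedSampler shards it across ranks.
dataset = TensorDataset(torch.randn(32, 8), torch.randint(0, 2, (32,)))
sampler = DistributedSampler(dataset)
loader = DataLoader(dataset, batch_size=8, sampler=sampler)

for epoch in range(2):
    sampler.set_epoch(epoch)  # reshuffle the shard each epoch
    for x, y in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()  # DDP all-reduces gradients across ranks here
        optimizer.step()

dist.destroy_process_group()
```

Swapping in the real model would mean replacing nn.Linear with RobertaForSequenceClassification.from_pretrained("roberta-base"), moving batches to the local GPU, and using outputs.loss instead of an external loss_fn.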