Hello,
This post is related to two huggingface/transformers GitHub issues: #9393 ("`run_glue.py` fails when using my own dataset of regression task") and #9442 ("[examples/text-classification] `do_predict` for the test set of local datasets").
While writing up a new issue, I realized that this may be a simple mistake on my part.
If anyone can explain the details, I would appreciate your comments.
Information
Model I am using (Bert, XLNet …): Bert
The problem arises when using:
- [ ] the official example scripts: (give details below)
- [x] my own modified scripts: (give details below)
Almost the same as `run_glue.py`, but with some modifications to the evaluation metrics and support for a test set.
The task I am working on is:
- [ ] an official GLUE/SQuAD task: (give the name)
- [x] my own task or dataset: (give details below)
To reproduce
An error seems to occur when I use `run_glue.py` with my own regression dataset.
```
CUDA_VISIBLE_DEVICES=0 python <my_modified_run_glue.py> \
    --model_name_or_path bert-base-cased \
    --train_file data/****.csv \
    --validation_file data/****.csv \
    --test_file data/****.csv \
    --do_train \
    --do_eval \
    --do_predict \
    --max_seq_length 64 \
    --per_device_train_batch_size 32 \
    --per_device_eval_batch_size 32 \
    --learning_rate 2e-5 \
    --num_train_epochs 10.0 \
    --load_best_model_at_end \
    --evaluation_strategy epoch \
    --metric_for_best_model eval_pearson \
    --output_dir **** \
    --overwrite_output_dir
```

The `--test_file` argument was added for issue #9442, and `--do_predict` is also related to that issue.
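The command relies on `--metric_for_best_model eval_pearson`. The modified metric code is not shown in this post, so the following is only a hypothetical sketch of a regression `compute_metrics` that reports Pearson's r using the standard library. `EvalPredictionLike` is a stand-in for the Trainer's `EvalPrediction` (fields `predictions` and `label_ids`), and it assumes the Trainer prefixes returned metric names with `eval_`, so `"pearson"` would be logged as `eval_pearson`.

```python
import math
from collections import namedtuple

# Hypothetical stand-in for transformers' EvalPrediction (predictions, label_ids).
EvalPredictionLike = namedtuple("EvalPredictionLike", ["predictions", "label_ids"])


def _to_floats(values):
    """Flatten values of shape (n,) or (n, 1) into a list of floats."""
    out = []
    for v in values:
        try:
            out.append(float(v))
        except TypeError:
            # A regression head with num_labels=1 yields one-element rows.
            out.append(float(v[0]))
    return out


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def compute_metrics(p):
    # Assumed to be reported as "eval_pearson" once the Trainer adds its prefix.
    preds = _to_floats(p.predictions)
    labels = _to_floats(p.label_ids)
    return {"pearson": pearson(preds, labels)}


if __name__ == "__main__":
    p = EvalPredictionLike(predictions=[[1.0], [2.0], [3.0]], label_ids=[1.0, 2.0, 3.0])
    print(compute_metrics(p))  # {'pearson': 1.0}
```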
The train/validation CSV files look like this:

```
id,label,sentence1
__id_as_string__,3.0,__string__
```
The Trainer then logs the following:

```
[INFO|trainer.py:387] 2021-01-07 12:52:02,202 >> The following columns in the training set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: id, sentence1.
[INFO|trainer.py:387] 2021-01-07 12:52:02,204 >> The following columns in the evaluation set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: id, sentence1.
```
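A simplified sketch of how such a message can arise, assuming the Trainer compares the dataset's column names with the parameters of the model's `forward` (via `inspect.signature`), and assuming tokenization adds `input_ids`, `token_type_ids` and `attention_mask` without removing the raw columns. `bert_forward` is a hypothetical stand-in whose parameter list approximates `BertForSequenceClassification.forward`. Under these assumptions, a raw text column is always reported as ignored, because the model consumes the tokenized `input_ids` rather than the string itself.

```python
import inspect


# Hypothetical stand-in: parameters approximate BertForSequenceClassification.forward.
def bert_forward(input_ids=None, attention_mask=None, token_type_ids=None,
                 position_ids=None, head_mask=None, inputs_embeds=None,
                 labels=None, output_attentions=None,
                 output_hidden_states=None, return_dict=None):
    pass


def ignored_columns(forward, column_names):
    """Return dataset columns that match no argument of `forward`.

    "label" and "label_ids" are treated as accepted on the assumption that
    they are mapped onto the model's `labels` argument.
    """
    accepted = set(inspect.signature(forward).parameters) | {"label", "label_ids"}
    return sorted(c for c in column_names if c not in accepted)


if __name__ == "__main__":
    # Raw CSV columns plus the tokenizer outputs added during preprocessing.
    columns = ["id", "label", "sentence1", "input_ids", "token_type_ids", "attention_mask"]
    print(ignored_columns(bert_forward, columns))  # ['id', 'sentence1']
```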
Expected behavior
It is natural that the `id` column is ignored, but I don't understand why `sentence1` is ignored.
I checked `task_to_keys` in the original script again:
```python
task_to_keys = {
    "cola": ("sentence", None),
    "mnli": ("premise", "hypothesis"),
    "mrpc": ("sentence1", "sentence2"),
    "qnli": ("question", "sentence"),
    "qqp": ("question1", "question2"),
    "rte": ("sentence1", "sentence2"),
    "sst2": ("sentence", None),
    "stsb": ("sentence1", "sentence2"),
    "wnli": ("sentence1", "sentence2"),
}
```
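`task_to_keys` appears to be consulted only when `--task_name` is given. For local CSV files, the script seems to fall back to a column-picking heuristic instead; the function below is a paraphrase under that assumption, not the script's own code, so it should be compared against the local copy of `run_glue.py`. `pick_text_columns` is a hypothetical name. If the heuristic matches, a file with the columns `id,label,sentence1` would be encoded as the sentence pair (`id`, `sentence1`).

```python
def pick_text_columns(column_names, label_column="label"):
    """Sketch of a default (sentence1_key, sentence2_key) choice for custom CSVs.

    Assumed to mirror the fallback used when --task_name is not set.
    """
    non_label = [name for name in column_names if name != label_column]
    if "sentence1" in non_label and "sentence2" in non_label:
        return "sentence1", "sentence2"
    if len(non_label) >= 2:
        # The first two non-label columns, in file order, become a sentence pair.
        return non_label[0], non_label[1]
    # A single remaining column is treated as a single-sentence input.
    return non_label[0], None


if __name__ == "__main__":
    print(pick_text_columns(["id", "label", "sentence1"]))  # ('id', 'sentence1')
    print(pick_text_columns(["label", "sentence1"]))        # ('sentence1', None)
```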
Should I use `sentence` instead of `sentence1` if there is only one sentence in the input (in other words, if `sentence2` is `None`)?
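If the script does take the first two non-`label` columns in file order as its text inputs, one possible workaround is to write copies of the CSV files without the `id` column, leaving only `label` and a single text column. `drop_columns` below is a hypothetical helper built on the standard `csv` module; the original files are worth keeping if the ids are needed later to match predictions.

```python
import csv
import io


def drop_columns(src, dst, columns_to_drop):
    """Copy CSV rows from file-like `src` to `dst`, omitting `columns_to_drop`."""
    reader = csv.DictReader(src)
    kept = [name for name in reader.fieldnames if name not in columns_to_drop]
    # extrasaction="ignore" silently skips the dropped keys present in each row.
    writer = csv.DictWriter(dst, fieldnames=kept, extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)


if __name__ == "__main__":
    original = "id,label,sentence1\nabc,3.0,A short example sentence.\n"
    out = io.StringIO()
    drop_columns(io.StringIO(original), out, {"id"})
    print(out.getvalue())
```

With real files, open both with `newline=""`, as the `csv` module documentation recommends.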
Thank you in advance.