Hi!
Is there a limit on how many labels I can use to distill the zero-shot classification pipeline (run in Google Colab) as described in joeddav/distilbert-base-uncased-agnews-student? If I run the Colab notebook with the original 4 labels, or even 8 labels, everything runs fine. As soon as I declare more than 10 labels, I get the error below.
!python transformers/examples/research_projects/zero-shot-distillation/distill_classifier.py \
--data_file ./agnews/train_unlabeled.txt \
--class_names_file ./agnews/class_names.txt \
--hypothesis_template "This text is about {}." \
--student_name_or_path distilbert-base-uncased \
--output_dir ./EYY-distilled_classifier
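For reference, class_names.txt is just one label per line. With the original AG News setup it contains:

World
Sports
Business
Sci/Tech

My failing runs simply extend this file past 10 lines with my own labels (I'm omitting the exact names here, since they don't seem to matter).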
120000ex [00:33, 3618.70ex/s]
02/24/2022 21:02:56 - INFO - __main__ - Training student model on teacher predictions
[INFO|trainer.py:554] 2022-02-24 21:02:56,888 >> The following columns in the training set don't have a corresponding argument in DistilBertForSequenceClassification.forward and have been ignored: text.
/usr/local/lib/python3.7/dist-packages/transformers/optimization.py:309: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set no_deprecation_warning=True to disable this warning
  FutureWarning,
[INFO|trainer.py:1244] 2022-02-24 21:02:56,902 >> ***** Running training *****
[INFO|trainer.py:1245] 2022-02-24 21:02:56,902 >> Num examples = 120000
[INFO|trainer.py:1246] 2022-02-24 21:02:56,903 >> Num Epochs = 1
[INFO|trainer.py:1247] 2022-02-24 21:02:56,903 >> Instantaneous batch size per device = 32
[INFO|trainer.py:1248] 2022-02-24 21:02:56,903 >> Total train batch size (w. parallel, distributed & accumulation) = 32
[INFO|trainer.py:1249] 2022-02-24 21:02:56,903 >> Gradient Accumulation steps = 1
[INFO|trainer.py:1250] 2022-02-24 21:02:56,903 >> Total optimization steps = 3750
  0% 0/3750 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "transformers/examples/research_projects/zero-shot-distillation/distill_classifier.py", line 338, in <module>
    main()
  File "transformers/examples/research_projects/zero-shot-distillation/distill_classifier.py", line 328, in main
    trainer.train()
  File "/usr/local/lib/python3.7/dist-packages/transformers/trainer.py", line 1365, in train
    tr_loss_step = self.training_step(model, inputs)
  File "/usr/local/lib/python3.7/dist-packages/transformers/trainer.py", line 1940, in training_step
    loss = self.compute_loss(model, inputs)
  File "transformers/examples/research_projects/zero-shot-distillation/distill_classifier.py", line 119, in compute_loss
    target_p = inputs["labels"]
KeyError: 'labels'
0% 0/3750 [00:00<?, ?it/s]
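In case it helps narrow things down: the failing line (distill_classifier.py, line 119) reads the teacher's soft targets straight from inputs["labels"], so it looks like the labels column is being dropped somewhere before batches reach compute_loss. A quick sanity check I can add right before trainer.train() (just a sketch; `tokenized_dataset` stands for whatever variable holds the tokenized datasets.Dataset at that point, so the name is an assumption, not from the script):

# Sketch: confirm which columns actually survive tokenization.
# `tokenized_dataset` is a placeholder name, not from the script itself.
print(tokenized_dataset.column_names)
# Expecting something like ['input_ids', 'attention_mask', 'labels'];
# if 'labels' is already missing here, that would match the KeyError above.

Has anyone run into this, or is there a known cap on the number of class names?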