`tpu_cores` can only be 1, 8 or [<1-8>]

I am trying out the text-classification example with pytorch-lightning (run_pl.sh), but it throws an exception.

First it gave me errors due to a missing gpus argument, which I fixed by adding

parser.add_argument("--gpus", type=int)

to the parser and setting the gpus parameter in the run_pl.sh file; a minimal sketch of the change is below.
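
Roughly what I mean, as a self-contained sketch (in the real example the argument is added wherever the script builds its parser, e.g. in lightning_base.py, and run_pl.sh supplies the value on the command line):

    import argparse

    # Sketch only: the real change goes into the example's existing parser.
    parser = argparse.ArgumentParser()
    parser.add_argument("--gpus", type=int)

    # run_pl.sh then passes the value on the command line, e.g. --gpus 8
    args = parser.parse_args(["--gpus", "8"])
    print(args.gpus)  # 8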

Doing so, I then ran into the error below. I understand that this error is raised by PL, but it is a MisconfigurationException, which means we should be able to fix it ourselves in our code.

Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPretraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-cased and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Traceback (most recent call last):
  File "run_pl_glue.py", line 186, in <module>
    trainer = generic_train(model, args)
  File "/lvol/bhashithe/transformers/examples/lightning_base.py", line 299, in generic_train
    **train_params,
  File "/lvol/bhashithe/env/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 853, in from_argparse_args
    return cls(**trainer_kwargs)
  File "/lvol/bhashithe/env/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 468, in __init__
    self.tpu_cores = _parse_tpu_cores(tpu_cores)
  File "/lvol/bhashithe/env/lib/python3.6/site-packages/pytorch_lightning/trainer/distrib_parts.py", line 526, in _parse_tpu_cores
    raise MisconfigurationException("`tpu_cores` can only be 1, 8 or [<1-8>]")
pytorch_lightning.utilities.exceptions.MisconfigurationException: `tpu_cores` can only be 1, 8 or [<1-8>]

The environment is fully up to date; it has 8 GPUs (V100) and no TPUs.

Indeed, gpus is missing from argparse.

The following will fix your error:

-    parser.add_argument("--n_tpu_cores", dest="tpu_cores", type=int, default=0)
+    parser.add_argument("--n_tpu_cores", dest="tpu_cores", type=int)
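
To see why the default matters: with no default= given, argparse falls back to None, which the check in your traceback accepts, while 0 is rejected. A rough sketch of that validation (the actual PyTorch Lightning code may differ, and the list form [<1-8>] is omitted here):

    # Sketch of the check raised in distrib_parts.py, per the traceback.
    def parse_tpu_cores_sketch(tpu_cores):
        if tpu_cores is None:      # no TPUs requested: accepted
            return None
        if tpu_cores in (1, 8):    # the only valid integer values
            return tpu_cores
        raise ValueError("`tpu_cores` can only be 1, 8 or [<1-8>]")

    parse_tpu_cores_sketch(None)   # fine: the argparse default
    try:
        parse_tpu_cores_sketch(0)  # what default=0 was triggering
    except ValueError as e:
        print(e)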

but then it fails again elsewhere. I’m looking at it now.

Oh, thank you, I will check it out. As a workaround I had commented out that line; I think this should work.

Here is my work in progress on the multiple breakages of this script:

Haha, literally about 1.5 hours ago I sent the same pull request, but instead of wrapping load_dataset() I renamed it. I liked yours better.


Great minds think alike - I will link those PRs.


I also wasn't sure whether it had been used somewhere else, so I left the original intact.
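
Purely to illustrate the wrap-versus-rename trade-off discussed above (these names are invented; the real helper lives in the example script):

    # Option A: wrap. Keep the original name intact so any other call
    # sites keep working, and give the new behavior its own name.
    def load_dataset(name):
        return {"name": name}      # stand-in body for the original helper

    def load_dataset_wrapped(name):
        # extra behavior would go here
        return load_dataset(name)

    # Option B: rename load_dataset itself. Cleaner, but risky if the
    # old name is used somewhere else, which is why the wrapper won out.
    print(load_dataset_wrapped("mrpc"))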