Error loading tokenizer for gpssohi/distilbart-qgen-6-6

First of all, I am unable to load this model, or a pipeline built on it, using “gpssohi/distilbart-qgen-6-6”. I get the message:

OSError: Can't load config for 'gpssohi/distilbart-qgen-6-6'. Make sure that:
- 'gpssohi/distilbart-qgen-6-6' is a correct model identifier listed on ''
- or 'gpssohi/distilbart-qgen-6-6' is the correct path to a directory containing a config.json file

This happens even though I followed the instructions on the model card:

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("gpssohi/distilbart-qgen-6-6")

model = AutoModelForSeq2SeqLM.from_pretrained("gpssohi/distilbart-qgen-6-6")

So I downloaded the model files locally and ran:

from transformers import BartTokenizer
tokenizer = BartTokenizer.from_pretrained("/pub/models/gpssohi/distilbart-qgen-6-6")

which produces the error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/IPython/core/", line 3343, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-4-9811fff8faaa>", line 2, in <module>
    tokenizer = BartTokenizer.from_pretrained("/pub/models/gpssohi/distilbart-qgen-6-6")
  File "/usr/local/lib/python3.6/site-packages/transformers/", line 1428, in from_pretrained
    return cls._from_pretrained(*inputs, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/transformers/", line 1575, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/usr/local/lib/python3.6/site-packages/transformers/", line 174, in __init__
  File "/usr/local/lib/python3.6/site-packages/transformers/", line 169, in __init__
    super().__init__(bos_token=bos_token, eos_token=eos_token, unk_token=unk_token, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/transformers/", line 116, in __init__
  File "/usr/local/lib/python3.6/site-packages/transformers/", line 1314, in __init__
  File "/usr/local/lib/python3.6/site-packages/transformers/", line 658, in __init__
    "special token {} has to be either str or AddedToken but got: {}".format(key, type(value))
TypeError: special token bos_token has to be either str or AddedToken but got: <class 'dict'>

I did some spelunking through the code and found that bos_token (and its siblings) are loaded from the file tokenizer_config.json, which contains:

{"unk_token": {"content": "<unk>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true, "__type": "AddedToken"}, "bos_token": {"content": "<s>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true, "__type": "AddedToken"}, "eos_token": {"content": "</s>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true, "__type": "AddedToken"}, "add_prefix_space": false, "errors": "replace", "sep_token": {"content": "</s>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true, "__type": "AddedToken"}, "cls_token": {"content": "<s>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true, "__type": "AddedToken"}, "pad_token": {"content": "<pad>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true, "__type": "AddedToken"}, "mask_token": {"content": "<mask>", "single_word": false, "lstrip": true, "rstrip": false, "normalized": true, "__type": "AddedToken"}, "model_max_length": 1024, "special_tokens_map_file": null, "name_or_path": "sshleifer/distilbart-cnn-6-6", "tokenizer_class": "BartTokenizer"}

The file is read with json.load, so the value of each token ends up being (yup!) a plain dictionary, which is exactly what the TypeError complains about. The "__type": "AddedToken" entry on each token suggests that these are serialized AddedToken values which, if properly reconstituted on load, would let this run without error.

Is this a known bug? If so, is there a fix/patch? Is there an alternative usage such that I can get past this and avoid having to “hack” Transformers code?
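
For what it's worth, the only stopgap I can think of that leaves the library untouched is to rewrite my local copy of tokenizer_config.json so that each serialized AddedToken dictionary becomes a plain string, since the TypeError above says a str is accepted. This is only a rough sketch: flatten_added_tokens and patch_tokenizer_config are names I made up, and it assumes that losing the extra flags (for example "lstrip": true on mask_token) is acceptable.

```python
import json


def flatten_added_tokens(config):
    """Return a copy of a tokenizer config in which every serialized
    AddedToken dict is replaced by its plain-string "content" value.

    Assumption: the extra flags (single_word, lstrip, rstrip, normalized)
    can be dropped; that may change how <mask> is matched.
    """
    flattened = {}
    for key, value in config.items():
        if isinstance(value, dict) and value.get("__type") == "AddedToken":
            flattened[key] = value["content"]
        else:
            flattened[key] = value
    return flattened


def patch_tokenizer_config(path):
    """Rewrite the tokenizer_config.json at `path` in place (hypothetical helper).

    Keep a backup of the original file before running this.
    """
    with open(path, encoding="utf-8") as f:
        config = json.load(f)
    patched = flatten_added_tokens(config)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(patched, f)
    return patched
```

I would run patch_tokenizer_config on the tokenizer_config.json inside /pub/models/gpssohi/distilbart-qgen-6-6 and then retry BartTokenizer.from_pretrained on that directory. I have not confirmed that this is the right fix, so I would still prefer a supported way to load the file.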