OOM issues with exported vs. model card models

Having a weird issue with deploying the DialoGPT-large model. When I export it from PyTorch 1.8.0 and Transformers 4.3.3 using model.save_pretrained and tokenizer.save_pretrained, the exported pytorch_model.bin is almost twice the size of the one in the model card repo, and it causes an OOM on a reasonably equipped machine, whereas the same machine works fine when the model is fetched through the standard Transformers download process. (I'm building a CI pipeline to containerize the model, hence the requirement to pre-populate the model files.)
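
For reference, the export step looks roughly like this (a minimal sketch; the "./model" output directory is just an example name I'm using in the pipeline):

from transformers import AutoModelForCausalLM, AutoTokenizer

# Pull the model and tokenizer from the Hub, then re-export them locally
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-large")
tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-large")

# These two calls produce the files listed below
model.save_pretrained("./model")
tokenizer.save_pretrained("./model")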

Model card:

pytorch_model.bin 1.6GB

model.save_pretrained and tokenizer.save_pretrained:

-rw-r--r-- 1 jrandel jrandel  800 Mar  6 16:51 config.json
-rw-r--r-- 1 jrandel jrandel 446K Mar  6 16:51 merges.txt
-rw-r--r-- 1 jrandel jrandel 3.0G Mar  6 16:51 pytorch_model.bin
-rw-r--r-- 1 jrandel jrandel  357 Mar  6 16:51 special_tokens_map.json
-rw-r--r-- 1 jrandel jrandel  580 Mar  6 16:51 tokenizer_config.json
-rw-r--r-- 1 jrandel jrandel 780K Mar  6 16:51 vocab.json

When I download the model card files directly with curl, however, I get the following errors:

curl -L https://huggingface.co/microsoft/DialoGPT-large/resolve/main/config.json -o ./model/config.json
curl -L https://huggingface.co/microsoft/DialoGPT-large/resolve/main/pytorch_model.bin -o ./model/pytorch_model.bin
curl https://huggingface.co/microsoft/DialoGPT-large/resolve/main/tokenizer_config.json -o ./model/tokenizer_config.json
curl https://huggingface.co/microsoft/DialoGPT-large/resolve/main/config.json -o ./model/config.json
curl https://huggingface.co/microsoft/DialoGPT-large/resolve/main/merges.txt -o ./model/merges.txt
curl https://huggingface.co/microsoft/DialoGPT-large/resolve/main/special_tokens_map.json -o ./model/special_tokens_map.json
curl https://huggingface.co/microsoft/DialoGPT-large/resolve/main/vocab.json -o ./model/vocab.json
<snip>
    tokenizer = AutoTokenizer.from_pretrained("model/")
  File "/var/lang/lib/python3.8/site-packages/transformers/models/auto/tokenization_auto.py", line 395, in from_pretrained
    return tokenizer_class_fast.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/var/lang/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 1788, in from_pretrained
    return cls._from_pretrained(
  File "/var/lang/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 1801, in _from_pretrained
    slow_tokenizer = (cls.slow_tokenizer_class)._from_pretrained(
  File "/var/lang/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 1876, in _from_pretrained
    special_tokens_map = json.load(special_tokens_map_handle)
  File "/var/lang/lib/python3.8/json/__init__.py", line 293, in load
    return loads(fp.read(),
  File "/var/lang/lib/python3.8/json/__init__.py", line 357, in loads
    return _default_decoder.decode(s)
  File "/var/lang/lib/python3.8/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/var/lang/lib/python3.8/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/var/runtime/bootstrap.py", line 481, in <module>
    main()
  File "/var/runtime/bootstrap.py", line 458, in main
    lambda_runtime_client.post_init_error(to_json(error_result))
  File "/var/runtime/lambda_runtime_client.py", line 42, in post_init_error
    response = runtime_connection.getresponse()
  File "/var/lang/lib/python3.8/http/client.py", line 1347, in getresponse
    response.begin()
  File "/var/lang/lib/python3.8/http/client.py", line 307, in begin
    version, status, reason = self._read_status()
  File "/var/lang/lib/python3.8/http/client.py", line 276, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response
time="2021-03-08T09:01:39.33" level=warning msg="First fatal error stored in appctx: Runtime.ExitError"
time="2021-03-08T09:01:39.33" level=warning msg="Process 14(bootstrap) exited: Runtime exited with error: exit status 1"
time="2021-03-08T09:01:39.33" level=error msg="Init failed" InvokeID= error="Runtime exited with error: exit status 1"
time="2021-03-08T09:01:39.33" level=warning msg="Failed to send default error response: ErrInvalidInvokeID"
time="2021-03-08T09:01:39.33" level=error msg="INIT DONE failed: Runtime.ExitError"
time="2021-03-08T09:01:39.33" level=warning msg="Reset initiated: ReserveFail"

So what would be causing the large file variance between save_pretrained models and the model card repo? And any ideas why the directly downloaded model card files aren’t working in this example?

Thanks in advance

Is this thing on? tap tap tap