OOM issues with save_pretrained models

I'm having a weird issue with a DialoGPT-large model deployment. On PyTorch 1.8.0 and Transformers 4.3.3, exporting the model with model.save_pretrained and tokenizer.save_pretrained produces a pytorch_model.bin that is almost twice the size of the file in the model card repo, and loading it causes an OOM on a reasonably equipped machine, even though the same machine works fine when the model comes through the standard transformers download process. (I am building a CI pipeline to containerize the model, hence the need for a pre-populated model directory.) Here is the comparison:

Model card:

pytorch_model.bin 1.6GB

Output of model.save_pretrained and tokenizer.save_pretrained:

-rw-r--r-- 1 jrandel jrandel  800 Mar  6 16:51 config.json
-rw-r--r-- 1 jrandel jrandel 446K Mar  6 16:51 merges.txt
-rw-r--r-- 1 jrandel jrandel 3.0G Mar  6 16:51 pytorch_model.bin
-rw-r--r-- 1 jrandel jrandel  357 Mar  6 16:51 special_tokens_map.json
-rw-r--r-- 1 jrandel jrandel  580 Mar  6 16:51 tokenizer_config.json
-rw-r--r-- 1 jrandel jrandel 780K Mar  6 16:51 vocab.json
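
For context, the export step in my CI pipeline is essentially this (a minimal sketch; the AutoModelForCausalLM load and the ./model output path are my assumptions here, but the two save_pretrained calls are what I actually run):

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-large")
tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-large")

# save_pretrained serializes the weights as held in memory; my guess is that if
# the hub checkpoint is stored in fp16 and the model sits in fp32 after loading,
# the exported file would come out roughly twice the size of the hub file.
model.save_pretrained("./model")
tokenizer.save_pretrained("./model")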

When I download the model card files directly, however, I get the following errors:

curl -L https://huggingface.co/microsoft/DialoGPT-large/resolve/main/config.json -o ./model/config.json
curl -L https://huggingface.co/microsoft/DialoGPT-large/resolve/main/pytorch_model.bin -o ./model/pytorch_model.bin
curl https://huggingface.co/microsoft/DialoGPT-large/resolve/main/tokenizer_config.json -o ./model/tokenizer_config.json
curl https://huggingface.co/microsoft/DialoGPT-large/resolve/main/config.json -o ./model/config.json
curl https://huggingface.co/microsoft/DialoGPT-large/resolve/main/merges.txt -o ./model/merges.txt
curl https://huggingface.co/microsoft/DialoGPT-large/resolve/main/special_tokens_map.json -o ./model/special_tokens_map.json
curl https://huggingface.co/microsoft/DialoGPT-large/resolve/main/vocab.json -o ./model/vocab.json
<snip>
    tokenizer = AutoTokenizer.from_pretrained("model/")
  File "/var/lang/lib/python3.8/site-packages/transformers/models/auto/tokenization_auto.py", line 395, in from_pretrained
    return tokenizer_class_fast.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/var/lang/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 1788, in from_pretrained
    return cls._from_pretrained(
  File "/var/lang/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 1801, in _from_pretrained
    slow_tokenizer = (cls.slow_tokenizer_class)._from_pretrained(
  File "/var/lang/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 1876, in _from_pretrained
    special_tokens_map = json.load(special_tokens_map_handle)
  File "/var/lang/lib/python3.8/json/__init__.py", line 293, in load
    return loads(fp.read(),
  File "/var/lang/lib/python3.8/json/__init__.py", line 357, in loads
    return _default_decoder.decode(s)
  File "/var/lang/lib/python3.8/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/var/lang/lib/python3.8/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/var/runtime/bootstrap.py", line 481, in <module>
    main()
  File "/var/runtime/bootstrap.py", line 458, in main
    lambda_runtime_client.post_init_error(to_json(error_result))
  File "/var/runtime/lambda_runtime_client.py", line 42, in post_init_error
    response = runtime_connection.getresponse()
  File "/var/lang/lib/python3.8/http/client.py", line 1347, in getresponse
    response.begin()
  File "/var/lang/lib/python3.8/http/client.py", line 307, in begin
    version, status, reason = self._read_status()
  File "/var/lang/lib/python3.8/http/client.py", line 276, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response
time="2021-03-08T09:01:39.33" level=warning msg="First fatal error stored in appctx: Runtime.ExitError"
time="2021-03-08T09:01:39.33" level=warning msg="Process 14(bootstrap) exited: Runtime exited with error: exit status 1"
time="2021-03-08T09:01:39.33" level=error msg="Init failed" InvokeID= error="Runtime exited with error: exit status 1"
time="2021-03-08T09:01:39.33" level=warning msg="Failed to send default error response: ErrInvalidInvokeID"
time="2021-03-08T09:01:39.33" level=error msg="INIT DONE failed: Runtime.ExitError"
time="2021-03-08T09:01:39.33" level=warning msg="Reset initiated: ReserveFail"

So what would be causing the large size difference between the save_pretrained output and the pytorch_model.bin in the model card repo? And does anyone have ideas why the directly downloaded model card files aren't working in this example?
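
To dig into the size question myself, this is how I'd compare the two checkpoints (a sketch; ./hub/pytorch_model.bin is a hypothetical path for the file downloaded from the model card, and ./model/pytorch_model.bin is my export):

import torch

for path in ["./hub/pytorch_model.bin", "./model/pytorch_model.bin"]:
    state_dict = torch.load(path, map_location="cpu")
    dtypes = {str(t.dtype) for t in state_dict.values() if torch.is_tensor(t)}
    nbytes = sum(t.numel() * t.element_size() for t in state_dict.values() if torch.is_tensor(t))
    print(path, dtypes, f"{nbytes / 1e9:.2f} GB")
    # If one file reports torch.float16 and the other torch.float32, that alone
    # would account for a ~2x size difference.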

Thanks in advance