OOM issues with exported vs. model card models

Having a weird issue with deploying the DialoGPT-large model. When I export it from PyTorch 1.8.0 and Transformers 4.3.3 using model.save_pretrained and tokenizer.save_pretrained, the exported pytorch_model.bin is almost twice the size of the one in the model card repo, and it causes an OOM on a reasonably equipped machine, whereas the same machine works fine when the model is fetched through the standard Transformers download process. (I'm building a CI pipeline to containerize the model, hence the requirement to pre-populate the model files.)
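
For reference, the export step looks roughly like this (a minimal sketch; the "./model" output directory is just an example name I'm using in the pipeline):

from transformers import AutoModelForCausalLM, AutoTokenizer

# Pull the model and tokenizer from the Hub, then re-export them locally
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-large")
tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-large")

# These two calls produce the files listed below
model.save_pretrained("./model")
tokenizer.save_pretrained("./model")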

Model card:

pytorch_model.bin 1.6GB

model.save_pretrained and tokenizer.save_pretrained:

-rw-r--r-- 1 jrandel jrandel  800 Mar  6 16:51 config.json
-rw-r--r-- 1 jrandel jrandel 446K Mar  6 16:51 merges.txt
-rw-r--r-- 1 jrandel jrandel 3.0G Mar  6 16:51 pytorch_model.bin
-rw-r--r-- 1 jrandel jrandel  357 Mar  6 16:51 special_tokens_map.json
-rw-r--r-- 1 jrandel jrandel  580 Mar  6 16:51 tokenizer_config.json
-rw-r--r-- 1 jrandel jrandel 780K Mar  6 16:51 vocab.json

When I download the model card files directly with curl, however, I get the following errors:

curl -L https://huggingface.co/microsoft/DialoGPT-large/resolve/main/config.json -o ./model/config.json
curl -L https://huggingface.co/microsoft/DialoGPT-large/resolve/main/pytorch_model.bin -o ./model/pytorch_model.bin
curl https://huggingface.co/microsoft/DialoGPT-large/resolve/main/tokenizer_config.json -o ./model/tokenizer_config.json
curl https://huggingface.co/microsoft/DialoGPT-large/resolve/main/config.json -o ./model/config.json
curl https://huggingface.co/microsoft/DialoGPT-large/resolve/main/merges.txt -o ./model/merges.txt
curl https://huggingface.co/microsoft/DialoGPT-large/resolve/main/special_tokens_map.json -o ./model/special_tokens_map.json
curl https://huggingface.co/microsoft/DialoGPT-large/resolve/main/vocab.json -o ./model/vocab.json
<snip>
    tokenizer = AutoTokenizer.from_pretrained("model/")
  File "/var/lang/lib/python3.8/site-packages/transformers/models/auto/tokenization_auto.py", line 395, in from_pretrained
    return tokenizer_class_fast.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/var/lang/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 1788, in from_pretrained
    return cls._from_pretrained(
  File "/var/lang/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 1801, in _from_pretrained
    slow_tokenizer = (cls.slow_tokenizer_class)._from_pretrained(
  File "/var/lang/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 1876, in _from_pretrained
    special_tokens_map = json.load(special_tokens_map_handle)
  File "/var/lang/lib/python3.8/json/__init__.py", line 293, in load
    return loads(fp.read(),
  File "/var/lang/lib/python3.8/json/__init__.py", line 357, in loads
    return _default_decoder.decode(s)
  File "/var/lang/lib/python3.8/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/var/lang/lib/python3.8/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/var/runtime/bootstrap.py", line 481, in <module>
    main()
  File "/var/runtime/bootstrap.py", line 458, in main
    lambda_runtime_client.post_init_error(to_json(error_result))
  File "/var/runtime/lambda_runtime_client.py", line 42, in post_init_error
    response = runtime_connection.getresponse()
  File "/var/lang/lib/python3.8/http/client.py", line 1347, in getresponse
    response.begin()
  File "/var/lang/lib/python3.8/http/client.py", line 307, in begin
    version, status, reason = self._read_status()
  File "/var/lang/lib/python3.8/http/client.py", line 276, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response
time="2021-03-08T09:01:39.33" level=warning msg="First fatal error stored in appctx: Runtime.ExitError"
time="2021-03-08T09:01:39.33" level=warning msg="Process 14(bootstrap) exited: Runtime exited with error: exit status 1"
time="2021-03-08T09:01:39.33" level=error msg="Init failed" InvokeID= error="Runtime exited with error: exit status 1"
time="2021-03-08T09:01:39.33" level=warning msg="Failed to send default error response: ErrInvalidInvokeID"
time="2021-03-08T09:01:39.33" level=error msg="INIT DONE failed: Runtime.ExitError"
time="2021-03-08T09:01:39.33" level=warning msg="Reset initiated: ReserveFail"

So what would be causing the large file variance between save_pretrained models and the model card repo? And any ideas why the directly downloaded model card files aren’t working in this example?

Thanks in advance

Is this thing on? tap tap tap