Hi,
I got a Colab notebook up and running with the code from the “Running the model” section, verbatim. The model ran successfully and produced the expected output.
However, I noticed that during the model = (...) call, the runtime downloaded 8 checkpoint chunks of ~5 GB each. Since Colab storage is ephemeral, I wanted to save the model to Google Drive so it persists across sessions. Naively, I ran
model.save_pretrained("/content/drive/MyDrive/Colab Notebooks/data/ul2-hf-model", from_pt=True)
and after a runtime restart:
model = T5ForConditionalGeneration.from_pretrained("/content/drive/MyDrive/Colab Notebooks/data/ul2-hf-model", device_map="auto")
The model loads into memory as 4 checkpoint shards and appears to process my input string. However, the result for any input string is <pad></s>
. I guess there is some basic flaw in my naive model-saving attempt.
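To rule out the save_pretrained / from_pretrained round trip itself, here is a minimal sanity-check sketch I would expect to work, using a tiny randomly initialised T5 instead of flan-ul2 (the config values below are made up just to keep it small; no multi-GB download needed):

```python
# Round-trip sanity check: save a (tiny, random) T5 and reload it,
# then confirm generation is unchanged. No from_pt argument is needed
# when saving a PyTorch model.
import tempfile
import torch
from transformers import T5Config, T5ForConditionalGeneration

config = T5Config(
    vocab_size=100, d_model=32, d_kv=16, d_ff=64,
    num_layers=2, num_heads=2, decoder_start_token_id=0,
)
model = T5ForConditionalGeneration(config)
model.eval()  # disable dropout so generation is deterministic

input_ids = torch.tensor([[5, 6, 7]])
with torch.no_grad():
    before = model.generate(input_ids, max_new_tokens=5)

with tempfile.TemporaryDirectory() as tmp:
    model.save_pretrained(tmp)
    reloaded = T5ForConditionalGeneration.from_pretrained(tmp)
    reloaded.eval()
    with torch.no_grad():
        after = reloaded.generate(input_ids, max_new_tokens=5)

print(torch.equal(before, after))  # True if the round trip preserved the weights
```

If this round trip preserves outputs for a small model, the problem is presumably specific to how the large model was saved or re-loaded, not to the API itself.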
In the model card, I noticed there is a “Converting from T5x to HF” section with a link to a conversion script. Do I need to run that script if I want to save the model as an HF/PyTorch model?
EDIT3:
If helpful, this is the output of print(model.hf_device_map):
{'shared': 0, 'lm_head': 0, 'encoder': 0, 'decoder.embed_tokens': 0, 'decoder.block.0': 0, 'decoder.block.1': 0, 'decoder.block.2': 0, 'decoder.block.3': 'cpu', 'decoder.block.4': 'cpu', 'decoder.block.5': 'cpu', 'decoder.block.6': 'cpu', 'decoder.block.7': 'cpu', 'decoder.block.8': 'cpu', 'decoder.block.9': 'cpu', 'decoder.block.10': 'cpu', 'decoder.block.11': 'cpu', 'decoder.block.12': 'cpu', 'decoder.block.13': 'cpu', 'decoder.block.14': 'cpu', 'decoder.block.15': 'cpu', 'decoder.block.16': 'cpu', 'decoder.block.17': 'cpu', 'decoder.block.18': 'cpu', 'decoder.block.19': 'cpu', 'decoder.block.20': 'cpu', 'decoder.block.21': 'cpu', 'decoder.block.22': 'cpu', 'decoder.block.23': 'cpu', 'decoder.block.24': 'cpu', 'decoder.block.25': 'cpu', 'decoder.block.26': 'cpu', 'decoder.block.27': 'cpu', 'decoder.block.28': 'cpu', 'decoder.block.29': 'cpu', 'decoder.block.30': 'cpu', 'decoder.block.31': 'cpu', 'decoder.final_layer_norm': 'cpu', 'decoder.dropout': 'cpu'}
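To summarise the map above: most of the decoder is offloaded to CPU, with only the encoder, embeddings, lm_head, and the first three decoder blocks on GPU 0. A plain-Python tally of the dict printed above (reconstructed here, no model needed):

```python
# Count modules per device in the hf_device_map shown above.
from collections import Counter

device_map = {
    'shared': 0, 'lm_head': 0, 'encoder': 0, 'decoder.embed_tokens': 0,
    **{f'decoder.block.{i}': 0 for i in range(3)},        # blocks 0-2 on GPU 0
    **{f'decoder.block.{i}': 'cpu' for i in range(3, 32)},  # blocks 3-31 on CPU
    'decoder.final_layer_norm': 'cpu', 'decoder.dropout': 'cpu',
}

print(Counter(device_map.values()))  # Counter({'cpu': 31, 0: 7})
```

So 31 of 38 modules live on CPU, which explains slow generation at least, though I don't know whether it relates to the <pad></s> output.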