Flan-UL2: must I convert the model in order to save it?

Hi,

I got a Colab notebook up and running with the Flan-UL2 example verbatim from the “Running the model” section of the model card. The model runs successfully and produces the expected output.
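For reference, this is roughly the snippet I ran (paraphrased from memory; the exact code is in the model card, but the 8-bit load and the automatic device map are the relevant parts):

# Approximately the model card's "Running the model" example (paraphrased, not verbatim)
from transformers import T5ForConditionalGeneration, AutoTokenizer

model = T5ForConditionalGeneration.from_pretrained("google/flan-ul2", device_map="auto", load_in_8bit=True)
tokenizer = AutoTokenizer.from_pretrained("google/flan-ul2")

# example prompt, not the exact one from the card
inputs = tokenizer("Answer the following question step by step: how many legs do three cats have?", return_tensors="pt").input_ids.to("cuda")
outputs = model.generate(inputs, max_length=200)
print(tokenizer.decode(outputs[0]))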

However, I noticed that during the model = T5ForConditionalGeneration.from_pretrained(...) call, the notebook runtime downloaded 8 shards of ~5 GB each. Being on Colab, I wanted to save the model to Google Drive so it persists between sessions. Naively, I ran

model.save_pretrained("/content/drive/MyDrive/Colab Notebooks/data/ul2-hf-model", from_pt=True)    

and after a runtime restart:

model = T5ForConditionalGeneration.from_pretrained("/content/drive/MyDrive/Colab Notebooks/data/ul2-hf-model", device_map="auto")   

The model seems to load into memory as 4 checkpoint shards and also seems to process my input string. However, the result for any input string is <pad></s>, so I guess there is some basic flaw in my naive model-saving attempt.

In the model card, I notice there is a “Converting from T5x to HF” section with a link to a conversion script. Do I need to run that script if I want to save the model as an HF/PyTorch model?

EDIT3:
If helpful, this is the output of print(model.hf_device_map):

{'shared': 0, 'lm_head': 0, 'encoder': 0, 'decoder.embed_tokens': 0, 'decoder.block.0': 0, 'decoder.block.1': 0, 'decoder.block.2': 0, 'decoder.block.3': 'cpu', 'decoder.block.4': 'cpu', 'decoder.block.5': 'cpu', 'decoder.block.6': 'cpu', 'decoder.block.7': 'cpu', 'decoder.block.8': 'cpu', 'decoder.block.9': 'cpu', 'decoder.block.10': 'cpu', 'decoder.block.11': 'cpu', 'decoder.block.12': 'cpu', 'decoder.block.13': 'cpu', 'decoder.block.14': 'cpu', 'decoder.block.15': 'cpu', 'decoder.block.16': 'cpu', 'decoder.block.17': 'cpu', 'decoder.block.18': 'cpu', 'decoder.block.19': 'cpu', 'decoder.block.20': 'cpu', 'decoder.block.21': 'cpu', 'decoder.block.22': 'cpu', 'decoder.block.23': 'cpu', 'decoder.block.24': 'cpu', 'decoder.block.25': 'cpu', 'decoder.block.26': 'cpu', 'decoder.block.27': 'cpu', 'decoder.block.28': 'cpu', 'decoder.block.29': 'cpu', 'decoder.block.30': 'cpu', 'decoder.block.31': 'cpu', 'decoder.final_layer_norm': 'cpu', 'decoder.dropout': 'cpu'}

I don’t know where I got that parameter from, but from_pt=True is an argument for from_pretrained (for loading from a PyTorch checkpoint), so passing it to save_pretrained probably does not make much sense. On another save attempt, I noticed the warning

UserWarning: You are calling `save_pretrained` to a 8-bit converted model you may likely encounter unexepected behaviors.

So I guess therein lies the culprit: what I saved was the 8-bit quantized model, not the original weights. After downloading the weights anew, inference produces the expected results, and calling

print(model.hf_device_map)

now shows

{'': 0}

This confirms that the model works after passing torch_dtype=torch.bfloat16 to the from_pretrained call; the {'': 0} device map means the whole model now sits on GPU 0. I am using a Colab Pro notebook with 40 GB of GPU memory.
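For anyone else hitting this, here is a rough sketch of the approach that ended up working for me: load the original weights in bfloat16 rather than 8-bit, save without any from_pt argument, and reload from Drive the same way (a sketch only, not a verbatim copy of my notebook):

import torch
from transformers import T5ForConditionalGeneration, AutoTokenizer

save_dir = "/content/drive/MyDrive/Colab Notebooks/data/ul2-hf-model"

# Load the original (non-quantized) weights in bfloat16; device_map="auto" places them on the GPU
model = T5ForConditionalGeneration.from_pretrained("google/flan-ul2", torch_dtype=torch.bfloat16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("google/flan-ul2")

# Save model and tokenizer to Drive (note: no from_pt argument)
model.save_pretrained(save_dir)
tokenizer.save_pretrained(save_dir)

# After a runtime restart, load them back from Drive instead of re-downloading
model = T5ForConditionalGeneration.from_pretrained(save_dir, torch_dtype=torch.bfloat16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(save_dir)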