Downloading and storing models

Following this blog post

I downloaded the BLOOM model (the bigscience/bloom checkpoint) using

model = AutoModelForCausalLM.from_pretrained("bigscience/bloom", device_map="balanced_low_0", torch_dtype=torch.float16, cache_dir=<path to direc>)

Afterwards I can see a 350 GB file created in <path to direc> (it took quite some time to download, but that is okay).

However, upon restarting the session, I observe two behaviors:

  1. model = AutoModelForCausalLM.from_pretrained("bigscience/bloom", device_map="balanced_low_0", torch_dtype=torch.float16, cache_dir=<path to direc>) quickly loads the model (i.e., it does not re-download the 350 GB file), since the model is found in <path to direc>. This is the desired behavior.
  2. However, generator = pipeline("text-generation", model="bigscience/bloom", cache_dir=<path to direc>, device_map="balanced_low_0", torch_dtype=torch.float16) begins re-downloading the model, and I am not sure why.

Moreover, the files in <path to direc> include blobs, refs, and snapshots folders, but no .json file, which is probably what pipeline() is searching for.
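For context, that blobs/refs/snapshots layout is the Hub cache format: each model lives under a models--&lt;org&gt;--&lt;name&gt; folder, with blobs/ holding the content-addressed files and snapshots/&lt;commit&gt;/ holding the original filenames (including config.json) pointing at those blobs. A stand-alone sketch that mimics this layout and resolves a snapshot directory (the find_snapshot helper is hypothetical, for illustration only — it is not a transformers API):

```python
import os
import tempfile

def find_snapshot(cache_dir, repo_id):
    """Return the most recent snapshot directory for repo_id inside a
    Hub-style cache, or None if the model was never downloaded."""
    repo_dir = os.path.join(cache_dir, "models--" + repo_id.replace("/", "--"))
    snap_root = os.path.join(repo_dir, "snapshots")
    if not os.path.isdir(snap_root):
        return None
    snapshots = [os.path.join(snap_root, d) for d in os.listdir(snap_root)]
    return max(snapshots, key=os.path.getmtime) if snapshots else None

# Build a fake cache with the same shape as the real one.
cache = tempfile.mkdtemp()
snap = os.path.join(cache, "models--bigscience--bloom", "snapshots", "abc123")
os.makedirs(snap)
open(os.path.join(snap, "config.json"), "w").close()  # config.json lives in the snapshot

path = find_snapshot(cache, "bigscience/bloom")
print(path)  # ends with .../models--bigscience--bloom/snapshots/abc123
```

So the config.json is there — it just sits inside a commit-named snapshot folder rather than at the top of <path to direc>.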

My questions are:

  1. How can I make pipeline() use the downloaded model instead of re-downloading it?
  2. Why isn't from_pretrained() downloading the .json file?

One way I read about is to use model.save_pretrained(<path>) and then pipeline(<path>) to load, but this takes an extra 350 GB of space, since save_pretrained(<path>) writes out a new copy of the model with a .json file in it.

Any suggestions?

Oooh, I think I figured it out.

I think the correct way to use pipeline() is to do something like the following:

model = AutoModelForCausalLM.from_pretrained("bigscience/bloom", device_map="balanced_low_0", torch_dtype=torch.float16, cache_dir=<path to direc>)

followed by

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom", cache_dir="/home/racball/opt175b_tokeniser")

Then,

generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
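Put together, the pattern looks like the sketch below. To keep the example cheap to run it uses "sshleifer/tiny-gpt2" as a stand-in checkpoint; for the real thing, swap in "bigscience/bloom" plus the device_map/torch_dtype/cache_dir arguments from above.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# Load the model and tokenizer once; both land in (or are read from) the cache.
model = AutoModelForCausalLM.from_pretrained("sshleifer/tiny-gpt2")
tokenizer = AutoTokenizer.from_pretrained("sshleifer/tiny-gpt2")

# Passing the already-loaded objects means pipeline() has nothing left to
# download: it reuses exactly what from_pretrained() produced.
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
out = generator("Hello", max_new_tokens=5)
print(out[0]["generated_text"])
```

The key design point is that pipeline() only resolves and downloads a checkpoint when it is given a string identifier; given live model and tokenizer objects, it skips that step entirely.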
