Following this blog post, I downloaded the BLOOM (176B) model using
model = AutoModelForCausalLM.from_pretrained("bigscience/bloom", device_map="balanced_low_0", torch_dtype=torch.float16, cache_dir=<path to direc>)
Afterwards I can see a 350GB download in <path to direc> (it took quite some time, but that is okay).
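For reference, here is the full, self-contained form of that call (a minimal sketch; <path to direc> is a placeholder for my cache directory, and I assume torch, transformers, and accelerate are installed):

import torch
from transformers import AutoModelForCausalLM

# Download (or reuse) the BLOOM weights into the given cache directory,
# spreading the model across GPUs while keeping GPU 0 relatively free.
model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom",
    device_map="balanced_low_0",
    torch_dtype=torch.float16,
    cache_dir="<path to direc>",
)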
However, upon restarting the session, I observe two behaviors:
- model = AutoModelForCausalLM.from_pretrained("bigscience/bloom", device_map="balanced_low_0", torch_dtype=torch.float16, cache_dir=<path to direc>) quickly loads the model (i.e., it does not re-download the 350GB file) since it is found in <path to direc>. This is the desirable behavior.
- However, generator = pipeline("text-generation", model="bigscience/bloom", cache_dir=<path to direc>, device_map="balanced_low_0", torch_dtype=torch.float16) begins re-downloading the model, and I am not sure why.
Moreover, the files in <path to direc> include blobs, refs, and snapshots directories but no .json file, which is probably what pipeline() is searching for.
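The only other variant I can think of is forwarding cache_dir through model_kwargs rather than passing it to pipeline() directly, but I do not know whether that is the intended mechanism; this is only a sketch:

import torch
from transformers import pipeline

# Sketch: pass cache_dir to the underlying from_pretrained() call
# via model_kwargs instead of giving it to pipeline() directly.
generator = pipeline(
    "text-generation",
    model="bigscience/bloom",
    device_map="balanced_low_0",
    torch_dtype=torch.float16,
    model_kwargs={"cache_dir": "<path to direc>"},
)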
My questions are:
- How can I make pipeline() use the already-downloaded model instead of re-downloading it?
- Why isn't from_pretrained() downloading the .json file?
One workaround I have read about is to call model.save_pretrained(<path>) and then load with pipeline(<path>), but that takes an extra 350GB of disk space, since save_pretrained(<path>) writes a new copy of the model (including the .json files) to <path>.
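Roughly, that workaround would look like the sketch below (<path> is a placeholder, and I assume the tokenizer has to be saved alongside the model so pipeline() finds everything in one place), which is exactly the duplication I would like to avoid:

import torch
from transformers import AutoTokenizer, pipeline

# Write a second, full copy of the already-loaded model to disk
# (this is the extra ~350GB I want to avoid).
model.save_pretrained("<path>")

# Presumably the tokenizer must be saved there as well.
tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom", cache_dir="<path to direc>")
tokenizer.save_pretrained("<path>")

# Point the pipeline at the local directory instead of the Hub id.
generator = pipeline(
    "text-generation",
    model="<path>",
    device_map="balanced_low_0",
    torch_dtype=torch.float16,
)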
Any suggestions?