Following this blog post, I downloaded the BLOOM model using

model = AutoModelForCausalLM.from_pretrained("bigscience/bloom", device_map="balanced_low_0", torch_dtype=torch.float16, cache_dir=<path to direc>)

I can see roughly 350GB of model files (the download took quite some time, but that is okay) created in <path to direc>.
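For reference, the full snippet with imports looks like this (the cache path is just a placeholder for my local directory):

    import torch
    from transformers import AutoModelForCausalLM

    # first run: downloads the ~350GB of BLOOM weights into cache_dir
    model = AutoModelForCausalLM.from_pretrained(
        "bigscience/bloom",
        device_map="balanced_low_0",
        torch_dtype=torch.float16,
        cache_dir="<path to direc>",  # placeholder for my cache directory
    )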
However, upon restarting the session, I observe two behaviors:

- model = AutoModelForCausalLM.from_pretrained("bigscience/bloom", device_map="balanced_low_0", torch_dtype=torch.float16, cache_dir=<path to direc>)
  quickly loads the model (i.e., it does not re-download the 350GB of weights), since they are found in <path to direc>. This is the desirable behavior.
- However, generator = pipeline("text-generation", model="bigscience/bloom", cache_dir=<path to direc>, device_map="balanced_low_0", torch_dtype=torch.float16)
  begins re-downloading the model, and I am not sure why (the full call with imports is shown below).
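For completeness, this is the full form of that call:

    import torch
    from transformers import pipeline

    # despite pointing at the same cache_dir, this starts downloading the weights again
    generator = pipeline(
        "text-generation",
        model="bigscience/bloom",
        cache_dir="<path to direc>",  # same placeholder cache directory as above
        device_map="balanced_low_0",
        torch_dtype=torch.float16,
    )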
Moreover, the files in <path to direc> consist of blobs, refs, and snapshots directories, but there is no .json file, which is probably what pipeline() is searching for.
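A quick way to check what actually landed in the cache, and whether any .json files exist anywhere under it (a minimal sketch; the path is the same placeholder):

    import glob
    import os

    cache = "<path to direc>"  # placeholder for the cache directory used above

    # top-level entries of the cache (blobs, refs, snapshots, ...)
    print(os.listdir(cache))

    # search the whole tree for .json files, in case they sit in subdirectories
    for path in glob.glob(os.path.join(cache, "**", "*.json"), recursive=True):
        print(path)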
My questions are:

- How can I make pipeline() use the already-downloaded model instead of re-downloading it?
- Why isn't from_pretrained() downloading the .json file?
One way I read about is to use model.save_pretrained(<path>) and then pipeline(<path>) to load it, but this takes an extra 350GB of space, since model.save_pretrained(<path>) will create a new copy of the model with the .json file in it.
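That workaround would look roughly like this (a sketch, continuing from the model object loaded above; <path> is a placeholder):

    import torch
    from transformers import pipeline

    # re-export the loaded model to a plain directory containing config.json etc.
    # (this writes another ~350GB copy of the weights)
    model.save_pretrained("<path>")

    # the tokenizer would presumably also need to be saved alongside it,
    # e.g. tokenizer.save_pretrained("<path>")

    generator = pipeline(
        "text-generation",
        model="<path>",
        device_map="balanced_low_0",
        torch_dtype=torch.float16,
    )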
Any suggestions?