Due to various corporate network hoops, downloading models on the fly is not possible. Searching related posts, I can see that others have downloaded the models and saved them locally, but again, the initial download is the problem. At this point, I can see two options: git lfs to download the repo, and the huggingface_hub Python library.
For the git lfs option, I believe it downloads the whole repository (well, one commit deep). As an example, I'm looking at sshleifer/distilbart-cnn-12-6 · Hugging Face. I can see it has a PyTorch model, a Rust model, and an msgpack model, each over 1 GB in size. I'm looking to use this for local development, as well as embedding it in a container to run on servers (so as not to download it on each run, which would be a massive waste). The git lfs option downloads the whole thing.
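For reference, one middle ground I've been considering instead of a full git lfs clone is `snapshot_download` with an allow-list. This is only a sketch: it assumes a recent huggingface_hub version that supports `allow_patterns`, and the repo file list below is just what I see on the model's "Files" tab.

```python
from fnmatch import fnmatch

# Files I see in sshleifer/distilbart-cnn-12-6 (my reading of the repo page).
REPO_FILES = [
    "config.json",
    "flax_model.msgpack",
    "merges.txt",
    "pytorch_model.bin",
    "rust_model.ot",
    "tokenizer_config.json",
    "vocab.json",
]

# Patterns that keep the PyTorch weights plus config/tokenizer files,
# skipping the >1 GB Rust (.ot) and msgpack weights.
ALLOW_PATTERNS = ["*.json", "*.txt", "pytorch_model.bin"]

def kept_files(files, patterns=ALLOW_PATTERNS):
    """Return which repo files the allow-list would actually download."""
    return [f for f in files if any(fnmatch(f, p) for p in patterns)]

if __name__ == "__main__":
    # Requires: pip install huggingface_hub
    from huggingface_hub import snapshot_download

    local_dir = snapshot_download(
        "sshleifer/distilbart-cnn-12-6",
        allow_patterns=ALLOW_PATTERNS,
    )
    print(local_dir)  # directory that can be baked into the container image
```

With the patterns above, the two large non-PyTorch weight files would be filtered out, which seems closer to what I want for the container image than a full clone.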
With the huggingface_hub library, I can select individual files, and this seems to work on our network. However, I'm a bit confused about which files I need in order to load the model. We're running with PyTorch, so can I just download the PyTorch .bin model, or would I need the msgpack and Rust models too? Would I also require config.json, merges.txt, tokenizer_config.json, and vocab.json?
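For what it's worth, here is roughly what I'm trying with `hf_hub_download`. The file list is my guess at the minimum set for a PyTorch-only setup (weights plus model config plus tokenizer files), not something I've confirmed:

```python
REPO_ID = "sshleifer/distilbart-cnn-12-6"

# My guess at the minimal files for PyTorch: weights + config + tokenizer.
NEEDED_FILES = [
    "pytorch_model.bin",      # PyTorch weights (skipping the Rust/msgpack ones)
    "config.json",            # model architecture / hyperparameters
    "tokenizer_config.json",  # tokenizer settings
    "vocab.json",             # BPE vocabulary
    "merges.txt",             # BPE merge rules
]

def download_all(repo_id=REPO_ID, files=NEEDED_FILES, cache_dir="./models"):
    """Fetch each file individually into a local cache directory."""
    # Requires: pip install huggingface_hub
    from huggingface_hub import hf_hub_download

    return [
        hf_hub_download(repo_id=repo_id, filename=f, cache_dir=cache_dir)
        for f in files
    ]

if __name__ == "__main__":
    paths = download_all()
    print(paths)
    # My hope is that the resulting directory can then be loaded offline,
    # e.g. AutoModelForSeq2SeqLM.from_pretrained(<that directory>),
    # but I'd appreciate confirmation that this file set is sufficient.
```

Does this look like the right set of files, or am I missing something the model loader will ask for?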