Hi, I want to train a model that is larger than 100GB, and it runs out of memory (OOM) if I load it with from_pretrained. What is the suggested way to load and save 100GB models? For training, I can use FSDP to distribute the weights across devices, but I am stuck on model loading and saving.
I found this article, but it only covers inference.
@maxBing12345 did you find any solution?
Not sure if this can be done because I have never tried it, but can you push it to the Hub? Then you can just load from and save to it.
You can take a look at this repo for loading big models: huggingface/peft on GitHub (🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning).
For saving, you can use save_pretrained and set push_to_hub=True to push to the HF Hub. You can also set max_shard_size to split a big model into smaller files.
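Something like the following is a minimal sketch of that save call. The helper name save_sharded and the "5GB" value are just illustrative choices, and it assumes model is a transformers PreTrainedModel that already holds the full (unsharded) weights. Under FSDP you would need to gather the full state dict first, which is not shown here.

```python
def save_sharded(model, out_dir, max_shard_size="5GB", push_to_hub=False):
    """Save `model` to `out_dir`, split into files of at most `max_shard_size`.

    `save_sharded` is a hypothetical helper, not part of transformers.
    `model` is assumed to be a transformers PreTrainedModel (or any object
    exposing the same `save_pretrained` method).
    """
    model.save_pretrained(
        out_dir,
        max_shard_size=max_shard_size,  # e.g. "5GB"; each shard stays under this size
        push_to_hub=push_to_hub,        # True also uploads; requires being logged in to the Hub
    )
    return out_dir
```

You would call it as save_sharded(model, "my-checkpoint", max_shard_size="10GB"), where "my-checkpoint" is a placeholder directory name. As far as I know, with push_to_hub=True the repo name defaults to the directory name, so pick the directory accordingly.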