I'm having a tough time running my tuned model across multiple GPUs
I have several `.pt` checkpoint files that I fine-tuned with torchtune. I can run inference with torchtune's generate function on the LoRA model, but not on the full-precision model, because one of my cards can't hold the whole model. I've tried various methods and am still struggling, so I'm wondering what the right approach is. The checkpoint files are:
- hf_model_0001_2.pt
- hf_model_0002_2.pt
- hf_model_0003_2.pt
- hf_model_0004_2.pt
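The underlying idea is to split the model's layers across the cards so that no single GPU has to hold all of them. Below is a minimal sketch of that placement logic. It assumes the per-layer sizes and per-GPU memory budgets are already known. The `plan_device_map` helper and all sizes are hypothetical illustrations, not part of torchtune or Hugging Face.

```python
from typing import Dict, List


def plan_device_map(layer_sizes_gb: Dict[str, float],
                    gpu_capacity_gb: List[float]) -> Dict[str, int]:
    """Greedily place layers, in order, on the first GPU that still has room.

    Hypothetical helper for illustration only; not a torchtune or
    Hugging Face API. Layer order matters because layers run sequentially.
    """
    device_map: Dict[str, int] = {}
    gpu = 0
    used = 0.0
    for name, size in layer_sizes_gb.items():
        # Move on to the next GPU when the current one cannot fit this layer.
        while gpu < len(gpu_capacity_gb) and used + size > gpu_capacity_gb[gpu]:
            gpu += 1
            used = 0.0
        if gpu == len(gpu_capacity_gb):
            raise ValueError(f"Not enough GPU memory to place {name}")
        device_map[name] = gpu
        used += size
    return device_map


# Hypothetical sizes: a small card (8 GB) and a larger one (24 GB).
layers = {"embed": 2.0, "layers.0": 3.0, "layers.1": 3.0,
          "layers.2": 3.0, "lm_head": 2.0}
print(plan_device_map(layers, [8.0, 24.0]))
# {'embed': 0, 'layers.0': 0, 'layers.1': 0, 'layers.2': 1, 'lm_head': 1}
```

In practice, Hugging Face `transformers` can compute a placement like this automatically when a model is loaded with `from_pretrained(..., device_map="auto")` (with `accelerate` installed), and `max_memory` can cap how much each card is allowed to use.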