Running inference on tuned models across multiple GPUs

I'm having a tough time running my tuned model across multiple GPUs.

I have several .pt files that I tuned with torchtune. I can run inference with the generate recipe on the LoRA checkpoint, but not at full precision, because no single one of my cards can hold the whole model. I'm wondering what the right approach is here; I've tried various methods but am struggling. The checkpoint files are:

  • hf_model_0001_2.pt
  • hf_model_0002_2.pt
  • hf_model_0003_2.pt
  • hf_model_0004_2.pt
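For reference, one of the things I've tried is manually merging the shards above into a single state dict on CPU (a sketch, not a working solution; the glob pattern just matches the file names listed):

```python
import glob
import torch

def merge_shards(pattern: str) -> dict:
    """Merge sharded checkpoint files into one state dict, kept on CPU."""
    state_dict = {}
    for path in sorted(glob.glob(pattern)):
        # map_location="cpu" keeps every shard off the GPU while merging
        shard = torch.load(path, map_location="cpu")
        state_dict.update(shard)
    return state_dict

full_state = merge_shards("hf_model_000*_2.pt")
```

This merges fine, but I then can't move the resulting full-precision model onto any single GPU, which is exactly the problem. What I'm missing is how to shard or dispatch the model across the cards for inference.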