Do you know of any good code/tutorial that shows how to do inference with Llama 2 70B on multiple GPUs with Accelerate?
Hey @vbachi, you can check this doc: Handling big models for inference