Any good code/tutorial that shows how to do inference with Llama 2 70B on multiple GPUs with accelerate?

Do you know of any good code/tutorial that shows how to do inference with Llama 2 70B on multiple GPUs with accelerate?

Hey @vbachi, you can check this doc: Handling big models for inference
