Multi-GPU inference with an LLM produces gibberish

Can anyone show me how to run inference on 2 GPUs? Accelerate can't detect my GPUs.
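
As a first step, here is a minimal sketch for checking whether PyTorch can see the GPUs at all (this snippet is an illustration and assumes a standard PyTorch install, not anything from my actual script):

```python
import torch

# If PyTorch reports 0 devices here, Accelerate cannot see them either:
# the problem is the CUDA driver/runtime setup, not the inference code.
print("CUDA available:", torch.cuda.is_available())
print("Visible GPUs:", torch.cuda.device_count())

# Sketch only: once both GPUs are visible, a model can be sharded across
# them with Hugging Face Accelerate via device_map="auto", e.g.
#   model = AutoModelForCausalLM.from_pretrained(name, device_map="auto")
# (the model name and exact call here are assumptions, for illustration).
```

If `nvidia-smi` shows both cards but this prints 0, the installed PyTorch build may be CPU-only or built against a mismatched CUDA version.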