Hello, I am trying to maximize the inference speed of a single prompt on a small (7B) model. I have a server with 4 GPUs. It seems like it should be possible to use Accelerate to speed up inference, but does anyone have example code? I only see examples that split a batch of prompts across GPUs, whereas I only have one prompt at a time.
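For reference, here is the rough sketch I have so far. It uses Accelerate's big-model inference via `device_map="auto"` to shard the model's layers across the 4 GPUs and then generates from a single prompt (the model id and prompt below are just placeholders). My understanding is that this mainly helps fit larger models, and I am not sure it actually makes one prompt faster, which is why I am asking:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-v0.1"  # placeholder; any 7B causal LM

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",         # Accelerate places layers across the 4 GPUs
    torch_dtype=torch.float16, # half precision to reduce memory and bandwidth
)

prompt = "Explain how transformer models generate text."  # single prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=128)

print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Is this the right approach for single-prompt latency, or is there a better way to use all 4 GPUs for one prompt?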
Thank you.