Multi-GPU Operation of mistralai/Mistral-Large-Instruct-2407

When I deployed Mistral-Large-Instruct-2407 on a multi-GPU server with the GPU mapping set to "auto", responses came back very slowly. I wanted to run my server with 8× A100 80GB GPUs at full speed, but every attempt to debug the multi-GPU settings (workers, threads, per-GPU memory limits, etc.) ended with GPU memory fully occupied. Sometimes, when loading did work, I hit errors whenever child threads tried to use the model that the main thread had pre-loaded onto the GPUs. I implemented a swap (offload) solution, but the error still tells me to use swap.
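For reference, here is a minimal sketch of the loading pattern I am describing, assuming the standard Hugging Face transformers + accelerate APIs; the 75GiB per-GPU cap is an illustrative guess, not a verified value for this model:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "mistralai/Mistral-Large-Instruct-2407"

# Cap per-GPU usage so the "auto" placement leaves headroom for activations
# and the KV cache; without a cap, sharding can fill every card completely.
# 75GiB on an 80GB A100 is an assumption, not a tested number.
max_memory = {i: "75GiB" for i in range(torch.cuda.device_count())}

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    device_map="auto",      # shard layers across all visible GPUs
    max_memory=max_memory,  # per-device memory budget
)

prompt = "Hello"
# With a sharded model, inputs go to the device holding the first layer.
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.inference_mode():
    out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

My open question is then how to serve a model sharded this way from multiple worker threads without the child-thread errors described above.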

I haven't seen any official sample code for this on Hugging Face or GitHub, so I'd appreciate guidance from anyone who has gotten this working.