Multi-GPU inference

Thank you. Is there any other way to improve inference speed? I'm already using an H100 with 4-bit quantization and FlashAttention-2.
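
For context, here is a minimal sketch of the setup I'm describing, assuming the Hugging Face transformers + bitsandbytes stack (the model name is a placeholder; substitute your own):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder; substitute your model

# 4-bit quantization via bitsandbytes (NF4 with bfloat16 compute)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    attn_implementation="flash_attention_2",  # requires the flash-attn package
    device_map="cuda:0",                      # single H100
)

# Quick generation to confirm the setup works end to end
inputs = tokenizer("Hello, world", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```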