Thank you. Is there any other way to improve inference speed? I’m already using an H100 with 4-bit quantization and FlashAttention-2.