Some questions about GPT-J inference using int8

Currently, Hugging Face transformers supports loading models in int8, which saves a lot of GPU VRAM.

I’ve tried it with GPT-J, but found that inference in int8 is much slower, roughly 8x slower than in the usual float16.
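For context, here is a minimal sketch of the kind of comparison I mean (assuming the bitsandbytes-backed `load_in_8bit` flag in `from_pretrained`; the checkpoint id, prompt, and token count are just placeholders, and it needs `bitsandbytes` and `accelerate` installed plus enough VRAM to hold both copies):

```python
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EleutherAI/gpt-j-6B"  # example checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)

# float16 baseline on GPU
model_fp16 = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# int8 via the bitsandbytes integration
model_int8 = AutoModelForCausalLM.from_pretrained(
    model_id, load_in_8bit=True, device_map="auto"
)

inputs = tokenizer("Hello, my name is", return_tensors="pt").to("cuda")

def time_generate(model):
    # Time a single generate() call, synchronizing so GPU work is included
    torch.cuda.synchronize()
    start = time.time()
    model.generate(**inputs, max_new_tokens=64)
    torch.cuda.synchronize()
    return time.time() - start

print("fp16:", time_generate(model_fp16))
print("int8:", time_generate(model_int8))
```

With something like this, the int8 model comes out far slower per generated token than the float16 one.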

Can somebody tell me why this happens and how I can solve it?