Enabling load_in_8bit makes inference much slower

Me too, and training is slower as well.