Morning, quick Q… Why would a model's estimated training time jump from ~111 hrs to ~400 hrs when I changed it to load in 8-bit, instead of loading the model straight from the Hub (32-bit by default, I assume)?
Originally the model was loaded as t5-xl, and I changed it to (t5-xl, load_in_8bit…) to put less strain on GPU memory, but the training time has gone up?
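For context, this is roughly the change, as a sketch assuming the usual transformers + bitsandbytes path (my actual call has a few more kwargs I've left out):

```python
from transformers import T5ForConditionalGeneration

# Before: plain load straight from the Hub (weights in full precision)
# model = T5ForConditionalGeneration.from_pretrained("t5-xl")

# After: 8-bit load via bitsandbytes, to cut GPU memory use
model = T5ForConditionalGeneration.from_pretrained(
    "t5-xl",
    load_in_8bit=True,   # quantizes the linear layers to int8 at load time
    device_map="auto",   # 8-bit loading expects a device map so weights land on GPU
)
```

My understanding is that int8 matmuls go through extra quantize/dequantize steps at runtime, which is where I suspect the slowdown comes from, but that's the part I'd like confirmed.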
It sort of sounds plausible now that I write it out, but just for clarity: is that normal?
Thanks.