I have limited computation resources and I want to use the LLaMA-3 model. To reduce memory usage and increase computation speed, I want to specify that computations should use the lowest floating-point precision available. Is it possible to use torch.float8 instead of torch.float16 in
```python
compute_dtype = getattr(torch, "float16")
```
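
For context, here is a minimal sketch of where this line sits in my setup, assuming a 4-bit load with transformers and bitsandbytes (the model ID and config values below are only illustrative):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# The line in question: pick the dtype used for matmuls during inference.
compute_dtype = getattr(torch, "float16")

# 4-bit quantization config; compute_dtype controls the precision of the
# dequantized computations.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=compute_dtype,  # could a float8 dtype go here instead?
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",  # illustrative model ID
    quantization_config=bnb_config,
    device_map="auto",
)
```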