I read here that it is possible to load GPT-J on a 12 GB card: Memory use of GPT-J-6B
However, I tried and could not, even with the recommended optimisations:
import torch
from transformers import GPTJForCausalLM

model = GPTJForCausalLM.from_pretrained("EleutherAI/gpt-j-6B", revision="float16", torch_dtype=torch.float16, low_cpu_mem_usage=True, cache_dir="/root/Desktop/models_cache/")
model.half()  # redundant here: torch_dtype=torch.float16 already loads the weights in fp16
model.to("cuda")
I get an out-of-memory error when I try to load the model onto the GPU.
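For reference, here is a minimal diagnostic sketch (assuming a recent PyTorch with torch.cuda.mem_get_info) that checks how much free memory the card reports before loading, next to a rough estimate of what the fp16 weights alone need. The 6e9 parameter count is an approximation of GPT-J-6B's size, not an exact figure:

import torch

# Free and total GPU memory in bytes, before anything is loaded.
free, total = torch.cuda.mem_get_info()
print(f"free: {free / 1024**3:.2f} GiB / total: {total / 1024**3:.2f} GiB")

# GPT-J-6B has roughly 6e9 parameters; at 2 bytes each in float16,
# the weights alone come to about 11-12 GiB, leaving very little
# headroom on a 12 GB card for the CUDA context and activations.
n_params = 6_000_000_000
print(f"approx. fp16 weights: {n_params * 2 / 1024**3:.2f} GiB")

If the free figure printed here is already noticeably below the weight estimate (the CUDA context by itself can take several hundred MB), that alone could explain the out-of-memory error.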