Issues while trying to run GPT-J locally

Hello! I’m a CS student. I’m also a junior developer (mostly backend, but I can manage some full-stack tasks) and I’m currently choosing a topic for my thesis. It will be on ML, and right now I find GPT-J (and GPT-3, but that’s not the topic here) really fascinating. I’m trying to get text generation running on my local machine, but my ML experience is limited to basic classifiers and I’m having trouble running the GPT-J 6B model locally. This might also be due to my mid-to-low-spec PC (GPU: AMD RX 480 4GB, 16GB RAM, CPU: AMD Ryzen 5 3600 6-core at 3.60 GHz; from what I’ve read online I might not even be able to run GPT-J locally).
So, my first question is: can I actually run GPT-J locally? Even if it’s slow, speed is not my goal right now. Yes, I’ve also considered Colab, but for now I need it to run even without an internet connection, so it has to be local.
Second question (assuming the answer to the first one is “Yes, you can run it locally”):
I’m trying to run it with this test code:

from transformers import GPTJForCausalLM, AutoTokenizer
import torch

model = GPTJForCausalLM.from_pretrained("EleutherAI/gpt-j-6B", revision="float16", torch_dtype=torch.float16, low_cpu_mem_usage=True)
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
context = """In a shocking finding, scientists discovered a herd of unicorns living in a remote, 
            previously unexplored valley, in the Andes Mountains. Even more surprising to the 
            researchers was the fact that the unicorns spoke perfect English."""

input_ids = tokenizer(context, return_tensors="pt").input_ids
gen_tokens = model.generate(input_ids, do_sample=True, temperature=0.9, max_length=100,)
gen_text = tokenizer.batch_decode(gen_tokens)[0]
print(gen_text)

It downloaded the 6B model, but it fails in the generate call. It looks like a coding error, which is odd, because the failing line isn’t in my code. The error is:
return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'
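From what I can tell, this error comes from PyTorch itself: LayerNorm apparently has no half-precision kernel on the CPU, so the fp16 weights fail as soon as the model actually runs there. A tiny snippet like this (just a sketch, no GPT-J involved) should hit the same error on a CPU-only setup like mine:

import torch

# Sketch: a float16 LayerNorm on the CPU should raise the same
# "LayerNormKernelImpl" not implemented for 'Half' error.
layer_norm = torch.nn.LayerNorm(8).half()
x = torch.randn(1, 8, dtype=torch.float16)
layer_norm(x)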
I also tried to run it with TensorFlow:

from transformers import GPTJForCausalLM, AutoTokenizer
import tensorflow as tf

model = GPTJForCausalLM.from_pretrained("EleutherAI/gpt-j-6B", revision="float16", torch_dtype=tf.float16, low_cpu_mem_usage=True)
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
context = """In a shocking finding, scientists discovered a herd of unicorns living in a remote, 
          previously unexplored valley, in the Andes Mountains. Even more surprising to the 
          researchers was the fact that the unicorns spoke perfect English."""

input_ids = tokenizer(context, return_tensors="pt").input_ids
print ("Generating...")
gen_tokens = model.generate(input_ids, do_sample=True, temperature=0.9, max_length=100,)
gen_text = tokenizer.batch_decode(gen_tokens)[0]
print(gen_text)

Got a different error:

File "filePath\venv\lib\site-packages\transformers\modeling_utils.py", line 1044, in _set_default_torch_dtype
    if not dtype.is_floating_point:
AttributeError: 'DType' object has no attribute 'is_floating_point'
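If I understand it correctly, torch_dtype expects a PyTorch dtype, so passing tf.float16 to GPTJForCausalLM (which is still the PyTorch class) is probably what triggers this. A genuinely TensorFlow-based attempt would presumably look more like the sketch below; I’m assuming here that the TFGPTJForCausalLM class can convert the PyTorch checkpoint with from_pt=True, which I haven’t actually tested:

from transformers import TFGPTJForCausalLM, AutoTokenizer

# Sketch only: the TensorFlow GPT-J class instead of the PyTorch one.
# from_pt=True converts the PyTorch weights if no native TF weights exist
# (assumption: enough RAM for the full-precision weights during conversion).
model = TFGPTJForCausalLM.from_pretrained("EleutherAI/gpt-j-6B", from_pt=True)
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")

context = "In a shocking finding, scientists discovered a herd of unicorns..."
input_ids = tokenizer(context, return_tensors="tf").input_ids  # "tf", not "pt"

gen_tokens = model.generate(input_ids, do_sample=True, temperature=0.9, max_length=100)
print(tokenizer.batch_decode(gen_tokens)[0])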

I don’t really know where to start with running GPT-J locally, because I’ve tried a bunch of online guides (including Hugging Face’s example code) and I still get errors :frowning:
Thanks anyway!

On your local machine, do you have a GPU and CUDA installed? The error you mention for the first snippet leads me to believe that either you don’t have a GPU or the model/inputs are not on the GPU.

No, as I said, I have an AMD GPU, so no CUDA for me. Is it possible to run it without CUDA? Some ML packages (e.g. TensorFlow) give me the same warning about CUDA, but they still run smoothly with my GPU.

Ah, so you will need to remove revision='float16', torch_dtype=torch.float16 then, and it should work.
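i.e. something like this for the loading call (a sketch; the default is float32, so the weights will take roughly 24GB of memory):

model = GPTJForCausalLM.from_pretrained(
    "EleutherAI/gpt-j-6B",
    low_cpu_mem_usage=True,  # keep this: it loads weights incrementally to reduce peak RAM
)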

Thanks! Do I just remove them completely, or should I add something else?
Another question: my GPU is pretty old, how long could it take? It’s an RX 480 4GB. Minutes? Hours?

I tried to run it without the float16 parameters, and I got this error:
RuntimeError: [enforce fail at C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\c10\core\impl\alloc_cpu.cpp:72] data. DefaultCPUAllocator: not enough memory: you tried to allocate 268435456 bytes.

Any workaround?

Ah yeah, you’re running out of memory trying to load a 24GB checkpoint with 16GB of RAM. I’m not sure whether this blog post would solve your problem, but it might be helpful to you. Any thoughts, @muellerzr?

I fixed it! I had accidentally deleted low_cpu_mem_usage=True; setting it back again fixed it. Anyway, it took about 100 minutes to produce 30 characters :laughing: Any way to speed that up?
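For reference, the code that ended up working for me looks roughly like this (fp16 arguments removed, low_cpu_mem_usage=True back in place, everything on the CPU):

from transformers import GPTJForCausalLM, AutoTokenizer

# Full-precision (float32) weights, loaded incrementally to keep peak RAM down.
model = GPTJForCausalLM.from_pretrained("EleutherAI/gpt-j-6B", low_cpu_mem_usage=True)
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")

context = "In a shocking finding, scientists discovered a herd of unicorns..."
input_ids = tokenizer(context, return_tensors="pt").input_ids

# Runs entirely on the CPU, hence the very slow generation on this hardware.
gen_tokens = model.generate(input_ids, do_sample=True, temperature=0.9, max_length=100)
print(tokenizer.batch_decode(gen_tokens)[0])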


I’m afraid I’m not too sure, given your setup. Maybe someone else has ideas here :sweat_smile: Glad you got it to work though :slight_smile:


Thanks anyway, you still helped me a lot :slight_smile: I’ll probably open another topic, because I now have a lot of questions about hardware and performance.


NP! Feel free to mark my earlier comment as the solution :slight_smile:

It looks like you’re not using the GPU :thinking: I believe you need ROCm installed, which is AMD’s equivalent to CUDA for deep learning. If you have it, rocm-smi shows GPU usage the same way nvidia-smi does. According to this, ROCm should support your hardware… just saying so that you have more info before opening the new topic :slight_smile:
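If you do end up with a ROCm build of PyTorch, here is a quick sketch of how to check it from Python (my assumption being that ROCm builds expose the AMD GPU through the usual torch.cuda API):

import torch

# ROCm builds of PyTorch reuse the torch.cuda namespace for AMD GPUs,
# so this should print True if the RX 480 is visible.
print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))

# If the GPU is visible, the model and inputs need to be moved onto it
# before calling generate, e.g.:
# model = model.to("cuda")
# input_ids = input_ids.to("cuda")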