I have written a training script that uses the Accelerate and PEFT libraries to finetune GPT-NeoX, and I repeatedly encounter the following two messages, culminating in a runtime error.
The first message is:
```
/opt/conda/envs/accelerate/lib/python3.7/site-packages/torch/utils/checkpoint.py:31: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
  warnings.warn("None of the inputs have requires_grad=True. Gradients will be None")
```
and the second is:
```
  File "/opt/conda/envs/accelerate/lib/python3.7/site-packages/torch/autograd/__init__.py", line 199, in backward
    allow_unreachable=True, accumulate_grad=True)  # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
```
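For context, the second error can be reproduced in plain PyTorch without PEFT or Accelerate at all; this minimal sketch triggers the same message by calling `backward()` on a loss whose inputs do not require gradients:

```python
import torch

# requires_grad defaults to False, so the loss below has no grad_fn
# and autograd has nothing to backpropagate through.
x = torch.randn(4)
loss = (x ** 2).sum()
try:
    loss.backward()
except RuntimeError as e:
    print(e)  # element 0 of tensors does not require grad and does not have a grad_fn
```

Which makes me suspect that, somewhere in my setup, every parameter the loss depends on ends up frozen.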
I use the following code excerpt to load the model:
```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["query_key_value", "xxx"],
    bias="none",
    task_type="CAUSAL_LM",
)
model = AutoModelForCausalLM.from_pretrained(model_name)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()
```
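As a sanity check I would like to verify that at least one parameter still requires gradients after wrapping the model. A minimal sketch of that check, illustrated with a small stand-in module rather than the 20B model:

```python
import torch.nn as nn

# Stand-in for the wrapped model; freezing every parameter simulates
# the failure mode where no adapter weights are trainable.
model = nn.Linear(8, 8)
for p in model.parameters():
    p.requires_grad = False

trainable = [n for n, p in model.named_parameters() if p.requires_grad]
print(trainable)  # an empty list here means backward() will hit the RuntimeError above
```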
The terminal command I am executing is:
```shell
accelerate launch train.py --data_path_file ./prompts.jsonl -m EleutherAI/gpt-neox-20b -te 3 -lr 1.41e-5 --eval_size 0.1 --batch_size 7 --gradient_checkpointing False
```
Any tips on successfully backpropagating through the LoRA adapters would be appreciated!
Environment details
```
(accelerate) root@de1305f1fa1f:/mnt/training# python -m torch.utils.collect_env
Collecting environment information...
PyTorch version: 1.13.1+cu117
Is debug build: False
CUDA used to build PyTorch: 11.7
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.5 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: Could not collect
CMake version: version 3.26.1
Libc version: glibc-2.10
Python version: 3.7.3 (default, Mar 27 2019, 22:11:17) [GCC 7.3.0] (64-bit runtime)
Python platform: Linux-5.15.0-58-generic-x86_64-with-debian-bullseye-sid
Is CUDA available: True
CUDA runtime version: 11.2.152
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: GRID A100D-7-80C
  MIG 7g.80gb Device 0:
Nvidia driver version: 525.85.05
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.1.1
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.1.1
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.1.1
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.1.1
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.1.1
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.1.1
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.1.1
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Versions of relevant libraries:
[pip3] numpy==1.21.6
[pip3] torch==1.13.1
[conda] numpy 1.21.6 pypi_0 pypi
[conda] torch 1.13.1 pypi_0 pypi
```