I am trying to train a GPT-2 large language model using the DeepSpeed plugin of accelerate. Every time the script reaches this line:
loss = model(torch.stack(batch['input_ids']).T, labels=torch.stack(batch['input_ids']).T).loss
I get the following error:
RuntimeError: Expected tensor for argument #1 'indices' to have one of the following scalar types: Long, Int; but got torch.cuda.HalfTensor instead (while checking arguments for embedding)
The error is raised from inputs_embeds = self.wte(input_ids) in the GPT-2 forward pass.
I also checked the data type of torch.stack(batch['input_ids']).T; it is torch.long.
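Roughly, the check looks like this (simplified from my training loop; batch['input_ids'] is a list of 1-D LongTensors coming from my dataloader):

input_ids = torch.stack(batch['input_ids']).T
print(input_ids.dtype)  # prints torch.int64 here, so the indices are correct before the forward call
loss = model(input_ids, labels=input_ids).loss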
For reference, I am providing my accelerate config file and a more detailed error stack trace.
Config file:
deepspeed_config:
  gradient_accumulation_steps: 1
  gradient_clipping: 1.0
  offload_optimizer_device: none
  offload_param_device: none
  zero3_init_flag: true
  zero_stage: 2
distributed_type: DEEPSPEED
fsdp_config: {}
machine_rank: 0
main_process_ip: null
main_process_port: null
main_training_function: main
mixed_precision: fp16
num_machines: 1
num_processes: 4
use_cpu: false
Detailed error:
File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-48d0be80-7221-441b-911a-c6bb2ef50dc5/lib/python3.9/site-packages/torch/nn/functional.py", line 2210, in embedding
File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-48d0be80-7221-441b-911a-c6bb2ef50dc5/lib/python3.9/site-packages/torch/nn/functional.py", line 2210, in embedding
transformer_outputs = self.transformer(
File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-48d0be80-7221-441b-911a-c6bb2ef50dc5/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-48d0be80-7221-441b-911a-c6bb2ef50dc5/lib/python3.9/site-packages/torch/nn/modules/sparse.py", line 160, in forward
return F.embedding(
File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-48d0be80-7221-441b-911a-c6bb2ef50dc5/lib/python3.9/site-packages/torch/nn/functional.py", line 2210, in embedding
return forward_call(*input, **kwargs)
File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-48d0be80-7221-441b-911a-c6bb2ef50dc5/lib/python3.9/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 843, in forward
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeErrorreturn torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse):
Expected tensor for argument #1 'indices' to have one of the following scalar types: Long, Int; but got torch.cuda.HalfTensor instead (while checking arguments for embedding)
RuntimeError: Expected tensor for argument #1 'indices' to have one of the following scalar types: Long, Int; but got torch.cuda.HalfTensor instead (while checking arguments for embedding)
inputs_embeds = self.wte(input_ids)
Hi @hsuyab, thanks for replying. Yes, I tested the output by printing it; it is torch.long. I think that, since mixed precision is fp16, DeepSpeed is converting the input_ids to a torch.cuda.HalfTensor. I cannot share the code as it is my company's internal code, but what I can share is the output of ds_report. Is it possible for you to check whether this is okay?
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
runtime if needed. Op compatibility means that your system
meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
cpu_adagrad ............ [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
random_ltd ............. [NO] ....... [OKAY]
[WARNING] sparse_attn requires a torch version >= 1.5 but detected 2.0
[WARNING] using untested triton version (2.0.0), only 1.0.0 is known to be compatible
sparse_attn ............ [NO] ....... [NO]
spatial_inference ...... [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/local_disk0/.ephemeral_nfs/envs/pythonEnv-484d6d7c-f559-4582-aa77-2c164373abce/lib/python3.9/site-packages/torch']
torch version .................... 2.0.1+cu117
deepspeed install path ........... ['/local_disk0/.ephemeral_nfs/envs/pythonEnv-484d6d7c-f559-4582-aa77-2c164373abce/lib/python3.9/site-packages/deepspeed']
deepspeed info ................... 0.8.3, unknown, unknown
torch cuda version ............... 11.7
torch hip version ................ None
nvcc version ..................... 11.3
deepspeed wheel compiled w. ...... torch 1.13, cuda 11.7
I'm having this exact problem. Were you able to solve it?
I solved it by changing the accelerate config to not use fp16 (still ZeRO stage 2).
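Concretely, the only change was the mixed-precision line in the accelerate config shown above; everything else (including zero_stage: 2) stayed the same:

mixed_precision: 'no'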
Yeah, it works after removing fp16, but then out-of-memory errors start appearing. FP16 would be required for larger batch sizes. I am also stuck and unable to find a solution. I will probably stop using accelerate and use the DeepSpeed library directly with PyTorch.
Hi, I fixed this problem by changing my DeepSpeed config file.
The key is to set auto_cast to false; after that, the LongTensor is no longer converted to a HalfTensor.
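For reference, the relevant part of my DeepSpeed config file looks roughly like this (a trimmed sketch, not my full file; the ZeRO stage and gradient values just mirror the accelerate settings above, and "auto" leaves the micro-batch size for accelerate to fill in, so adjust these to your setup):

{
  "fp16": {
    "enabled": true,
    "auto_cast": false
  },
  "zero_optimization": {
    "stage": 2
  },
  "gradient_accumulation_steps": 1,
  "gradient_clipping": 1.0,
  "train_micro_batch_size_per_gpu": "auto"
}

I then point accelerate at this file with a deepspeed_config_file entry (e.g. deepspeed_config_file: ds_config.json, the file name is just an example) under deepspeed_config: in the accelerate config, instead of listing the individual DeepSpeed fields there.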
You can solve it as described in my answer above.