Getting torch.cuda.HalfTensor error while using DeepSpeed with Accelerate

I am trying to train a GPT-2 large language model using the DeepSpeed plugin of Accelerate. Every time the script reaches this line,
loss = model(torch.stack(batch['input_ids']).T, labels=torch.stack(batch['input_ids']).T).loss
I get the following error:
RuntimeError: Expected tensor for argument #1 'indices' to have one of the following scalar types: Long, Int; but got torch.cuda.HalfTensor instead (while checking arguments for embedding) inputs_embeds = self.wte(input_ids)
I also checked the data type of torch.stack(batch['input_ids']).T right before the call; it is torch.long (int64).
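
A minimal sketch of the relevant training step is below. The data pipeline shown here is a toy stand-in for illustration, not my actual one, but the training loop has the same shape:

    import torch
    from torch.utils.data import DataLoader
    from accelerate import Accelerator
    from transformers import GPT2TokenizerFast, GPT2LMHeadModel

    accelerator = Accelerator()                      # picks up the DeepSpeed / fp16 settings from the accelerate config
    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2-large")
    tokenizer.pad_token = tokenizer.eos_token        # GPT-2 has no pad token by default
    model = GPT2LMHeadModel.from_pretrained("gpt2-large")
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

    # toy stand-in data: each item is a dict with a fixed-length list of token ids
    texts = ["an example sentence"] * 8
    items = [tokenizer(t, padding="max_length", max_length=32, truncation=True) for t in texts]
    train_dataloader = DataLoader(items, batch_size=4)

    model, optimizer, train_dataloader = accelerator.prepare(model, optimizer, train_dataloader)

    model.train()
    for batch in train_dataloader:
        # batch['input_ids'] is collated as a list of per-position tensors, hence the stack + transpose
        loss = model(torch.stack(batch['input_ids']).T,
                     labels=torch.stack(batch['input_ids']).T).loss   # <-- line that raises the error
        accelerator.backward(loss)
        optimizer.step()
        optimizer.zero_grad()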

For reference, I am providing my Accelerate config file and a more detailed error stack trace.
Config file:

deepspeed_config:
  gradient_accumulation_steps: 1
  gradient_clipping: 1.0
  offload_optimizer_device: none
  offload_param_device: none
  zero3_init_flag: true
  zero_stage: 2
distributed_type: DEEPSPEED
fsdp_config: {}
machine_rank: 0
main_process_ip: null
main_process_port: null
main_training_function: main
mixed_precision: fp16
num_machines: 1
num_processes: 4
use_cpu: false
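
The script is launched with the standard Accelerate CLI against this config (the file names here are placeholders for my actual ones):

    accelerate launch --config_file accelerate_config.yaml train.py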

Detailed error:

  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-48d0be80-7221-441b-911a-c6bb2ef50dc5/lib/python3.9/site-packages/transformers/models/gpt2/modeling_gpt2.py", in forward
    transformer_outputs = self.transformer(
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-48d0be80-7221-441b-911a-c6bb2ef50dc5/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-48d0be80-7221-441b-911a-c6bb2ef50dc5/lib/python3.9/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 843, in forward
    inputs_embeds = self.wte(input_ids)
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-48d0be80-7221-441b-911a-c6bb2ef50dc5/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-48d0be80-7221-441b-911a-c6bb2ef50dc5/lib/python3.9/site-packages/torch/nn/modules/sparse.py", line 160, in forward
    return F.embedding(
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-48d0be80-7221-441b-911a-c6bb2ef50dc5/lib/python3.9/site-packages/torch/nn/functional.py", line 2210, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected tensor for argument #1 'indices' to have one of the following scalar types: Long, Int; but got torch.cuda.HalfTensor instead (while checking arguments for embedding)
  • Can you share your code or a reusable example so this can be tested?
  • Also, can you print what is actually being passed as input when you run it, and check its dtype? (A quick check is sketched below.)
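
For example, a hypothetical helper along these lines, called inside the training loop, would show what the model actually receives (the names mirror the snippet posted above):

    import torch

    def check_batch(batch):
        # hypothetical helper: report what will be fed to the embedding layer
        input_ids = torch.stack(batch['input_ids']).T
        print(input_ids.dtype, input_ids.device, input_ids.shape)   # expect torch.int64, i.e. a LongTensor
        return input_ids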

Hi @hsuyab, thanks for replying. Yes, I checked by printing: the input is torch.long. I think that because mixed precision is set to fp16, DeepSpeed is converting the input_ids to torch.cuda.HalfTensor somewhere. I cannot share the code as it is my company's internal code, but I can share the output of ds_report. Could you check whether this looks okay:

--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
      runtime if needed. Op compatibility means that your system
      meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-dev package with apt
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
cpu_adagrad ............ [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
random_ltd ............. [NO] ....... [OKAY]
 [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.0
 [WARNING]  using untested triton version (2.0.0), only 1.0.0 is known to be compatible
sparse_attn ............ [NO] ....... [NO]
spatial_inference ...... [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/local_disk0/.ephemeral_nfs/envs/pythonEnv-484d6d7c-f559-4582-aa77-2c164373abce/lib/python3.9/site-packages/torch']
torch version .................... 2.0.1+cu117
deepspeed install path ........... ['/local_disk0/.ephemeral_nfs/envs/pythonEnv-484d6d7c-f559-4582-aa77-2c164373abce/lib/python3.9/site-packages/deepspeed']
deepspeed info ................... 0.8.3, unknown, unknown
torch cuda version ............... 11.7
torch hip version ................ None
nvcc version ..................... 11.3
deepspeed wheel compiled w. ...... torch 1.13, cuda 11.7

I'm having this exact problem. Were you able to solve it?

I solved it by changing the accelerate config to not use fp16 (still ZeRO stage 2).
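
Concretely, that is a one-line change to the accelerate config posted above (the rest of the file stays the same):

    mixed_precision: 'no'   # previously fp16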

Yeah, it works after removing fp16, but then out-of-memory errors start appearing; fp16 would be required for larger batch sizes. I am also stuck and unable to find a solution. I will probably stop using Accelerate and use the DeepSpeed library directly with PyTorch.
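
For anyone considering the same route, the direct DeepSpeed integration looks roughly like this. This is only a sketch: ds_config.json is a placeholder file that would hold the ZeRO stage 2 / fp16 / gradient clipping settings, and train_dataloader is the same dataloader as in the earlier snippet:

    import torch
    import deepspeed
    from transformers import GPT2LMHeadModel

    model = GPT2LMHeadModel.from_pretrained("gpt2-large")

    # DeepSpeed wraps the model and optimizer and reads ZeRO/fp16 settings from the JSON config
    model_engine, optimizer, _, _ = deepspeed.initialize(
        model=model,
        optimizer=torch.optim.AdamW(model.parameters(), lr=1e-5),
        config="ds_config.json",
    )

    for batch in train_dataloader:
        input_ids = torch.stack(batch['input_ids']).T.to(model_engine.device)
        loss = model_engine(input_ids, labels=input_ids).loss
        model_engine.backward(loss)   # DeepSpeed handles fp16 loss scaling
        model_engine.step()

The script is then started with the DeepSpeed launcher (deepspeed train.py) rather than accelerate launch.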

Hi, I fixed this problem by changing my DeepSpeed config file. The key is to set auto_cast to false in the fp16 section; after that, the LongTensor inputs are no longer converted to HalfTensor. See the example below.
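
In a raw DeepSpeed JSON config the relevant part would look something like this (a sketch: the file name and the batch-size value are placeholders, and only the parts relevant here are shown):

    {
      "train_micro_batch_size_per_gpu": 4,
      "gradient_accumulation_steps": 1,
      "gradient_clipping": 1.0,
      "zero_optimization": { "stage": 2 },
      "fp16": {
        "enabled": true,
        "auto_cast": false
      }
    }

With fp16.enabled the model still trains in half precision, but with auto_cast set to false DeepSpeed does not cast the forward inputs, so the LongTensor input_ids reach the embedding layer unchanged, which matches the behaviour described above. When going through Accelerate, one way to use such a file is to point the accelerate config at a custom DeepSpeed JSON config instead of relying only on the plugin defaults.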


You can solve it as described in my answer above (setting auto_cast to false in the DeepSpeed fp16 config).

Is this still an issue?