Using Accelerate with DeepSpeed for WNUT Example

larsbun · July 12, 2023, 9:34pm

I am trying to get the DeepSpeed (DS) integration with Accelerate (ACC) to work for the token_classification example. The labels range from 0 to 12. However, when I run my code, the output looks like this:

```
…/aten/src/ATen/native/cuda/Loss.cu:240: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [2,0,0] Assertion t >= 0 && t < n_classes failed.
…/aten/src/ATen/native/cuda/Loss.cu:240: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [3,0,0] Assertion t >= 0 && t < n_classes failed.
…/aten/src/ATen/native/cuda/Loss.cu:240: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [4,0,0] Assertion t >= 0 && t < n_classes failed.
…/aten/src/ATen/native/cuda/Loss.cu:240: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [5,0,0] Assertion t >= 0 && t < n_classes failed.
…/aten/src/ATen/native/cuda/Loss.cu:240: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [8,0,0] Assertion t >= 0 && t < n_classes failed.
…/aten/src/ATen/native/cuda/Loss.cu:240: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [10,0,0] Assertion t >= 0 && t < n_classes failed.
…/aten/src/ATen/native/cuda/Loss.cu:240: nll_loss_forward_reduce_cuda_kernel_2d: block: [0…/aten/src/ATen/native/cuda/Loss.cu,0,0:240], thread: [14: nll_loss_forward_reduce_cuda_kernel_2d,0: block: [0,0,0] Assertion t >= 0 && t < n_classes,0 failed.
```
Since I am using the DS integration, I do not know how to inspect the `t` that the torch assertion refers to, as I assume some conversion happens under the hood. Where does `n_classes` come from, for example?
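For reference, in PyTorch's `nll_loss` kernel `t` is a single target (label) value and `n_classes` is the size of the class dimension of the logits passed to the loss; targets equal to `ignore_index` (default `-100`) are skipped before the check. The sketch below mirrors that check in plain Python so label ids can be inspected on the CPU before training. The helper name `find_invalid_labels` and the sample rows are hypothetical, and it has not been confirmed in this thread that a label/head-size mismatch is the cause; it would be one possibility if, for instance, the classification head had fewer than 13 outputs.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross-entropy loss


def find_invalid_labels(label_rows, n_classes, ignore_index=IGNORE_INDEX):
    """Return the sorted label values that violate 0 <= t < n_classes.

    label_rows: iterable of per-example lists of integer label ids.
    n_classes:  size of the class dimension of the logits
                (e.g. the number of outputs of the classification head).
    """
    bad = set()
    for row in label_rows:
        for t in row:
            if t == ignore_index:
                continue  # ignored positions are never checked by the loss
            if not (0 <= t < n_classes):
                bad.add(t)
    return sorted(bad)


# Hypothetical tokenized examples with labels in the range 0..12.
rows = [[-100, 0, 3, 12, -100], [-100, 7, 1, -100, -100]]

print(find_invalid_labels(rows, n_classes=13))  # []
print(find_invalid_labels(rows, n_classes=2))   # [3, 7, 12]
```

An empty result for the head size actually in use would rule out the labels themselves; a non-empty one lists the values that would trip the assertion.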

To test the offloading feature, I needed a model too big for the GPU, so I set `model_name = "bigscience/bloomz-7b1"`.
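The configuration actually used is not shown in this post. As an illustration only, a ZeRO stage 3 fragment with CPU offload of parameters and optimizer state might look like the sketch below; the file name `ds_config_offload.json` is a hypothetical choice, and a complete DeepSpeed config would need further fields (batch size, precision, and so on).

```python
import json

# Sketch of the "zero_optimization" section of a DeepSpeed config.
# Assumption: ZeRO stage 3 with both parameters and optimizer state
# offloaded to CPU memory; this is not the poster's actual config.
ds_config = {
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        "offload_param": {"device": "cpu", "pin_memory": True},
    }
}

# Serialize to JSON text; writing it to a file (hypothetical name) lets a
# launcher or plugin be pointed at it.
config_text = json.dumps(ds_config, indent=2)
with open("ds_config_offload.json", "w") as f:
    f.write(config_text)

print(config_text)
```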

Some relevant environment information:
transformers 4.31.0.dev0
deepspeed 0.9.5
accelerate 0.20.3
huggingface-hub 0.16.4

and

[2023-07-12 23:32:55,694] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
      runtime if needed. Op compatibility means that your system
      meet the required dependencies to JIT install the op.
--------------------------------------------------

JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-dev package with apt
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
cpu_adagrad ............ [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
random_ltd ............. [NO] ....... [OKAY]
 [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.0
 [WARNING]  using untested triton version (2.0.0), only 1.0.0 is known to be compatible
sparse_attn ............ [NO] ....... [NO]
spatial_inference ...... [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/envs/deepspeed_env/lib/python3.8/site-packages/torch']
torch version .................... 2.0.1+cu117
deepspeed install path ........... ['/envs/deepspeed_env/lib/python3.8/site-packages/deepspeed']
deepspeed info ................... 0.9.5, unknown, unknown
torch cuda version ............... 11.7
torch hip version ................ None
nvcc version ..................... 12.2
deepspeed wheel compiled w. ...... torch 0.0, cuda 0.0

smangrul · July 19, 2023, 7:05pm

Hello, please share a minimal script that reproduces the error, along with the launch command.
