Using Accelerate with DeepSpeed for WNUT Example

I am trying to get DeepSpeed (DS) integration with Accelerate (ACC) to work for the token_classification example. The labels range from 0 to 12. However, the output when I run my code looks like this:

```…/aten/src/ATen/native/cuda/Loss.cu:240: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [2,0,0] Assertion t >= 0 && t < n_classes failed.
…/aten/src/ATen/native/cuda/Loss.cu:240: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [3,0,0] Assertion t >= 0 && t < n_classes failed.
…/aten/src/ATen/native/cuda/Loss.cu:240: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [4,0,0] Assertion t >= 0 && t < n_classes failed.
…/aten/src/ATen/native/cuda/Loss.cu:240: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [5,0,0] Assertion t >= 0 && t < n_classes failed.
…/aten/src/ATen/native/cuda/Loss.cu:240: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [8,0,0] Assertion t >= 0 && t < n_classes failed.
…/aten/src/ATen/native/cuda/Loss.cu:240: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [10,0,0] Assertion t >= 0 && t < n_classes failed.
…/aten/src/ATen/native/cuda/Loss.cu:240: nll_loss_forward_reduce_cuda_kernel_2d: block: [0…/aten/src/ATen/native/cuda/Loss.cu,0,0:240], thread: [14: nll_loss_forward_reduce_cuda_kernel_2d,0: block: [0,0,0] Assertion t >= 0 && t < n_classes,0 failed.
``
Since I am using the DS integration, I do not know how to explicitly query the “t” which torch reports errors from, as I suppose there is some conversion under the hood here. Where does the n_classes come from, for example?

To test the offloading feature, I needed a model too big for the GPU: model_name = "bigscience/bloomz-7b1"

Some relevant environment information:
transformers 4.31.0.dev0 deepspeed 0.9.5 accelerate 0.20.3 huggingface-hub 0.16.4

and

[2023-07-12 23:32:55,694] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
      runtime if needed. Op compatibility means that your system
      meet the required dependencies to JIT install the op.
--------------------------------------------------

JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-dev package with apt
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
cpu_adagrad ............ [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
random_ltd ............. [NO] ....... [OKAY]
 [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.0
 [WARNING]  using untested triton version (2.0.0), only 1.0.0 is known to be compatible
sparse_attn ............ [NO] ....... [NO]
spatial_inference ...... [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/envs/deepspeed_env/lib/python3.8/site-packages/torch']
torch version .................... 2.0.1+cu117
deepspeed install path ........... ['/envs/deepspeed_env/lib/python3.8/site-packages/deepspeed']
deepspeed info ................... 0.9.5, unknown, unknown
torch cuda version ............... 11.7
torch hip version ................ None
nvcc version ..................... 12.2
deepspeed wheel compiled w. ...... torch 0.0, cuda 0.0

Hello, please share a minimal script to run along with the launch command