GPT-Neo inference with DeepSpeed: IndexError: Dimension out of range

Has anyone been able to get DeepSpeed inference working with a fine-tuned GPT-Neo model?

As described in this GitHub issue:
https://github.com/Xirider/finetune-gpt2xl/issues/15

I am finding that inference with DeepSpeed works well on the original, non-fine-tuned model, “EleutherAI/gpt-neo-2.7B”.
But after I fine-tune the model, inference with DeepSpeed fails with this error message:

  File "/opt/conda/lib/python3.8/site-packages/deepspeed/ops/transformer/inference/transformer_inference.py", line 374, in forward
    output = DeepSpeedSelfAttentionFunction.apply(
  File "/opt/conda/lib/python3.8/site-packages/deepspeed/ops/transformer/inference/transformer_inference.py", line 312, in forward
    output, key_layer, value_layer, context_layer = selfAttention_fp()
  File "/opt/conda/lib/python3.8/site-packages/deepspeed/ops/transformer/inference/transformer_inference.py", line 270, in selfAttention_fp
    qkv_out = qkv_func(input,
IndexError: Dimension out of range (expected to be in range of [-2, 1], but got 2)
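
For anyone reading the last line of the traceback: the range PyTorch prints tells you how many dimensions the offending tensor has. A valid range of [-2, 1] means the tensor is 2-D, and the code asked for dimension 2, which only exists on a tensor with at least three dimensions. The traceback does not say which tensor inside `qkv_func` this is, so the snippet below is only a small sketch for decoding the message. `decode_dim_error` is a hypothetical helper name of mine, not part of PyTorch or DeepSpeed.

```python
import re


def decode_dim_error(message):
    """Parse a PyTorch 'Dimension out of range' message.

    Returns (tensor_rank, requested_dim). Hypothetical helper for
    illustration only; it is not part of PyTorch or DeepSpeed.
    """
    match = re.search(r"range of \[(-?\d+), (-?\d+)\], but got (-?\d+)", message)
    if match is None:
        raise ValueError("not a 'Dimension out of range' message")
    low, high, got = (int(group) for group in match.groups())
    # For an n-dimensional tensor (n >= 1) the valid range is [-n, n - 1],
    # so the rank is the upper bound plus one.
    # Caveat: a 0-dim tensor reports [-1, 0] and would be misread as rank 1.
    rank = high + 1
    return rank, got


msg = "Dimension out of range (expected to be in range of [-2, 1], but got 2)"
rank, got = decode_dim_error(msg)
print(f"tensor has {rank} dimensions, code asked for dim {got}")
```

If this reading is right, it may be worth comparing the shapes (and dtypes) of the tensors in the fine-tuned checkpoint against the original “EleutherAI/gpt-neo-2.7B” checkpoint, although I am only guessing that the difference lies there.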