No module named 'deepspeed.checkpoint.utils'

This is my first attempt at fine-tuning Falcon 7B. I ran the following imports:

import torch, einops
from datasets import load_dataset
from peft import LoraConfig
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
)
from peft.tuners.lora import LoraLayer

from trl import SFTTrainer

but I got ModuleNotFoundError: No module named 'deepspeed.checkpoint.utils'

I'm on AWS SageMaker, using ml.c5.2xlarge and ml.c5.4xlarge instances with the Torch 2 GPU-optimized image.
Can anybody please tell me what I am doing wrong?

Thanks

cc @smangrul

Hello, please install the latest version of DeepSpeed

Thank you, it is installed along with all these libraries:

!pip install -q -U bitsandbytes
!pip install -q -U git+https://github.com/huggingface/transformers.git 
!pip install -q -U git+https://github.com/huggingface/peft.git
!pip install -q -U git+https://github.com/huggingface/accelerate.git
!pip install -q -U datasets
!pip install -q -U trl
!pip install -q -U einops
!pip install torch
!pip install torchvision
!pip install deepspeed

I also tried p3 and g4 instances to make sure a GPU is available.

Try installing with -U (pip install -U deepspeed), and let us know what your DeepSpeed version is.
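To report the version, a small check like the one below may help. It is only a sketch: the helper name installed_version is made up for illustration, and it relies on the standard-library importlib.metadata, which reads the version of the package installed on disk without importing it.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed.

    `installed_version` is a hypothetical helper name, not part of any library.
    """
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Print the DeepSpeed version found in the current environment (None if absent).
print("deepspeed:", installed_version("deepspeed"))
```

If this prints an old version after running pip install -U deepspeed, the notebook may be using a different environment than the one pip installed into.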

Thank you, that did the trick!
It is version 0.9.5 now (it was 0.6 before).
Now I face another issue: cannot import name 'MODEL_FILE_PREFIX' from 'deepspeed.checkpoint.constants'

Restarting the kernel resolved it, thanks.
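A likely reason the restart was needed: upgrading a package replaces its files on disk, but a Python process that has already imported the old version keeps those modules in memory, so old and new code can get mixed. The sketch below checks for that situation. The function name stale_import is hypothetical, and it assumes the package exposes a __version__ attribute and that its import name matches its distribution name (both are true for deepspeed as far as I know).

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def stale_import(package: str) -> bool:
    """Return True if `package` is already imported in this process with a
    version that differs from the one currently installed on disk.

    `stale_import` is a hypothetical helper, not part of any library.
    """
    module = sys.modules.get(package)
    if module is None:
        # Not imported yet, so a fresh import will pick up the files on disk.
        return False
    loaded = getattr(module, "__version__", None)
    try:
        on_disk = version(package)
    except PackageNotFoundError:
        # No installed distribution by that name, nothing to compare against.
        return False
    return loaded is not None and loaded != on_disk


if stale_import("deepspeed"):
    print("deepspeed was upgraded after being imported; restart the kernel.")
```

In a notebook, running the pip install cells before any import cell, or restarting the kernel right after installing, avoids the problem.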
