Deploying Fine-Tuned Falcon 40B with QLoRA on SageMaker: Inference Error

@malterei My issue was with the Falcon model:

model_id = "tiiuae/falcon-40b"  # sharded weights

So just to clarify: the current DLC does not support this model, only the 7B one?

Thank you.

I didn’t get 7b working with TGI container image 0.8.2 (763104351884.dkr.ecr.eu-west-1.amazonaws.com/huggingface-pytorch-tgi-inference:2.0.0-tgi0.8.2-gpu-py39-cu118-ubuntu20.04).

Building the latest TGI container image for SageMaker (GitHub - huggingface/text-generation-inference at v0.9.3) and following the other instructions I described above is what made 7B work for me.
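
In case it helps, the deploy step with the custom-built image looks roughly like this. This is only a minimal sketch: the ECR account, repository name, and tag are placeholders, and role / s3_model_path are assumed to be defined elsewhere.

from sagemaker.huggingface import HuggingFaceModel

# Placeholder URI for the TGI v0.9.3 image you built and pushed to your own ECR.
custom_image = "123456789012.dkr.ecr.eu-west-1.amazonaws.com/text-generation-inference:0.9.3"

llm_model = HuggingFaceModel(
    role=role,                    # your SageMaker execution role
    image_uri=custom_image,       # custom-built TGI image instead of the stock DLC
    model_data=s3_model_path,     # model.tar.gz with the fine-tuned weights
    env={
        "HF_MODEL_ID": "/opt/ml/model",  # load weights from the unpacked model_data
        "SM_NUM_GPUS": "1",
    },
)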

I don’t have the time right now to train a 40b with my instructions.

If you have time, maybe you can try my instructions with 7B or 40B to validate them?

Am I correct in saying that the current DLC does not support tiiuae/falcon-40b-instruct deployment, as the model weights are not in safetensors format?
I get the following error when trying to deploy the pre-trained model on SageMaker:

safetensors_rust.SafetensorError: Error while serializing: IoError(Os { code: 30, kind: ReadOnlyFilesystem, message: "Read-only file system" })

I see that the workaround suggested above is to convert the PyTorch model weights to safetensors format during training, but what is the current workaround for deploying the model as is?

That’s not correct; the DLC does support deploying Falcon, see: Deploy Falcon 7B & 40B on Amazon SageMaker.
But to make things easier, having your weights in safetensors format decreases the startup time.
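
For example, a minimal sketch of doing that at the end of a training script (recent Transformers versions support safe_serialization in save_pretrained; the path is illustrative):

from transformers import AutoModelForCausalLM

# Load the merged fine-tuned model from the training output directory.
model = AutoModelForCausalLM.from_pretrained("/opt/ml/model")
# safe_serialization=True writes model.safetensors instead of pytorch_model.bin,
# so TGI can skip the .bin -> .safetensors conversion at endpoint startup.
model.save_pretrained("/opt/ml/model", safe_serialization=True)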

The issue I am facing is that I am trying to deploy the model on SageMaker within a VPC (with no access to the public internet). When deploying, I am unable to download the model from an S3 bucket to /opt/ml/model, as the filesystem is read-only; therefore I am unable to convert the PyTorch model weights to safetensors format. Can I deploy the model as is (i.e. without converting the weights during training as suggested)?

Note: when I say ‘as is’, the model.tar.gz file looks like this…

@philschmid Any solution to this? I am facing the same issue when deploying a fine-tuned L2-70b (Llama 2 70B), GPTQ-quantized, on g5.48xlarge. The traceback below shows TGI failing inside its convert step, while calling torch.load on the PyTorch checkpoints to turn them into safetensors.

Here’s the repo:
shekharchatterjee/temp-model-174 · Hugging Face

Error: DownloadError
  File "/opt/conda/bin/text-generation-server", line 8, in <module>
    sys.exit(app())
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/cli.py", line 151, in download_weights
    utils.convert_files(local_pt_files, local_st_files)
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/convert.py", line 84, in convert_files
    convert_file(pt_file, sf_file)
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/convert.py", line 52, in convert_file
    pt_state = torch.load(pt_file, map_location="cpu")
  File "/opt/conda/lib/python3.9/site-packages/torch/serialization.py", line 815, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/opt/conda/lib/python3.9/site-packages/torch/serialization.py", line 1033, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
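
For reference, a GPTQ deployment would presumably need the quantization flag set explicitly so TGI loads the quantized weights instead of trying to convert full-precision ones. This is only a sketch; the HF_MODEL_QUANTIZE key follows the pattern in the Falcon deployment blog linked above, so treat it as an assumption:

import json

config = {
    "HF_MODEL_ID": "/opt/ml/model",  # or the Hub repo id above
    "SM_NUM_GPUS": json.dumps(8),    # g5.48xlarge has 8 GPUs
    "HF_MODEL_QUANTIZE": "gptq",     # assumed key; tells TGI to load GPTQ weights
}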

I am stuck in the same situation. Did you get it solved?

Yes, I have since converted the PyTorch version of the model to safetensors format using this:

GitHub - Silver267/pytorch-to-safetensor-converter: A simple converter which converts pytorch bin files to safetensor, intended to be used for LLM conversion.
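
For anyone who prefers to inline it, the core of that conversion is small (a sketch, assuming the checkpoint is a plain state dict):

import torch
from safetensors.torch import save_file

# Load the pickle-based checkpoint on CPU.
state_dict = torch.load("pytorch_model.bin", map_location="cpu")
# safetensors rejects tensors that share memory, so clone each tensor;
# contiguous() guards against non-contiguous views.
state_dict = {k: v.clone().contiguous() for k, v in state_dict.items()}
save_file(state_dict, "model.safetensors", metadata={"format": "pt"})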

@philschmid @Jorgeutd
Hi guys, any solution for this issue?
I am facing the same issue when trying to deploy Mistral 7B. Training completes successfully, but the deployment gives this error: raise RuntimeError(f"weight {tensor_name} does not exist")

Here is what I am using:
llm_image_uri_ver = "1.3.1"
llm_image = get_huggingface_llm_image_uri(
    "huggingface",  # huggingface or lmi
    version=llm_image_uri_ver,
    session=Sagemaker_Session,
    region=region_name,
)
config = {
    "HF_MODEL_ID": "/opt/ml/model",  # model_id from Models - Hugging Face
    "SM_NUM_GPUS": json.dumps(number_of_gpu),  # Number of GPUs used per replica
    "MAX_INPUT_LENGTH": json.dumps(MAX_INPUT_LENGTH),  # Max length of input text
    "MAX_TOTAL_TOKENS": json.dumps(MAX_TOTAL_TOKENS),  # Max length of the generation (including input text)
    "MAX_BATCH_TOTAL_TOKENS": json.dumps(MAX_BATCH_TOTAL_TOKENS),  # Limits the number of tokens that can be processed in parallel during generation
    "MAX_BATCH_PREFILL_TOKENS": json.dumps(MAX_BATCH_PREFILL_TOKENS),
    "HUGGING_FACE_HUB_TOKEN": HUGGING_FACE_HUB_TOKEN,
    "HF_TASK": "text-classification",
}
llm_model = HuggingFaceModel(
    role=my_role,
    image_uri=llm_image,
    env=config,
    sagemaker_session=Sagemaker_Session,
    model_data=s3_train_model_path,
)
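
For completeness, the deploy/invoke step that follows this model definition looks like this (instance type, timeout, and prompt below are placeholders):

# Deploy the model to a real-time endpoint.
llm = llm_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
    container_startup_health_check_timeout=600,  # give TGI time to load the weights
)

# Invoke with the TGI request schema.
response = llm.predict({
    "inputs": "Hello, world!",
    "parameters": {"max_new_tokens": 64},
})
print(response)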