@malterei My issue was with the Falcon model:
model_id = "tiiuae/falcon-40b" # sharded weights
So just to clarify the current DLC does not support this model, just the 7b model?
Thank you.
I didn't get 7B working with TGI container image 0.8.2 (763104351884.dkr.ecr.eu-west-1.amazonaws.com/huggingface-pytorch-tgi-inference:2.0.0-tgi0.8.2-gpu-py39-cu118-ubuntu20.04).
Building the latest TGI container image for SageMaker (GitHub - huggingface/text-generation-inference at v0.9.3) and following the other instructions I described above is what made 7B work for me.
I don't have the time right now to try 40B with my instructions.
If you have time, maybe you can try my instructions with 7B or 40B to validate them?
Am I correct in saying that the current DLC does not support tiiuae/falcon-40b-instruct deployment, since the model weights are not in safetensors format?
I have the following error when trying to deploy the pre-trained model on SageMaker:
safetensors_rust.SafetensorError: Error while serializing: IoError(Os { code: 30, kind: ReadOnlyFilesystem, message: "Read-only file system" })
I see that the workaround suggested above is to convert the PyTorch model weights to safetensors format during training, but what is the current workaround for deploying as is?
That's not correct, the DLC supports deploying Falcon, see: Deploy Falcon 7B & 40B on Amazon SageMaker
That said, having your weights in safetensors format decreases the startup time.
The issue I am facing is that I am trying to deploy the model on SageMaker within a VPC (with no access to the public internet). When deploying, I am unable to download the model from an S3 bucket to /opt/ml/model because the filesystem is read-only, so I am unable to convert the PyTorch model weights to safetensors format. Can I deploy the model as is (i.e. without converting the weights during training, as suggested)?
Note: When I say "as is", the model.tar.gz file looks like this…
@philschmid Any solution to this? Facing the same issue when deploying a fine-tuned Llama-2-70B on g5.48xlarge, GPTQ quantized.
Here's the repo:
shekharchatterjee/temp-model-174 · Hugging Face
Error: DownloadError
File "/opt/conda/bin/text-generation-server", line 8, in <module>
sys.exit(app())
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/cli.py", line 151, in download_weights
utils.convert_files(local_pt_files, local_st_files)
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/convert.py", line 84, in convert_files
convert_file(pt_file, sf_file)
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/convert.py", line 52, in convert_file
pt_state = torch.load(pt_file, map_location="cpu")
File "/opt/conda/lib/python3.9/site-packages/torch/serialization.py", line 815, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "/opt/conda/lib/python3.9/site-packages/torch/serialization.py", line 1033, in _legacy_load
magic_number = pickle_module.load(f, **pickle_load_args)
I am stuck in the same situation. Did you get it solved?
Yes, I have since converted the PyTorch version of the model to safetensors format using this:
@philschmid @Jorgeutd
Hi guys, any solution for this issue?
I am facing the same issue when trying to deploy Mistral 7B. Training completes successfully, but the deployment gives this error: raise RuntimeError(f"weight {tensor_name} does not exist")
Here is what I am using:
llm_image_uri_ver = "1.3.1"
llm_image = get_huggingface_llm_image_uri(
    "huggingface",  # huggingface or lmi
    version=llm_image_uri_ver,
    session=Sagemaker_Session,
    region=region_name
)
config = {
    "HF_MODEL_ID": "/opt/ml/model",  # model_id from Models - Hugging Face
    "SM_NUM_GPUS": json.dumps(number_of_gpu),  # Number of GPUs used per replica
    "MAX_INPUT_LENGTH": json.dumps(MAX_INPUT_LENGTH),  # Max length of input text
    "MAX_TOTAL_TOKENS": json.dumps(MAX_TOTAL_TOKENS),  # Max length of the generation (including input text)
    "MAX_BATCH_TOTAL_TOKENS": json.dumps(MAX_BATCH_TOTAL_TOKENS),  # Limits the number of tokens that can be processed in parallel during generation
    "MAX_BATCH_PREFILL_TOKENS": json.dumps(MAX_BATCH_PREFILL_TOKENS),
    "HUGGING_FACE_HUB_TOKEN": HUGGING_FACE_HUB_TOKEN,
    "HF_TASK": "text-classification",
}
llm_model = HuggingFaceModel(
    role=my_role,
    image_uri=llm_image,
    env=config,
    sagemaker_session=Sagemaker_Session,
    model_data=s3_train_model_path,
)
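For completeness, the model above then gets deployed with a call along these lines. The instance type and timeout are assumptions on my part (not taken from this thread), so adjust them for your model size; a generous health-check timeout matters here because TGI may still be converting weights at startup:

```python
# Hypothetical deployment call for the HuggingFaceModel defined above.
# Instance type and timeout are assumptions, tune them for your model.
llm = llm_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",  # assumed, pick one matching SM_NUM_GPUS
    container_startup_health_check_timeout=600,  # allow time to load/convert weights
)
```

If startup exceeds the timeout while TGI is still converting PyTorch weights to safetensors, the endpoint is marked unhealthy, which is another reason to convert the weights ahead of time.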