Hi team,
I was able to successfully fine-tune the Falcon model following the instructions in this notebook:
Then I tried to deploy the trained model following what was recommended in the "Next steps" section, using the new Hugging Face LLM Inference Container:
Check out "Deploy Falcon 7B & 40B on Amazon SageMaker" and "Securely deploy LLMs inside VPCs with Hugging Face and Amazon SageMaker" for more details.
This was my deployment code:
import json
from sagemaker.huggingface import HuggingFaceModel

# sagemaker config
instance_type = "ml.g5.12xlarge"
number_of_gpu = 4
health_check_timeout = 300

# Define Model and Endpoint configuration parameter
config = {
    'HF_MODEL_ID': "/opt/ml/model",           # path to where sagemaker stores the model
    'SM_NUM_GPUS': json.dumps(number_of_gpu), # Number of GPUs used per replica
    'MAX_INPUT_LENGTH': json.dumps(1024),     # Max length of input text
    'MAX_TOTAL_TOKENS': json.dumps(2048),     # Max length of the generation (including input text)
    'HF_MODEL_QUANTIZE': "bitsandbytes",      # Comment in to quantize
}

# create HuggingFaceModel with the image uri
llm_model = HuggingFaceModel(
    role=role,
    image_uri=llm_image,
    model_data=s3_model_uri,
    env=config
)

# Deploy model to an endpoint (uses the variables defined above)
llm = llm_model.deploy(
    initial_instance_count=1,
    instance_type=instance_type,
    container_startup_health_check_timeout=health_check_timeout,
)
I got the following error in the logs:
2023-07-14T14:28:45.972834Z  INFO download: text_generation_launcher: Convert /opt/ml/model/pytorch_model-00001-of-00009.bin to /opt/ml/model/model-00001-of-00009.safetensors.
Error: DownloadError
2023-07-14T14:28:55.491641Z ERROR text_generation_launcher: Download encountered an error: Traceback (most recent call last):
  File "/opt/conda/bin/text-generation-server", line 8, in <module>
    sys.exit(app())
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/cli.py", line 151, in download_weights
    utils.convert_files(local_pt_files, local_st_files)
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/convert.py", line 84, in convert_files
    convert_file(pt_file, sf_file)
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/convert.py", line 62, in convert_file
    save_file(pt_state, str(sf_file), metadata={"format": "pt"})
  File "/opt/conda/lib/python3.9/site-packages/safetensors/torch.py", line 232, in save_file
    serialize_file(_flatten(tensors), filename, metadata=metadata)
safetensors_rust.SafetensorError: Error while serializing: IoError(Os { code: 30, kind: ReadOnlyFilesystem, message: "Read-only file system" })
2023-07-14T14:28:57.416297Z  INFO text_generation_launcher: Args { model_id: "/opt/ml/model", revision: None, sharded: None, num_shard: Some(4), quantize: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_input_length: 1024, max_total_tokens: 2048, max_batch_size: None, waiting_served_ratio: 1.2, max_batch_total_tokens: 32000, max_waiting_tokens: 20, port: 8080, shard_uds_path: "/tmp/text-generation-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: Some("/tmp"), weights_cache_override: None, disable_custom_kernels: false, json_output: false, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, env: false }
2023-07-14T14:28:57.416332Z  INFO text_generation_launcher: Sharding model on 4 processes
2023-07-14T14:28:57.416401Z  INFO text_generation_launcher: Starting download process.
2023-07-14T14:29:00.968073Z  WARN download: text_generation_launcher: No safetensors weights found for model /opt/ml/model at revision None. Converting PyTorch weights to safetensors.
2023-07-14T14:29:00.968114Z  INFO download: text_generation_launcher: Convert /opt/ml/model/pytorch_model-00001-of-00009.bin to /opt/ml/model/model-00001-of-00009.safetensors.
[...the same Convert / DownloadError / restart cycle repeats with identical tracebacks roughly every 15 seconds, through 14:35:00...]
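If I read the error right, OS error code 30 is EROFS: the launcher's weight-conversion step is trying to write the `model-*.safetensors` files next to the `.bin` shards under `/opt/ml/model`, which SageMaker mounts read-only inside the container. A quick check of the error code (plain Python, no SageMaker needed):

```python
import errno
import os

# OS error code 30 from the log is EROFS ("Read-only file system"):
# the conversion step writes model-*.safetensors into /opt/ml/model,
# which SageMaker mounts read-only inside the inference container.
print(errno.EROFS, os.strerror(errno.EROFS))  # -> 30 Read-only file system
```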
Any advice on how to fix this? Thank you.