Hello, I am looking for help troubleshooting a deployment failure of the Falcon 40B model fine-tuned on the OpenAssistant dataset (OpenAssistant/falcon-40b-sft-top1-560).
I am following the instructions in "Securely deploy LLMs inside VPCs with Hugging Face and Amazon SageMaker", and I managed to convert the model's .bin files into safetensors while preserving all existing config files. I did this by loading the model locally (model = AutoModelForCausalLM.from_pretrained(...)) and then calling model.save_pretrained(..., safe_serialization=True).
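Roughly, the conversion looked like this (a minimal sketch; the local paths are placeholders, and trust_remote_code is there for the custom Falcon architecture):

```python
from transformers import AutoModelForCausalLM

# Load the fine-tuned checkpoint from its local .bin files (path is a placeholder)
model = AutoModelForCausalLM.from_pretrained(
    "./falcon-40b-sft-top1-560", trust_remote_code=True
)

# Re-save the weights as safetensors shards; this also writes config.json
model.save_pretrained("./falcon-40b-safetensors", safe_serialization=True)
```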
I assume the resulting folder with all the files (without the .bin files) is correct, because I can load the model through AutoModelForCausalLM.from_pretrained(..., use_safetensors=True) and run inference.
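The sanity check was along these lines (again a sketch with a placeholder path):

```python
from transformers import AutoModelForCausalLM

# Load strictly from the safetensors shards to confirm the conversion is usable
model = AutoModelForCausalLM.from_pretrained(
    "./falcon-40b-safetensors",
    use_safetensors=True,
    trust_remote_code=True,
)
```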
I packed everything into a model.tar.gz file, uploaded it to S3, and then requested the deployment; however, I am getting the error below and I wonder what I am doing wrong.
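For reference, I packaged and uploaded it roughly as follows (a sketch; the bucket and prefix are placeholders). As far as I understand, the container expects the model files at the root of the archive, since it unpacks model.tar.gz to /opt/ml/model:

```python
import os
import tarfile

from sagemaker.s3 import S3Uploader

model_dir = "./falcon-40b-safetensors"  # placeholder local path

# Add each file at the root of the archive (no top-level directory)
with tarfile.open("model.tar.gz", "w:gz") as tar:
    for name in os.listdir(model_dir):
        tar.add(os.path.join(model_dir, name), arcname=name)

# Upload to S3; bucket and prefix are placeholders
s3_model_uri = S3Uploader.upload("model.tar.gz", "s3://<bucket>/falcon-40b")
```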
> File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 124, in serve_inner
model = get_model(model_id, revision, sharded, quantize, trust_remote_code)
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/__init__.py", line 209, in get_model
return FlashRWSharded(
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/flash_rw.py", line 161, in __init__
model=model.to(device),
File "/usr/src/transformers/src/transformers/modeling_utils.py", line 1903, in to
return super().to(*args, **kwargs)
File "/opt/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1145, in to
return self._apply(convert)
File "/opt/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 797, in _apply
module._apply(fn)
File "/opt/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 820, in _apply
param_applied = fn(param)
File "/opt/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1143, in convert
return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
>
NotImplementedError: Cannot copy out of meta tensor; no data!
Deployment code snippet:
```python
import json

from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

# role, s3_model_uri, number_of_gpu, instance_type and health_check_timeout
# are defined earlier in the notebook

# TGI container configuration; HF_MODEL_ID points at the unpacked model.tar.gz
config = {
    'HF_MODEL_ID': "/opt/ml/model",
    'HF_TASK': 'text-generation',
    'SM_NUM_GPUS': json.dumps(number_of_gpu),
    'MAX_INPUT_LENGTH': json.dumps(1024),
    'MAX_TOTAL_TOKENS': json.dumps(2048),
}

# Hugging Face LLM (TGI) inference container
llm_image = get_huggingface_llm_image_uri(
    "huggingface",
    version="0.8.2",
)

llm_model = HuggingFaceModel(
    role=role,
    image_uri=llm_image,
    model_data=s3_model_uri,
    env=config,
)

llm = llm_model.deploy(
    initial_instance_count=1,
    instance_type=instance_type,
    container_startup_health_check_timeout=health_check_timeout,
)
```
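For completeness, once the endpoint comes up this is how I plan to invoke it (a sketch; prompt and parameters are arbitrary):

```python
# Standard TGI payload sent through the SageMaker predictor returned by deploy()
response = llm.predict({
    "inputs": "What is Amazon SageMaker?",
    "parameters": {"max_new_tokens": 256},
})
print(response)
```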