Unable to deploy Falcon 40b OASST1 model into SageMaker TGI container

Hello, I am looking for help troubleshooting a deployment failure of the Falcon 40b model fine-tuned on the OpenAssistant dataset (OpenAssistant/falcon-40b-sft-top1-560).

I am following the instructions in *Securely deploy LLMs inside VPCs with Hugging Face and Amazon SageMaker*, and I managed to convert the model's .bin files to safetensors, preserving all existing config files. I did this by loading the model locally (`model = AutoModelForCausalLM.from_pretrained(...)`) and then calling `model.save_pretrained(..., safe_serialization=True)`.
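The conversion step was essentially the following (the output directory name is mine; `torch_dtype` is just a memory saver and not required for the conversion itself):

```python
import torch
from transformers import AutoModelForCausalLM

def convert_to_safetensors(model_id: str, out_dir: str) -> None:
    # Load the original .bin checkpoint
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, trust_remote_code=True
    )
    # Re-serialize the weights as .safetensors shards
    model.save_pretrained(out_dir, safe_serialization=True)

# convert_to_safetensors("OpenAssistant/falcon-40b-sft-top1-560",
#                        "falcon-40b-sft-safetensors")
```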

I assume the folder containing all the files (without the .bin files) is correct, because I can load the model with `AutoModelForCausalLM.from_pretrained(..., use_safetensors=True)` and run inference.
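My local sanity check looked roughly like this (prompt and generation settings are arbitrary):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

def smoke_test(local_dir: str) -> str:
    # use_safetensors=True ensures the .safetensors shards are what gets loaded
    model = AutoModelForCausalLM.from_pretrained(
        local_dir, use_safetensors=True, trust_remote_code=True, device_map="auto"
    )
    tokenizer = AutoTokenizer.from_pretrained(local_dir)
    inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=16)
    return tokenizer.decode(output[0], skip_special_tokens=True)
```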

I packed everything into a model.tar.gz file, uploaded it to S3, and then requested the deployment; however, I am getting the error below. I wonder what I am doing wrong.

      File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 124, in serve_inner
        model = get_model(model_id, revision, sharded, quantize, trust_remote_code)
      File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/__init__.py", line 209, in get_model
        return FlashRWSharded(
      File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/flash_rw.py", line 161, in __init__
      File "/usr/src/transformers/src/transformers/modeling_utils.py", line 1903, in to
        return super().to(*args, **kwargs)
      File "/opt/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1145, in to
        return self._apply(convert)
      File "/opt/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 797, in _apply
      File "/opt/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 820, in _apply
        param_applied = fn(param)
      File "/opt/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1143, in convert
        return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
    NotImplementedError: Cannot copy out of meta tensor; no data!

Deployment code snippet:

    # role, s3_model_uri, instance_type and number_of_gpu are defined earlier
    # in the notebook
    config = {
        'HF_MODEL_ID': "/opt/ml/model",
        'SM_NUM_GPUS': json.dumps(number_of_gpu),
        'MAX_INPUT_LENGTH': json.dumps(1024),
        'MAX_TOTAL_TOKENS': json.dumps(2048),
    }

    llm_image = get_huggingface_llm_image_uri("huggingface")

    llm_model = HuggingFaceModel(
        role=role,
        image_uri=llm_image,
        model_data=s3_model_uri,
        env=config,
    )

    llm = llm_model.deploy(
        initial_instance_count=1,
        instance_type=instance_type,
    )