Deploying Fine-Tuned Falcon 40B with QLoRA on SageMaker: Inference Error

I didn’t get 7B working with the TGI 0.8.2 container image (763104351884.dkr.ecr.eu-west-1.amazonaws.com/huggingface-pytorch-tgi-inference:2.0.0-tgi0.8.2-gpu-py39-cu118-ubuntu20.04).

Building the latest TGI container image for SageMaker (GitHub - huggingface/text-generation-inference at v0.9.3), combined with the other instructions I described above, is what made 7B work for me.
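In case it helps, here is a rough sketch of how building and pushing that image could look (untested; the `<account-id>` and `<region>` placeholders and the `tgi-sagemaker` repository name are my own, and the `--target sagemaker` build stage name should be checked against the v0.9.3 Dockerfile):

```shell
# Sketch (untested): build TGI v0.9.3 with its SageMaker entrypoint
# and push it to your own ECR repository.
# <account-id> and <region> are placeholders -- substitute your own values.
git clone --branch v0.9.3 https://github.com/huggingface/text-generation-inference
cd text-generation-inference

# Build the SageMaker-specific image stage (assumed stage name: "sagemaker")
docker build --target sagemaker -t tgi-sagemaker:0.9.3 .

# Log in to ECR, then tag and push the image
aws ecr get-login-password --region <region> \
  | docker login --username AWS --password-stdin <account-id>.dkr.ecr.<region>.amazonaws.com
docker tag tgi-sagemaker:0.9.3 <account-id>.dkr.ecr.<region>.amazonaws.com/tgi-sagemaker:0.9.3
docker push <account-id>.dkr.ecr.<region>.amazonaws.com/tgi-sagemaker:0.9.3
```

The resulting ECR image URI can then be passed as the `image_uri` when creating the SageMaker model, instead of the prebuilt 0.8.2 image above.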

I don’t have the time right now to train a 40B model with my instructions.

If you have time, maybe you could try my instructions with 7B or 40B to validate them?
