What would be the minimum instance to deploy TheBloke/Phind-CodeLlama-34B-v2-GPTQ?

Hello all,

What would be the minimum instance to deploy TheBloke/Phind-CodeLlama-34B-v2-GPTQ?
Since it is a GPTQ model, I would expect it to fit on an ml.g5.xlarge with 24 GB of VRAM, but I'm getting OOM errors.
I can run inference on a local server with a single 4090 (also 24 GB).
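
Back-of-the-envelope, the quantized weights alone should be well under 24 GB, which is why the OOM surprises me (rough math, assuming ~0.5 byte per parameter for 4-bit GPTQ):

```python
# Rough VRAM estimate for the 4-bit GPTQ weights of a 34B model
params = 34e9
weights_gib = params * 0.5 / 1024**3  # ~0.5 byte/param at 4-bit
print(f"weights: ~{weights_gib:.1f} GiB")  # ~15.8 GiB, leaving headroom for KV cache
```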

Even using an ml.g4dn.12xlarge (4× T4, 64 GB of VRAM in total) appears to not be sufficient, which is very surprising…
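
For reference, this is roughly how I'm deploying it, via the SageMaker HuggingFace LLM (TGI) container. A minimal sketch, assuming the `HF_MODEL_QUANTIZE` env var and the container version shown; the token limits and timeout are placeholders I picked, not tuned values:

```python
import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

role = sagemaker.get_execution_role()

# TGI image for SageMaker; the version here is an assumption
image_uri = get_huggingface_llm_image_uri("huggingface", version="1.0.3")

model = HuggingFaceModel(
    role=role,
    image_uri=image_uri,
    env={
        "HF_MODEL_ID": "TheBloke/Phind-CodeLlama-34B-v2-GPTQ",
        "HF_MODEL_QUANTIZE": "gptq",  # load the GPTQ weights instead of fp16
        "SM_NUM_GPUS": "1",           # ml.g5.xlarge has a single A10G (24 GB)
        "MAX_INPUT_LENGTH": "4096",
        "MAX_TOTAL_TOKENS": "8192",
    },
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.xlarge",
    container_startup_health_check_timeout=600,  # give the 34B model time to load
)

print(predictor.predict({"inputs": "def fibonacci(n):"}))
```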