What would be the minimum instance to deploy TheBloke/Phind-CodeLlama-34B-v2-GPTQ?

Hello all,

What would be the minimum instance to deploy TheBloke/Phind-CodeLlama-34B-v2-GPTQ?
Since it is a GPTQ model, I would expect it to fit on an ml.g5.xlarge with 24 GB of VRAM, but I'm getting OOM errors.
I can run inference on a local server with a single 4090 (also 24 GB).
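
Back-of-the-envelope, the quantized weights alone should be well under 24 GB, which is why the OOM surprises me (rough math, assuming ~0.5 byte per parameter for 4-bit GPTQ):

```python
# Rough VRAM estimate for the 4-bit GPTQ weights of a 34B model
params = 34e9
weights_gib = params * 0.5 / 1024**3  # ~0.5 byte/param at 4-bit
print(f"weights: ~{weights_gib:.1f} GiB")  # ~15.8 GiB, leaving headroom for KV cache
```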

Even using an ml.g4dn.12xlarge (4× T4, 64 GB of VRAM in total) appears to not be sufficient, which is very surprising…
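
For reference, this is roughly how I'm deploying it, via the SageMaker HuggingFace LLM (TGI) container. A minimal sketch, assuming the `HF_MODEL_QUANTIZE` env var and the container version shown; the token limits and timeout are placeholders I picked, not tuned values:

```python
import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

role = sagemaker.get_execution_role()

# TGI image for SageMaker; the version here is an assumption
image_uri = get_huggingface_llm_image_uri("huggingface", version="1.0.3")

model = HuggingFaceModel(
    role=role,
    image_uri=image_uri,
    env={
        "HF_MODEL_ID": "TheBloke/Phind-CodeLlama-34B-v2-GPTQ",
        "HF_MODEL_QUANTIZE": "gptq",  # load the GPTQ weights instead of fp16
        "SM_NUM_GPUS": "1",           # ml.g5.xlarge has a single A10G (24 GB)
        "MAX_INPUT_LENGTH": "4096",
        "MAX_TOTAL_TOKENS": "8192",
    },
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.xlarge",
    container_startup_health_check_timeout=600,  # give the 34B model time to load
)

print(predictor.predict({"inputs": "def fibonacci(n):"}))
```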