How to get Llama-2-13b-chat-hf to ACTUALLY RUN

Does anyone have a complete, fully specified, working 'recipe' — instructions for how and where to set up a large Llama model — that will just work?

I am sure thousands of people have done this.

I have been trying for many, many days now to just get Llama-2-13b-chat-hf to run at all.
I have even hired a consultant, who has also spent a lot of time and so far failed.
The problem has been getting a machine with the right resources, environment, configuration, etc.
We have tried a lot of different approaches:

  1. Local machine. Result: not enough GPU memory, even on a Mac Studio.
  2. Google Colab Pro, even on an A100. Result: not enough GPU memory, plus various other problems.
  3. Google Cloud virtual machine. Result: after many, many attempts, we have never found a combination of boot disk, OS, environment, GPU, drivers, and installed packages that will actually run the model. We have spent literally days guessing which combination might work, and have certainly tried the basics, such as the 'Deep Learning' VM image.
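For context on the memory failures above, here is a back-of-the-envelope sketch of the VRAM needed just to hold the weights of a 13B-parameter model at different precisions (an assumption-laden estimate: it ignores activations, KV cache, and framework overhead, which add several more GB in practice):

```python
# Rough VRAM needed for the *weights alone* of a 13B-parameter model.
# Real usage is higher: activations, KV cache, and framework overhead
# are not counted here.

PARAMS = 13e9  # approximate parameter count of Llama-2-13b

def weight_gb(bytes_per_param: float) -> float:
    """GiB required to store the weights at a given precision."""
    return PARAMS * bytes_per_param / 1024**3

for name, bpp in [("fp16", 2.0), ("int8", 1.0), ("int4", 0.5)]:
    print(f"{name}: ~{weight_gb(bpp):.1f} GiB")
# fp16: ~24.2 GiB, int8: ~12.1 GiB, int4: ~6.1 GiB
```

This is why a 13B model in full fp16 overwhelms most single consumer GPUs, while quantized loading (8-bit or 4-bit) can bring it within reach of far more modest hardware.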

Thank you for your help.