Hello,
You might consider loading it in a quantized version, or using a smaller model. A 34B-parameter model does not fit comfortably on a single H100: in half precision (fp16/bf16) the weights alone take roughly 68 GB, which leaves very little headroom on an 80 GB card once the KV cache and activations are added.
Using `load_in_8bit=True` (note the exact spelling) with `AutoModelForCausalLM.from_pretrained` might be a good start, even though I think it could still fail at generation once the KV cache grows.
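Here is a minimal sketch of what that could look like, using `BitsAndBytesConfig` (the current way to request 8-bit loading in `transformers`); the checkpoint name is just an example, swap in whichever 34B model you are actually using:

```python
# Minimal sketch: 8-bit loading of a large causal LM.
# Assumes bitsandbytes and accelerate are installed
# (pip install bitsandbytes accelerate).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "codellama/CodeLlama-34b-hf"  # example checkpoint, replace with yours

# 8-bit quantization cuts the weights to ~1 byte per parameter
# (~34 GB for a 34B model), which fits on a single 80 GB H100.
quant_config = BitsAndBytesConfig(load_in_8bit=True)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # let accelerate place the weights on the GPU
)

# Quick smoke test: generate a few tokens to check memory headroom.
inputs = tokenizer("def fibonacci(n):", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```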
I'd advise starting with a 7B model, which still does a fantastic job!
Hope this helps!