Parallelizing Hugging Face models

Hi,

I have 4 GPUs with 24 GB of memory each, and I would like to parallelize the GPT-Neo models (2.7B and 1.3B) across them and train them on my own text data. I would also like to parallelize and train 7-billion-parameter models like LLaMA. I am unsure how to do this with DeepSpeed, since there is a lot of information out there but not many straightforward implementations. Please let me know how this can be done.
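For reference, here is the kind of setup I have pieced together from the DeepSpeed docs: a minimal sketch assuming ZeRO stage 3 with CPU offload through the Hugging Face Trainer. The data file, output directory, and hyperparameters are placeholders, not a tested recipe. Is this the right direction?

```python
# Minimal sketch: fine-tune GPT-Neo with DeepSpeed ZeRO stage 3 via the
# Hugging Face Trainer. "my_data.txt" and the hyperparameters below are
# placeholders.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

MODEL_NAME = "EleutherAI/gpt-neo-2.7B"  # or "EleutherAI/gpt-neo-1.3B"

# ZeRO stage 3 shards parameters, gradients, and optimizer state across
# the 4 GPUs; CPU offload trades speed for memory headroom. The "auto"
# values are filled in from TrainingArguments by the Trainer integration.
ds_config = {
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {"device": "cpu"},
        "offload_param": {"device": "cpu"},
    },
    "bf16": {"enabled": "auto"},
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
tokenizer.pad_token = tokenizer.eos_token  # GPT-Neo has no pad token by default
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

dataset = load_dataset("text", data_files={"train": "my_data.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

train_set = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

args = TrainingArguments(
    output_dir="gpt-neo-finetuned",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    num_train_epochs=1,
    bf16=True,            # assumes Ampere-or-newer GPUs; use fp16 otherwise
    deepspeed=ds_config,  # Trainer accepts a dict or a path to a JSON file
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_set,
    # mlm=False gives standard causal-LM labels (inputs shifted by one)
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

I launch it with `deepspeed --num_gpus=4 train.py`. Would the same ZeRO-3 config be enough for a 7B model like LLaMA on 4x24 GB, or would I also need something like gradient checkpointing (`model.gradient_checkpointing_enable()`) to fit the activations?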

Thanks