Find an LLM to run on a single GPU with only 8 GB of VRAM

I have a single NVIDIA GPU with 8 GB of VRAM, running on an Ubuntu Server 18.04 LTS machine. I can send queries to FLAN-T5 and get responses, but when I tried PEFT fine-tuning with LoRA I got a "GPU out of memory" error. I also tried running camel-5b and llama2-7b-chat as chat agents, and both threw the same "GPU out of memory" error.

I'm trying to experiment with LLMs, learn the structure of the code, and practice prompt engineering. Ultimately I'd like to build a chat agent with llama2-70b-chat, even if I have to run it on Colab. Can anyone suggest an LLM with a structure similar to llama2-7b-chat that can run on my single GPU with 8 GB of VRAM?
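For context on why the 7B models run out of memory, here is a rough weights-only estimate (a hypothetical helper of my own; it ignores activations, the KV cache, and framework overhead): fp16 weights alone for a 7B model already exceed 8 GB, while 4-bit quantized weights would fit.

```python
# Back-of-envelope VRAM estimate for inference: weights only,
# ignoring activations, KV cache, and framework overhead.
# This is an assumption-laden sketch, not a measurement.

def model_vram_gib(n_params_billion: float, bytes_per_param: float) -> float:
    """Approximate weight memory in GiB for a model of the given size."""
    return n_params_billion * 1e9 * bytes_per_param / 2**30

# llama2-7b in fp16 (2 bytes/param): ~13 GiB of weights alone,
# well over an 8 GB card.
print(round(model_vram_gib(7, 2), 1))    # ~13.0

# The same model quantized to 4-bit (~0.5 bytes/param): ~3.3 GiB,
# which would leave headroom on 8 GB.
print(round(model_vram_gib(7, 0.5), 1))  # ~3.3
```

By the same arithmetic, models in the 1B–3B range (or a 4-bit quantized 7B) are the realistic ceiling for an 8 GB card, and fine-tuning needs extra memory on top of this for gradients and optimizer state.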