Capabilities of an RTX 3070

Hello! I'm running a laptop with an 11th Gen Intel Core i7-11800H (2.30 GHz, 8 cores, 16 logical processors) and an RTX 3070 with 8 GB of VRAM, overclocked to 1700 MHz. I can run 4-bit quantized 7B parameter models (WizardLM, Vicuna) with AutoGPTQ or GPTQ-for-LLaMa very fast (>15 tokens/s), but when I move to the 13B versions of the same models it crawls to below 1 token per second. Am I reaching the limits of what a 3070 can handle, or am I misconfiguring something and should be looking for solutions? I just want to know so I'm not troubleshooting for no reason. I'm running the web UI with TheBloke's 4-bit quantized models. Thank you in advance for any answers.
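
For what it's worth, here's the rough back-of-envelope check I did on whether a 13B model even fits in 8 GB. The ~0.5 bytes per weight and the fixed overhead figure are my own assumptions, not measurements, so treat it as a minimal sketch rather than a real profile:

```python
# Rough VRAM estimate for 4-bit quantized models.
# Assumptions (mine): ~0.5 bytes per weight for 4-bit GPTQ, plus a flat
# ~1.5 GB allowance for KV cache, activations, and the CUDA context.

def estimate_vram_gb(n_params_billion, bits=4, overhead_gb=1.5):
    """Very rough estimate: quantized weight size plus fixed overhead."""
    weight_gb = n_params_billion * 1e9 * (bits / 8) / 1024**3
    return weight_gb + overhead_gb

for size in (7, 13):
    print(f"{size}B 4-bit: ~{estimate_vram_gb(size):.1f} GB")

# 7B  4-bit: ~4.8 GB  -> comfortably inside 8 GB
# 13B 4-bit: ~7.6 GB  -> right at the 8 GB limit once context grows
```

If that math is roughly right, the 13B model would be spilling out of VRAM (into shared system memory or CPU offload), which would explain the sub-1 token/s speed, but I'd like confirmation before I stop troubleshooting.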