Hello! I am running a laptop with an 11th Gen Intel Core i7-11800H @ 2.30 GHz (8 cores, 16 logical processors) and an RTX 3070 with 8 GB VRAM, overclocked to 1700 MHz. I can run 4-bit quantized 7B-parameter models (WizardLM, Vicuna) with AutoGPTQ or GPTQ-for-LLaMa very fast (>15 tokens/s), but when I move to the 13B versions of the same models it crawls below 1 token per second. Am I reaching the limits of what a 3070 can handle, or am I misconfiguring something and should keep looking for solutions? I just want to know so I'm not troubleshooting for no reason. I am running the webui with TheBloke's 4-bit quantized models. Thank you in advance for any answers.
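For reference, here is the back-of-the-envelope arithmetic behind my suspicion (a rough sketch: the 4-bits-per-weight figure ignores GPTQ group-size metadata, activations, and the KV cache, so real usage will be somewhat higher):

```python
# Rough VRAM estimate for 4-bit quantized models vs. an 8 GB card.
# Assumption: pure 4-bit weights; overhead (KV cache, CUDA context,
# quantization metadata) is NOT counted and can add a GB or more.
import torch

def approx_weight_gb(n_params: float, bits: int = 4) -> float:
    """Approximate size of the quantized weights in GB."""
    return n_params * bits / 8 / 1e9

for n in (7e9, 13e9):
    print(f"{n/1e9:.0f}B @ 4-bit ≈ {approx_weight_gb(n):.1f} GB")
# 7B  @ 4-bit ≈ 3.5 GB  -> fits comfortably in 8 GB
# 13B @ 4-bit ≈ 6.5 GB  -> very tight once cache/context are added

# Check what the card actually has free right now.
free, total = torch.cuda.mem_get_info()
print(f"free: {free/1e9:.1f} GB / total: {total/1e9:.1f} GB")
```

If the 13B weights plus cache spill past 8 GB, the driver can fall back to shared system memory, which would match the sub-1-token/s behavior I'm seeing.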