When I use Llama 3.2 (the 3B version) and compare it to ChatGPT, it just doesn't measure up. Not only does it make a lot of grammatical errors, it also doesn't follow simple instructions such as "summarize this".
Llama 3.2 (3B) is in love with self-care, so much so that it recommends self-care when asked how to draw a circle. ChatGPT does not.
ChatGPT is hilarious when using sarcasm. I love asking it to "comment on this news article in the most sarcastic way".
Llama 3.2 (3B) … well, at least it likes self-care.
Llama 3.2 (3B) stands for local and private; ChatGPT stands for "this will be used against you".
But Llama 3.2 (3B) seems incredibly bad compared to ChatGPT.
I would love to have an AI comment on my most private thoughts, but Llama 3.2 (3B) would rather promote self-care and talking to others, or even talking to a lawyer about your legal options if a friend stops talking to you (it actually wrote that).
My computer has 12 GB of VRAM.
What could I do to get an AI with good output running on those 12 GB, or partly on the 12 GB of VRAM and the rest in 64 GB of RAM?
So you expect a 3B model to perform like ChatGPT, which is based on a hugely larger model (in the region of 1700B parameters)?
Thank you for bumping the thread.
So the question is: what could I do to get an AI with good output running on those 12 GB, or partly on the 12 GB of VRAM and the rest in 64 GB of RAM?
With 12 GB of VRAM and 64 GB of RAM you can easily run more capable models, especially if you use quantized models, which take much less space than the original ones because they encode parameters with 4 to 8 bits per parameter rather than 16. To make your life easier, I suggest you install Ollama (www.ollama.com) and use it to download various models to compare. You can get them from https://ollama.com/library : I recommend the standard quantization, which these days is Q4_K_M and takes around 5 bits per weight.
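As a quick back-of-the-envelope check (my own rough estimate): at around 5 bits per weight, a 14B model takes roughly 14 × 10^9 × 5 / 8 ≈ 8.75 GB, which leaves some of your 12 GB of VRAM free for the context (KV cache).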
You may start with 7B to 22B models; the smaller ones should fit entirely in your 12 GB of VRAM and therefore run fast. For example, after Ollama is installed, you may load and run Qwen2.5 14B with the command:
ollama run qwen2.5:14b
(this command line is copied to your clipboard when you open https://ollama.com/library/qwen2.5:14b in your browser and click the button to the right of the size selector).
Once loading is complete you'll see a >>> prompt and you can start a dialogue.
With all that VRAM and RAM that you’ve got, you might be able to run the excellent Llama3.3-70B:
https://ollama.com/library/llama3.3:70b
Load and run with: ollama run llama3.3
(after the initial download, subsequent executions of the "ollama run <model_name>" command will just run the model). If you want to test its capabilities before downloading it, you can try it hosted at www.huggingface.co/chat .
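A rough size estimate (again my own math, assuming the roughly 5 bits per weight of Q4_K_M): 70 × 10^9 × 5 / 8 ≈ 44 GB, so only part of the model fits in your 12 GB of VRAM and the rest is offloaded to your 64 GB of system RAM, which is why it runs much more slowly than the smaller models.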
For other Ollama commands, enter ollama help.
Have fun!
That was elaborate!
I am already using Ollama.
Great, then let me know how the other models work for you. On my laptop with a 4070 GPU, 8 GB of VRAM and 32 GB of RAM, I managed to run the Q2 quantization of Llama3.3 70B: it's much slower than on HuggingChat (1 or 2 tokens per second), but the quality is still good despite the extreme quantization.
For fun, you may want to try chain-of-thought models such as Marco-o1 or QwQ, both available in the Ollama library. They give you a fascinating insight into those models' reasoning.
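If I remember the library tags correctly (double-check them on https://ollama.com/library before pulling), running them should be as simple as:
ollama run marco-o1
ollama run qwq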