How to Configure LLaMA-3:8B on HuggingFace to Generate Responses Similar to Ollama?

The issue of getting different output from pipelines than from running the model yourself, whether locally or in Spaces, comes up from time to time.
While pipelines are easy to work with, they seem to fill in a lot of information on their own (the chat template, generation defaults, and so on), which makes it hard to pinpoint what is actually going wrong.
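If you want to see exactly what the model receives, one option is to apply the chat template and call `generate` yourself. Here is a minimal sketch, assuming the `meta-llama/Meta-Llama-3-8B-Instruct` checkpoint; the sampling values are illustrative and roughly match Ollama's documented defaults (temperature 0.8, top-p 0.9):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Why is the sky blue?"},
]

# Building the prompt explicitly makes the exact input visible,
# instead of letting the pipeline assemble it behind the scenes.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(
    input_ids,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.8,  # illustrative; also Ollama's documented default
    top_p=0.9,
    # Llama 3 ends an assistant turn with <|eot_id|>; passing it explicitly
    # avoids depending on the checkpoint's generation config.
    eos_token_id=[
        tokenizer.eos_token_id,
        tokenizer.convert_tokens_to_ids("<|eot_id|>"),
    ],
)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```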

I'm not sure about Ollama's internals, but for setting chatbot-style parameters I think it is better to use something other than pipelines.
`chat_completion`, for example, is often used; a sketch follows.
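With `huggingface_hub`'s `InferenceClient`, a `chat_completion` call looks roughly like this (assuming a token is configured, e.g. via the `HF_TOKEN` environment variable, and that the Inference API serves this model):

```python
from huggingface_hub import InferenceClient

client = InferenceClient("meta-llama/Meta-Llama-3-8B-Instruct")

response = client.chat_completion(
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
    max_tokens=200,
    temperature=0.8,  # illustrative value
)
print(response.choices[0].message.content)
```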
Also, I've heard that Llama 3 had some kind of bug (reportedly around the `<|eot_id|>` end-of-turn token not stopping generation), though I'm not sure whether it's related to this issue.