Yeah, yesterday I was working with Ollama, and Ollama runs a **single inference session per model instance.** When multiple requests hit that same instance, Ollama queues them and processes them one at a time; there's no parallel token generation inside one model. That's the drawback. So I was thinking of running the model locally using libraries like Transformers, vLLM, or Text Generation Inference (TGI).
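Something like this rough vLLM sketch is what I had in mind. It's just a minimal example assuming vLLM is installed and the GPU fits the model; the model id and sampling settings are placeholders, not recommendations:

```python
# Minimal sketch: vLLM schedules these prompts together (continuous batching)
# and generates tokens for them concurrently, instead of queuing them
# strictly one after another the way a single Ollama session does.
from vllm import LLM, SamplingParams

prompts = [
    "Explain continuous batching in one sentence.",
    "Summarize the benefits of running models locally.",
    "Write a haiku about GPUs.",
]

sampling_params = SamplingParams(temperature=0.7, max_tokens=128)

llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2")  # placeholder model id
outputs = llm.generate(prompts, sampling_params)

for out in outputs:
    print(out.outputs[0].text)
```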
Oh. When handling data with long context lengths, TGI and vLLM are reliable and fast. Quantization is also well supported in both.
TGI is particularly good for load balancing.
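For example, once a TGI server is running locally (the http://localhost:8080 endpoint here is an assumption, not a fixed address), its router batches concurrent requests for you. A minimal client-side sketch:

```python
# Minimal sketch: fire several requests at a locally running TGI server.
# TGI's router batches in-flight requests together rather than serving
# them strictly one by one.
from concurrent.futures import ThreadPoolExecutor

from huggingface_hub import InferenceClient

client = InferenceClient("http://localhost:8080")  # assumed local TGI endpoint

prompts = [
    "Give one advantage of quantized models.",
    "What does continuous batching mean?",
    "Name a use case for long-context models.",
]

def ask(prompt: str) -> str:
    # text_generation sends the prompt to the TGI server and returns the text
    return client.text_generation(prompt, max_new_tokens=64)

with ThreadPoolExecutor(max_workers=len(prompts)) as pool:
    for answer in pool.map(ask, prompts):
        print(answer)
```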
I agree that focusing on open-source, local AI improves privacy. If you don't know this repo already, I find it useful and inspiring: LM Studio · GitHub
TGI sounds like a good fit for your app, don't you think?