Advice for locally run AI Assistant

Brakish · March 10, 2025, 10:40am

I am currently working on an AI assistant which can open and close apps. Most of my code at the moment is AI corrected. However I mostly try to follow tutorials, right now I am looking for 2 things
1 what model should I be using, recently I have been running mistal 7b locally on a rtx 2060 however there is a lot of delay between input and a response, is there a better option I could be using

2 what TTS and speech recognition should I use for best results. I am looking to build this for free.

For Context on my programing level, I am finishing my last year of GCSE python

John6666 · March 10, 2025, 1:57pm

It’s a local LLM, but I think the 7B model is a little too big for 8GB to 12GB of 2060. I recommend a model of 3B or less in terms of VRAM size and speed. Also, I think it’s better to use Ollama because there are quirks in the quantization of the 20x0 generation. It’s fast, low memory, and easy. You can also use Llamacpp-python, but it’s a little complicated.
There are too many LLM models to say which is best, but for 3B, Llama 3.2 Instruct or Qwen 2.5 Instruct would be good.

Next, for ASR models, the Whisper series is the standard. The recently released Hugging Face FastRTC is probably the most efficient in the future, but there may still be some areas that are insufficient.

As for TTS, there are many, and the one that is suitable for each language changes, so it is good to look for something you like from Spaces.

Brakish · March 10, 2025, 2:05pm

Thank you so much, I have used Ollama to setup Mistral already. Will try some smaller models, is 3b parameters going to be enough to allow for a chatty assistant which needs to have certain responses to commands to allow for control of my laptop. E g when I ask to open an app, response should be ok opening -nameOfApp-

John6666 · March 10, 2025, 2:20pm

Oh, if you really only want the model to perform the traffic control actions of the agent, then this guy or Qwen 0.5B Instruct might be enough…
If you’re looking for speed, then you could also just look for a smaller model. Smallness is speed.

Brakish · March 10, 2025, 2:50pm

Oh sorry, didn’t mean just controlling the laptop I want it to work to talk but also have a couple of set responses for a type of command, so that I can talk to it like a regular chatbot which will have regular conversation and advice but have a couple of commands which it will have a set response
for my program to read and carry out

John6666 · March 10, 2025, 3:24pm

I see. In that case, You’d want it to be at least 3B, or at most 1.5B. Without fine-tuning at 0.5B or less, the response is too inorganic…

system · March 11, 2025, 8:00am

This topic was automatically closed 12 hours after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Alot of questions, or, How can i run models locally (for an absolute begginger) Beginners	3	44	July 4, 2025
AI Voice Assistant Beginners	2	118	April 19, 2025
Best Local LLM for Real-Time Q&A on German/English Transcript? Models	1	44	June 19, 2025
Tips for a first `Hello World` terminal based chatbot w. HF APIs? Beginners	0	44	July 25, 2024
Help with starting to write a Casual Chatbot AI Beginners	5	1943	November 9, 2024

Advice for locally run AI Assistant

Related topics