I am currently working on an AI assistant which can open and close apps. Most of my code at the moment is AI corrected. However I mostly try to follow tutorials, right now I am looking for 2 things
1 what model should I be using, recently I have been running mistal 7b locally on a rtx 2060 however there is a lot of delay between input and a response, is there a better option I could be using
2 what TTS and speech recognition should I use for best results. I am looking to build this for free.
For Context on my programing level, I am finishing my last year of GCSE python
It’s a local LLM, but I think the 7B model is a little too big for 8GB to 12GB of 2060. I recommend a model of 3B or less in terms of VRAM size and speed. Also, I think it’s better to use Ollama because there are quirks in the quantization of the 20x0 generation. It’s fast, low memory, and easy. You can also use Llamacpp-python, but it’s a little complicated.
There are too many LLM models to say which is best, but for 3B, Llama 3.2 Instruct or Qwen 2.5 Instruct would be good.
Next, for ASR models, the Whisper series is the standard. The recently released Hugging Face FastRTC is probably the most efficient in the future, but there may still be some areas that are insufficient.
As for TTS, there are many, and the one that is suitable for each language changes, so it is good to look for something you like from Spaces.
Thank you so much, I have used Ollama to setup Mistral already. Will try some smaller models, is 3b parameters going to be enough to allow for a chatty assistant which needs to have certain responses to commands to allow for control of my laptop. E g when I ask to open an app, response should be ok opening -nameOfApp-
Oh, if you really only want the model to perform the traffic control actions of the agent, then this guy or Qwen 0.5B Instruct might be enough…
If you’re looking for speed, then you could also just look for a smaller model. Smallness is speed.
Oh sorry, didn’t mean just controlling the laptop I want it to work to talk but also have a couple of set responses for a type of command, so that I can talk to it like a regular chatbot which will have regular conversation and advice but have a couple of commands which it will have a set response
for my program to read and carry out