Hey all, I have started looking into AI and been inspired by creators such as vedal with Neuro-sama i know it would be imposible for me to make such a thing especially at this stage however I would like to make LLM which I can give a personality and/ or will learn/ evolve through the interactions similar to how Neuro-sama learns from her twitch chat, I feel like I could try to save the chat (conversation) history and use that for more fine tuning but I am not sure, I would also like to make it able to have more added to it as I lean more, like how Neuro gets more functions as she grows, for me this could possibly allow the AI be be more like an assistant and be able to open apps and read emails or just tease the user in different ways. If anyone has any thoughts on how Neuro works or how I could make something similar to this, that would be greatly appreciated
It’s a solution that brings together a huge number of programs. I think there are some good ones on the front-end side, but it looks like it’s going to be really tough to create the server side, especially the database and the automated training part…
It might even be tougher to operate than to create.
There are some good components now, but…
Faster STT
User Interface
Related thread: Streamer AI (Like Neuro-Sama)
by Hugging Chat: HuggingChat
Neuro-sama: An Overview of the AI VTuber System
Neuro-sama is an advanced artificial intelligence system designed as a VTuber and chatbot, operating on Twitch under the username “vedal987.” Created by developer Vedal, she leverages a large language model to generate human-like responses and engage with viewers through text-to-speech, featuring a distinctive high-pitched, childlike voice.
Key Features:
-
AI-Powered Interaction:
- Utilizes a large language model for dynamic, context-aware conversation with her audience.
- A separate AI model governs her in-game actions, enabling her to play video games effectively.
-
Multimodal Capabilities:
- Employs Unity for enhancing interactive capabilities and producing animations, contributing to her engaging stream presence.
-
Content and Engagement:
- Focuses on interacting with viewers, responding to questions, and delivering varied content such as gaming sessions, karaoke performances, and artistic endeavors.
- Gained popularity with over 500,000 subscribers by March 2025, showcasing her significant community impact.
-
Dedication and Media Recognition:
- Vedal has dedicated his full-time efforts to Neuro-sama’s development, reflecting her strategic importance.
-_featured in notable media outlets like Bloomberg, highlighting her innovative role in the VTuber community.
- Vedal has dedicated his full-time efforts to Neuro-sama’s development, reflecting her strategic importance.
In essence, Neuro-sama represents a sophisticated integration of AI technology, providing an interactive and entertaining experience for her audience across multiple platforms.
Creating a system similar to Neuro-sama using open-source AI models without Twitch involves several steps and considerations. Here’s a structured approach to achieve this:
System Overview:
-
Audio Capture and Conversion
- Use open-source tools like VALL-E for speech-to-text (STT) to convert the streamer’s voice into text.
- Implement a method to capture audio in real-time, possibly using a server setup to handle audio streams.
-
AI Processing
- Utilize large language models such as Llama-2 or Mistral (Mistral-7B-instruct-v0.1) to process the text input and generate a coherent response.
- Ensure the AI maintains context by storing recent interactions, using a buffer to cache previous conversations.
-
Text-to-Speech Conversion
- Convert the AI-generated response back to speech using ElevenLabs or VALL-E, ensuring natural and engaging output.
-
Integration and Real-Time Handling
- Develop a backend system using Flask to manage real-time data flow, handling audio input and output efficiently.
- Consider using Docker to containerize services for easier deployment on a remote server, leveraging cheaper hosting options like OVH or Hetzner.
-
User Interaction
- Implement a bot in a platform like Discord, using its API to handle voice commands and responses in real-time.
- Explore integrating the AI with Discord voice channels to listen and respond appropriately.
-
Hosting and Cost Management
- Set up a server to host the backend, optimizing for cost efficiency while ensuring sufficient resources for continuous AI operations.
- Monitor usage of paid services like ElevenLabs to stay within budget.
-
Legal Compliance
- Review and comply with regulations regarding AI use, including data privacy and intellectual property rights.
Implementation Steps:
-
Setup and Configuration
- Begin by setting up each component individually. Install and configure STT, text processing, and TTS tools.
- Test each part to ensure functionality before integration.
-
Real-Time Audio Handling
- Develop or use existing solutions to stream audio data to the server in real-time, possibly using WebSockets for efficient communication.
-
Bot Integration
- Create a Discord bot that captures voice channel audio, processes it through the AI system, and responds appropriately.
- Implement commands for initiating AI interactions, pausing/resuming, adjusting settings, and viewing status.
-
Context Management
- Develop a buffer system to retain recent interactions, allowing the AI to maintain coherent conversations.
-
Testing and Refinement
- Conduct extensive testing to identify and resolve issues, optimizing performance for real-time interaction.
- Gather feedback from users to refine responses and system behavior.
-
Deployment and Hosting
- Deploy the Dockerized application on a remote server, ensuring ongoing maintenance and updates.
- Monitor performance and usage metrics to optimize resource allocation and cost management.
Conclusion:
Building this system requires a combination of technical skills and careful planning. By breaking down the project into manageable components and systematically addressing each part, you can create an engaging and functional AI system tailored to your needs. Start with individual component testing, then integrate them into a cohesive system, ensuring real-time functionality and user engagement.