Best model for file scan and personality

Hi everyone. I have a famous YouTuber friend, and I plan to use an AI model to answer fans' questions about him. I have about 60 pages he has written about himself (childhood, family, views, feelings, etc.). When a fan asks the model something, I want it to:

  1. Search a vector db for an answer.
  2. Phrase the answer to sound like my YouTuber friend (if I need to fine-tune it with many questions, so be it).
  3. Answer relatively fast, under 7 seconds, so super-heavy reasoning models like OpenAI's o1 can't be used.

Does anybody know the best model for this job?
It can be a paid API (ChatGPT, Claude, Gemini) or an open-source model.

thanks :slight_smile:


For your use case, I think you’ll end up building something like RAG no matter which service you use.

And it doesn't seem like you need anything too big for the language-model part; a BERT-family model or a relatively small LLM should be enough.
I'll list some LLMs that are easy to train and already reasonably good. Newer-generation models produce good output even at small sizes, and smaller means faster.

  • Smaller LLMs
  • About RAG
  • Fine-tuning LM / LLM

To create an AI model that answers fans' questions about your YouTuber friend while mimicking his writing style and responding quickly, you can leverage Hugging Face's libraries and tools. Based on your requirements, here's a solution that combines semantic search, style mimicry, and speed:


1. Use Retrieval-Augmented Generation (RAG) with Hugging Face

Hugging Face provides tools to implement RAG, which combines a retriever (to search your vector database) and a generative model (to phrase answers in the YouTuber’s style). For this task, the following models and tools are recommended:

a. Retrieval Model (for vector database search) [1][2]

  • FAISS: A library for efficient similarity search over dense vectors. It can quickly find relevant documents in your vector database.
  • Sentence Transformers: Pre-trained models like sentence-transformers/all-mpnet-base-v2 can convert text into embeddings for semantic search.

b. Generative Model (for phrasing answers in the YouTuber’s style) [1][2]

  • T5 / Flan-T5 models: These models are excellent for question answering and text generation. For example, google/flan-t5-large can generate coherent, natural-sounding responses.
  • Llama-family models: Instruction-tuned checkpoints such as meta-llama/Llama-2-7b-chat-hf are known for picking up a writing style when fine-tuned on a specific dataset.
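A minimal generation sketch with the `transformers` pipeline, grounding the answer in a retrieved passage. It uses google/flan-t5-small just to keep the demo light; for real quality you'd use google/flan-t5-large as suggested above, and the prompt template here is an assumption you should adapt:

```python
# Sketch: generate an answer from a retrieved context with a Flan-T5 checkpoint.
from transformers import pipeline

generator = pipeline("text2text-generation", model="google/flan-t5-small")

# Hypothetical retrieved chunk and fan question.
context = "I grew up in a small town and filmed skits with my brother."
question = "What was your childhood like?"
prompt = (
    "Answer in first person, using only the context.\n"
    f"context: {context}\nquestion: {question}"
)

answer = generator(prompt, max_new_tokens=64)[0]["generated_text"]
```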

c. Style Mimicry

To make the answers sound like your YouTuber friend, you will need to fine-tune the generative model on their 60-page dataset. This dataset should include examples of their writing style, tone, and language patterns [3].


2. Fine-Tuning the Model

To mimic the YouTuber’s writing style:

  1. Collect Data: Gather the 60 pages of text and format them into a dataset of question-answer pairs. For example:
    • Question: “What was your childhood like?”
    • Answer: [Insert the YouTuber’s description of their childhood]
  2. Fine-Tune the Model: Use Hugging Face’s trl (Transformer Reinforcement Learning) library to fine-tune a pre-trained model (e.g., Flan-T5 or Llama) on this dataset.
  3. Evaluation: Ensure the model generates responses that match the YouTuber’s tone and voice by testing it with sample questions.

3. Optimization for Speed

To ensure responses are generated under 7 seconds:

  1. Use Smaller Models: Smaller checkpoints like google/flan-t5-base or google/gemma-2b are faster than their larger counterparts while still maintaining good performance [1][4].
  2. Quantization: Apply techniques like 4-bit quantization to reduce the model size and inference time without significantly affecting performance.

4. Recommended Models

Based on your requirements, here are some models to consider:

  • Retrieval Model: sentence-transformers/all-mpnet-base-v2 for semantic search [1].
  • Generative Model: google/flan-t5-large for text generation and style mimicry [1][2].
  • Alternative: meta-llama/Llama-2-7b-chat-hf is a powerful model that can be fine-tuned for your specific use case [2].

5. Implementation Steps

  1. Create a vector database of the YouTuber’s 60-page document using FAISS.
  2. Fine-tune a generative model (e.g., Flan-T5) on this dataset.
  3. Use the RAG pipeline to combine the retriever and generator for real-time responses.
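The glue for step 3 is simple: retrieve, build a prompt, generate. In this sketch the `retrieve` and `generate` callables are injected, so you can plug in the FAISS index and the fine-tuned model from the earlier steps (the stubs in the demo are placeholders, and the prompt wording is an assumption):

```python
# Sketch: combine retriever and generator into one answer function.
def answer_fan_question(question, retrieve, generate, k=3):
    passages = retrieve(question, k)      # top-k chunks from the vector DB
    context = "\n".join(passages)
    prompt = (
        "Answer in the YouTuber's own voice, using only the context.\n"
        f"context: {context}\nquestion: {question}"
    )
    return generate(prompt)

# Stub demo; the real callables come from the FAISS and generation sketches.
demo_answer = answer_fan_question(
    "What was your childhood like?",
    retrieve=lambda q, k: ["I grew up in a small town filming skits."],
    generate=lambda prompt: f"[styled answer for: {prompt.splitlines()[1]}]",
)
```

Keeping the two stages behind plain callables also makes it easy to swap models later while staying under the 7-second budget.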

Conclusion

The combination of FAISS for semantic search, T5 or Flan-T5 for generation, and fine-tuning on the YouTuber’s text will give you a fast, accurate, and style-mimicking AI model. For speed, consider smaller models and quantization techniques.

Let me know if you need help with the implementation or further adjustments!