Mistral 7B Fine-Tuning with Interview Data

Hi!

I’ve been wanting to fine-tune this model using transcribed Zoom interviews I have as training data.

How do I approach this problem?

I made a video regarding fine-tuning Mistral-7B on your own custom data.

One would need to prepare the data in the format of the UltraChat-200k dataset, which contains a list of messages per training example. Each message has either the role of “user” or “assistant” and contains the actual “content”. I explain all of that in the video.
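For reference, a minimal sketch of what one training example could look like in that format (the "messages"/"role"/"content" keys follow the UltraChat-200k convention; the interview text itself is just made-up filler):

```python
from datasets import Dataset

# One training example = one conversation, stored as a list of messages.
# Each message has a "role" ("user" or "assistant") and the actual "content".
example = {
    "messages": [
        {"role": "user", "content": "Hi, thanks for taking the time today."},
        {"role": "assistant", "content": "Happy to be here! Where should we start?"},
        {"role": "user", "content": "Let's start with your background."},
        {"role": "assistant", "content": "Sure. I studied computer science and ..."},
    ]
}

# A dataset is then just a collection of such rows, one conversation per row.
dataset = Dataset.from_list([example])
print(dataset[0]["messages"][0])
```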


Can you please provide me with the dataset, as I’m planning to build an Interviewer Bot as my final year project?
Thanks in advance.

Thanks a lot, Niels. This is so helpful. 🙂
These are the main questions I have right now.

Regarding the format of the dataset: since the interviewer starts the conversation, is it okay for the first role in my “messages” column to be “assistant” rather than “user”? And does the chat_template support that?

My understanding is also that, since one full transcribed Zoom interview is a whole conversation, it should go into a single row of the “messages” column. For example, I have an interview with a back and forth between the interviewer and interviewee of around 200 turns (meaning 100 assistant/user pairs). Is that right? Will the tokenizer be able to handle that? For context: I have around 7,500 words per whole interview, which is roughly 9,750 tokens assuming 1 word ≈ 1.3 tokens.

And will I have a total of 100 rows, since I currently have 100 clean transcribed interviews?

How do I upload the adapter to the Hugging Face Hub, so I can just load it later for usage?

Is it possible to train Mixtral 8x7B using that code as well, and on what GPU? Or do I need to change something?

I was thinking of deploying 4x RTX 4090 on RunPod for the 8x7B, which equates to 96 GB VRAM, 248 GB RAM, and 64 vCPUs. Is that right? Or do I need something like an H100?

Lastly, how do I deploy this to a chat UI in a Hugging Face Space? What GPU do I need for the SFT model of Mistral 7B, and for the (soon) SFT model of Mixtral 8x7B?

Really appreciate your response! 🙂

Thanks for watching!

And will I have a total of 100 rows, since I currently have 100 clean transcribed interviews?

Yes, each training example could be one conversation (as long as it fits within the context window you’re training with, e.g. 2048 tokens as in my video).
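Since your interviews come out to roughly 9,750 tokens each, it may be worth checking the tokenized length of every conversation before training, and then deciding whether to raise the maximum sequence length or split long interviews into chunks. A minimal sketch, assuming a `dataset` with a “messages” column like the example earlier in the thread; the checkpoint and the 2048 limit are just placeholders, so swap in whichever tokenizer and chat template you actually train with:

```python
from transformers import AutoTokenizer

# Placeholder checkpoint; use the tokenizer/chat template of your training setup.
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")

def num_tokens(messages):
    # Render the conversation with the chat template, then count the tokens.
    text = tokenizer.apply_chat_template(messages, tokenize=False)
    return len(tokenizer(text)["input_ids"])

max_seq_length = 2048  # same value as in the video
for i, row in enumerate(dataset):
    n = num_tokens(row["messages"])
    if n > max_seq_length:
        print(f"Conversation {i} has {n} tokens and won't fit in {max_seq_length}.")
```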

Is it possible to train Mixtral 8x7B using that code as well, and on what GPU? Or do I need to change something?

Yes, the code is identical. However, for multi-GPU training I’d recommend using DeepSpeed or FSDP.
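As a rough sketch of what the FSDP route could look like via the Trainer’s built-in support (the hyperparameters here are placeholder assumptions, and you could equally configure this through `accelerate config` instead):

```python
from transformers import TrainingArguments

# Placeholder hyperparameters; the relevant part is the fsdp flag, which shards
# the model, gradients and optimizer states across the available GPUs.
training_args = TrainingArguments(
    output_dir="mixtral-8x7b-sft",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    bf16=True,
    fsdp="full_shard auto_wrap",
)
# Launch the training script with e.g. `torchrun --nproc_per_node=4 train.py`
# so that all GPUs participate; the rest of the training code stays the same.
```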

I was thinking of deploying 4x RTX 4090 on RunPod for the 8x7B, which equates to 96 GB VRAM, 248 GB RAM, and 64 vCPUs. Is that right? Or do I need something like an H100?

Yes, it takes about 96 GB of memory in float16, but you could use quantization techniques like bitsandbytes or AWQ to shrink this down significantly (with 4-bit it becomes about 27 GB). For deployment I’d recommend taking a look at TGI, vLLM, and TensorRT-LLM.
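For example, loading the model in 4-bit with bitsandbytes could look roughly like this (the NF4 settings are a common choice rather than a requirement, and the model ID would be your own fine-tuned checkpoint):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit NF4 quantization: ~0.5 bytes per parameter instead of 2 in float16,
# which brings Mixtral 8x7B from ~96 GB down to roughly 27 GB.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # spread the quantized weights over the available GPUs
)
```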

Lastly, how do I deploy this to a chat UI in a Hugging Face Space? What GPU do I need for the SFT model of Mistral 7B, and for the (soon) SFT model of Mixtral 8x7B?

For Mistral-7B you need about 14 GB of memory to load the model in float16, so a single RTX 4090 works since it has 24 GB of VRAM. ChatUI provides a Dockerfile, and Spaces supports Docker, so you could deploy it that way.
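For the 7B model, loading the base model in float16 on that single GPU and attaching your fine-tuned LoRA adapter could look like the sketch below; the adapter repo name is a hypothetical placeholder for whatever you pushed to the Hub:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "mistralai/Mistral-7B-v0.1"
adapter_id = "your-username/mistral-7b-interview-sft"  # hypothetical adapter repo

tokenizer = AutoTokenizer.from_pretrained(base_id)
# ~7B parameters * 2 bytes in float16 ≈ 14 GB, which fits on a 24 GB RTX 4090.
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.float16, device_map="auto"
)
# Attach the fine-tuned adapter weights on top of the base model.
model = PeftModel.from_pretrained(model, adapter_id)
```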
