Best Package for 140 Hours of Audio with F5 TTS – Need Advice for a Newbie

makisa · March 25, 2025, 11:40am

Hi everyone!

I’m new to this, and I’m looking to use F5 TTS for around 140 hours of audio generation. Could someone help me figure out the best package to choose for this? I’m unsure about how this works and what would be the most cost-effective plan for me.

I’m not very familiar with services like this, so any advice or guidance would be much appreciated!

Thanks in advance!

John6666 · March 25, 2025, 2:37pm

F5-TTS is a very small model, so it seems that any hardware with around 2 GB of VRAM should be enough. It should work with almost all GPU rental services…

The real questions are how you will generate the audio (GUI? CLI? Your own script?) and where you will store that many hours of audio.

For example, if you are fine generating through a Hugging Face Space GUI and storing the output on your own hard disk, the following would be the cheapest GPU option with no time limit.

GPU
|Hardware|CPU|Memory|GPU Memory|Disk|Hourly Price|
|---|---|---|---|---|---|
|Nvidia T4 - small|4 vCPU|15 GB|16 GB|50 GB|$0.40|
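
To turn that hourly price into a rough total, here is a minimal sketch in Python. The real-time factor values (seconds of GPU time per second of generated audio) are assumptions for illustration only, not measured F5-TTS speeds on a T4, and the function name is made up for this example. Measure your own speed on a short test run and plug it in.

```python
def estimate_rental_cost(audio_hours, hourly_price_usd, real_time_factor):
    """Return (gpu_hours, cost_usd) for generating `audio_hours` of audio.

    real_time_factor is an ASSUMED value: seconds of GPU time needed
    per second of generated audio (1.0 means generation runs in real time).
    """
    gpu_hours = audio_hours * real_time_factor
    return gpu_hours, gpu_hours * hourly_price_usd


AUDIO_HOURS = 140        # from the original question
T4_SMALL_PRICE = 0.40    # USD per hour, from the table above

# Hypothetical real-time factors; replace with what you actually measure.
for rtf in (0.25, 0.5, 1.0, 2.0):
    gpu_hours, cost = estimate_rental_cost(AUDIO_HOURS, T4_SMALL_PRICE, rtf)
    print(f"RTF {rtf:>4}: {gpu_hours:6.1f} GPU hours, about ${cost:.2f}")
```

If generation happened to run exactly in real time (a factor of 1.0), 140 hours of audio would take 140 GPU hours, or about $56 at $0.40 per hour. Idle time while the hardware is still running would add to that.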

If you want to save your data online, you can use Hugging Face’s private model repository or dataset repository, which should be enough to store up to about 100GB.
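
As a rough check against that storage limit, here is a back-of-the-envelope estimate of how large 140 hours of audio would be as uncompressed WAV. The 24 kHz, 16-bit, mono settings are assumptions (check what your F5-TTS setup actually writes), file headers are ignored, and the function name is hypothetical.

```python
def pcm_wav_size_gb(audio_hours, sample_rate_hz=24_000,
                    bytes_per_sample=2, channels=1):
    """Approximate size in GB (10**9 bytes) of uncompressed PCM audio.

    Defaults are ASSUMED output settings: 24 kHz, 16-bit (2 bytes), mono.
    WAV headers are ignored because they are tiny next to the audio data.
    """
    seconds = audio_hours * 3600
    total_bytes = seconds * sample_rate_hz * bytes_per_sample * channels
    return total_bytes / 1e9


size_gb = pcm_wav_size_gb(140)
print(f"140 hours of audio is roughly {size_gb:.1f} GB")
```

Under these assumptions the result is about 24 GB, well under the roughly 100 GB mentioned above. A compressed format such as FLAC or MP3 would be smaller still.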
