Best Package for 140 Hours of Audio with F5 TTS – Need Advice for a Newbie

Hi everyone!

I’m new to this, and I’m looking to use F5 TTS for around 140 hours of audio generation. Could someone help me figure out the best package to choose for this? I’m unsure about how this works and what would be the most cost-effective plan for me.

I’m not very familiar with services like this, so any advice or guidance would be much appreciated!

Thanks in advance!


Because it is a very small model, around 2GB of VRAM seems to be enough, so it should work with almost any GPU rental service.

The real questions are how you will generate the audio (GUI? CLI? Self-made script?) and where you will store that much output.

For example, if you are happy generating through a Hugging Face Space GUI and saving the output to your own hard disk, the following would be the cheapest plan with no time limit.

GPU
|Hardware|CPU|Memory|GPU Memory|Disk|Hourly Price|
|---|---|---|---|---|---|
|Nvidia T4 - small|4 vCPU|15 GB|16 GB|50 GB|$0.40|
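
To turn that hourly price into a rough budget, something like the sketch below can help. It assumes a real-time factor (GPU seconds needed per second of generated audio), which I have not measured for F5 TTS on a T4. The values in the loop are placeholders, so time a short test generation first and plug in your own number. `estimate_cost` is just an illustrative helper name.

```python
def estimate_cost(audio_hours, real_time_factor, hourly_price):
    """Return (gpu_hours, total_cost) for a generation job.

    real_time_factor is GPU time divided by audio duration:
    0.5 means one hour of audio takes half an hour of GPU time.
    """
    gpu_hours = audio_hours * real_time_factor
    return gpu_hours, gpu_hours * hourly_price


AUDIO_HOURS = 140      # target amount of audio from the question
HOURLY_PRICE = 0.40    # T4 small price from the table above

# Placeholder real-time factors (assumptions, not benchmarks).
for rtf in (0.25, 0.5, 1.0):
    gpu_hours, cost = estimate_cost(AUDIO_HOURS, rtf, HOURLY_PRICE)
    print(f"RTF {rtf}: about {gpu_hours:.0f} GPU hours, about ${cost:.2f}")
```

Keep in mind that you pay for the whole time the hardware is running, including idle time and retries, so treat the result as a lower bound.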

If you want to save your data online, you can use Hugging Face’s private model repository or dataset repository, which should be enough to store up to about 100GB.
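
To check whether 140 hours fits in that space, a back-of-the-envelope size estimate for uncompressed WAV output might look like this. The 24 kHz, 16-bit, mono defaults are assumptions on my part, so check the format of the files you actually get. `wav_size_gb` is an illustrative helper name.

```python
def wav_size_gb(audio_hours, sample_rate=24_000, bytes_per_sample=2, channels=1):
    """Approximate size in GB (10^9 bytes) of uncompressed PCM audio.

    Defaults (24 kHz, 16-bit, mono) are assumed, not confirmed.
    WAV header overhead is ignored because it is negligible.
    """
    seconds = audio_hours * 3600
    total_bytes = seconds * sample_rate * bytes_per_sample * channels
    return total_bytes / 1e9


print(f"140 hours of audio: about {wav_size_gb(140):.1f} GB as WAV")
```

Under those assumptions the total is roughly 24 GB, which would fit both the 50 GB disk and the roughly 100GB repository limit. A compressed format such as MP3 or FLAC would take less.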