Best Package for 140 Hours of Audio with F5 TTS – Need Advice for a Newbie

Since it is a very small model, around 2 GB of VRAM should be enough, so it should work with almost any GPU rental service…

The real problem is how you generate the audio (GUI? CLI? Self-made script?) and where you store the resulting data.
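If you go the self-made script route, the outer loop could look roughly like the sketch below. It only handles splitting a long text into short chunks and skipping chunks that already exist on disk, so a rented GPU session can be stopped and resumed. `synthesize` is a hypothetical callable that you would wire to F5 TTS yourself; it is not part of any library, and the 200-character limit and file naming are assumptions too.

```python
import re
from pathlib import Path
from typing import Callable, List


def split_into_chunks(text: str, max_chars: int = 200) -> List[str]:
    """Pack whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept as its own chunk.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def generate_all(
    text: str,
    out_dir: str,
    synthesize: Callable[[str, Path], None],  # hypothetical: your TTS call goes here
    max_chars: int = 200,
) -> int:
    """Synthesize every chunk that has no output file yet; return how many were written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = 0
    for index, chunk in enumerate(split_into_chunks(text, max_chars)):
        target = out / f"chunk_{index:06d}.wav"
        if target.exists():
            continue  # already generated in an earlier session
        synthesize(chunk, target)
        written += 1
    return written
```

Running `generate_all` a second time on the same folder only produces the chunks that are still missing, which helps if the session gets interrupted.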

For example, if you are fine with generating through the GUI of a Hugging Face Space and saving the output to your own hard disk, the following would be the cheapest hardware option that has no time limit.

GPU
|Hardware|CPU|Memory|GPU Memory|Disk|Hourly Price|
|---|---|---|---|---|---|
|Nvidia T4 - small|4 vCPU|15 GB|16 GB|50 GB|$0.40|
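To turn the hourly price in the table into a rough total, you need to know how many GPU hours it takes to produce one hour of audio. That ratio (called `real_time_factor` here) is an assumption you would have to measure yourself on a short sample; I don't have a figure for F5 TTS on a T4. A minimal sketch of the arithmetic:

```python
from typing import Tuple


def estimate_rental_cost(
    audio_hours: float,
    hourly_price_usd: float,
    real_time_factor: float,
) -> Tuple[float, float]:
    """Return (gpu_hours, total_cost_usd).

    real_time_factor = GPU hours needed per hour of generated audio.
    It is a placeholder to be measured, not a known value for F5 TTS.
    """
    gpu_hours = audio_hours * real_time_factor
    return gpu_hours, gpu_hours * hourly_price_usd


if __name__ == "__main__":
    # 140 hours of audio at $0.40/hour, for a few assumed speeds.
    for rtf in (0.5, 1.0, 2.0):
        hours, cost = estimate_rental_cost(140, 0.40, rtf)
        print(f"real_time_factor={rtf}: {hours:.0f} GPU hours, ${cost:.2f}")
```

If generation runs at exactly real time (factor 1.0), 140 hours of audio means 140 GPU hours, or about $56 at $0.40 per hour. Time spent idle or loading the model is billed as well, so treat the result as a lower bound.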

If you want to keep your data online instead, you can use a private Hugging Face model repository or dataset repository, which should be enough to store up to about 100 GB.
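To check whether 140 hours fits in about 100 GB, you can estimate the size of uncompressed WAV output. The sketch below assumes 24 kHz, 16-bit, mono audio; those values are my assumption, so please confirm them against a file you actually generated.

```python
def wav_size_gb(
    audio_hours: float,
    sample_rate_hz: int = 24_000,  # assumed output sample rate
    bytes_per_sample: int = 2,     # assumed 16-bit PCM
    channels: int = 1,             # assumed mono
) -> float:
    """Approximate size of uncompressed PCM audio in GB (1 GB = 10**9 bytes).

    WAV header overhead is ignored because it is negligible at this scale.
    """
    seconds = audio_hours * 3600
    total_bytes = seconds * sample_rate_hz * bytes_per_sample * channels
    return total_bytes / 1e9


if __name__ == "__main__":
    print(f"{wav_size_gb(140):.1f} GB")
```

Under those assumptions the total comes to roughly 24 GB, which is well under 100 GB. A compressed format such as FLAC or MP3 would shrink it further.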