Est Package for 140 Hours of Audio with F5 TTS – Need Advice for a Newbie

John6666 · March 25, 2025, 2:37pm

Since it is a very small model, about 2GB of VRAM seems to be enough, so it should work with almost all GPU rental services…

The real question is how you will generate the audio (GUI? CLI? Your own script?) and where you will store the data for that much audio.

For example, if you are fine with generating through a Hugging Face Space GUI and saving the output to your own hard disk, the following would be the cheapest option among the plans with no time limit.

GPU
|Hardware|CPU|Memory|GPU Memory|Disk|Hourly Price|
|---|---|---|---|---|---|
|Nvidia T4 - small|4 vCPU|15 GB|16 GB|50 GB|$0.40|
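
To get a rough feel for the total bill at the $0.40 hourly price above, here is a minimal Python sketch. The real-time factor (seconds of GPU time per second of generated audio) is a hypothetical parameter, not a measured F5 TTS figure: time a short test run on the actual hardware and plug in your own number. The function name is also just for illustration.

```python
def rental_cost_usd(audio_hours, hourly_price_usd, real_time_factor):
    """Estimate the GPU rental cost for generating `audio_hours` of audio.

    real_time_factor: GPU seconds spent per second of generated audio.
    This is an ASSUMPTION you must measure yourself; it is not a
    published number for F5 TTS on a T4.
    """
    if audio_hours < 0 or hourly_price_usd < 0 or real_time_factor <= 0:
        raise ValueError("inputs must be non-negative, with a positive real-time factor")
    gpu_hours = audio_hours * real_time_factor
    return gpu_hours * hourly_price_usd


# Try a few hypothetical real-time factors for 140 hours of audio.
for rtf in (0.25, 0.5, 1.0, 2.0):
    cost = rental_cost_usd(140, 0.40, rtf)
    print(f"real-time factor {rtf}: about ${cost:.2f}")
```

Model loading, failed generations, and idle time on the Space are not included, so treat the result as a lower bound.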

If you want to store your data online, you can use a private Hugging Face model repository or dataset repository, which should be enough for up to about 100GB.
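
To check whether 140 hours of audio fits in that space, here is a rough size estimate in Python. The format is an assumption (uncompressed 24 kHz, mono, 16-bit WAV); check the sample rate and bit depth of your actual output files, and note that a compressed format such as FLAC or MP3 would come out smaller.

```python
def wav_size_bytes(audio_hours, sample_rate_hz=24_000, bytes_per_sample=2, channels=1):
    """Approximate size of uncompressed PCM audio, ignoring the small WAV headers.

    The defaults (24 kHz, 16-bit, mono) are ASSUMPTIONS about the output
    format; adjust them to match your own files.
    """
    seconds = audio_hours * 3600
    return int(seconds * sample_rate_hz * bytes_per_sample * channels)


size = wav_size_bytes(140)
print(f"140 hours is about {size / 1e9:.1f} GB as uncompressed WAV")
```

Under these assumptions the whole set comes to roughly a quarter of the 100GB figure.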
