Can we use Inference Endpoints to take a music file as input and return a music file as output?

Hello everyone, I’m exploring the use of Hugging Face’s Inference Endpoints for a project involving music files. My goal is to input a music file and have the model generate a music file as output. I understand that the feasibility of this would depend on the specific model being used and whether it has been trained to process and generate music files.

However, I’m aware that returning music files directly might not be possible at the moment. As an alternative, I’m considering having the output music file stored in an AWS S3 bucket or Google Cloud Storage after inference. Has anyone implemented something similar, or could anyone provide guidance on how to achieve this? Could you also recommend any specific models for this task?
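A minimal sketch of that alternative, assuming `boto3` is available in the endpoint image and AWS credentials are already configured; `make_object_key` and `upload_and_get_url` are hypothetical helper names, not part of any Inference Endpoints API:

```python
import datetime
import io


def make_object_key(prefix: str = "outputs") -> str:
    """Build a unique, timestamped S3 object key for the generated file."""
    stamp = datetime.datetime.now(datetime.timezone.utc).strftime("%Y%m%dT%H%M%S%f")
    return f"{prefix}/music-{stamp}.wav"


def upload_and_get_url(audio_bytes: bytes, bucket: str, expires: int = 3600) -> str:
    """Upload the generated audio to S3 and return a presigned download URL."""
    import boto3  # assumed to be installed in the endpoint environment

    s3 = boto3.client("s3")
    key = make_object_key()
    s3.upload_fileobj(io.BytesIO(audio_bytes), bucket, key)
    # Presigned URL lets the client download without bucket credentials
    return s3.generate_presigned_url(
        "get_object", Params={"Bucket": bucket, "Key": key}, ExpiresIn=expires
    )
```

The presigned URL approach avoids making the bucket public; the link simply expires after `expires` seconds.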

Thank you in advance for your help.

You can also encode your audio file with base64 and return it as a string, but uploading to S3 and then returning the URL makes more sense, especially for larger files.
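For smaller files, the base64 route can be sketched like this — the endpoint encodes the raw audio bytes into a JSON-safe string, and the client decodes them back (the `audio_b64` field name is just an illustrative choice):

```python
import base64


def encode_audio(audio_bytes: bytes) -> str:
    """Encode raw audio bytes as a base64 string for a JSON response."""
    return base64.b64encode(audio_bytes).decode("utf-8")


def decode_audio(b64_string: str) -> bytes:
    """Decode the base64 string back into raw audio bytes on the client side."""
    return base64.b64decode(b64_string)


# Round trip: what the endpoint returns, and how the client recovers the file
original = b"RIFF\x00\x00\x00\x00WAVE"  # placeholder for real WAV bytes
payload = {"audio_b64": encode_audio(original)}
assert decode_audio(payload["audio_b64"]) == original
```

Note that base64 inflates the payload by roughly a third, which is one reason the S3-and-URL approach scales better for large files.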


Thank you for your response, @philschmid.

I agree that uploading the output music file to a cloud storage service like Amazon S3 or Google Cloud Storage and then returning the URL seems to be a more efficient and practical solution, especially for larger files.
This approach would also provide a more seamless experience for users, as they can directly download the file from the provided URL.

Thanks again for your help!

I was looking for documentation on saving to arbitrary storage (assuming access permissions are controlled by environment variables).

I would like to set environment variables for each endpoint, but I can’t seem to find this in the documentation. Maybe it can’t be done…?

Have you succeeded in your task? Is your model based on the RVC architecture, by any chance?
