Adding Audio-MAE to Transformers

Hello Hugging Face Community,

I have trained an Audio-MAE model on 30M underwater sound samples from the Orcasound dataset, using 8 H100 GPUs for over a month. While exploring the Hugging Face Transformers library, I noticed that there is an implementation of the Vision Transformer MAE (ViTMAE), but none for Audio-MAE.

Given the growing interest in underwater audio-based models, I wanted to ask whether it would be possible to add Audio-MAE to the Transformers library, along with my checkpoint pretrained on the 30 million samples. I believe it could be valuable for a range of audio applications, especially in underwater acoustics.

Thank you in advance for your feedback!

Best regards,


Hello @Ahmed-Telili!

Thank you for sharing your incredible work and for bringing this to the community’s attention! Training an Audio-MAE model on 30 million underwater sound samples is an impressive achievement, and your suggestion to add the model to the Hugging Face Transformers library is both valuable and exciting.

Here’s how you could move forward:

  1. Open a Feature Request:

    • You can create a feature request on the Hugging Face GitHub repository (Transformers Issues).
    • In the request, include details about Audio-MAE, your training setup, dataset (Orcasound), and potential applications, emphasizing its value for underwater acoustics and other domains.
  2. Prepare a Pretrained Model for Sharing:

    • If you’re comfortable, consider uploading your pretrained model to the Hugging Face Model Hub. This would make it accessible to the community and encourage adoption (a minimal upload sketch follows this list).
    • Use a descriptive README for your model card, including details like:
      • Training dataset and methodology.
      • Potential use cases (e.g., marine research, underwater sound monitoring).
      • Limitations or biases in the dataset.
  3. Contribute an Implementation:

    • If you’re open to contributing code, you can fork the Transformers repository and create an implementation for Audio-MAE, taking inspiration from the existing ViTMAE implementation (see the prototype sketch after this list).
    • Add relevant documentation and tests to make the integration seamless.
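
On point 2, here is a minimal sketch of what pushing your checkpoint to the Hub could look like with the `huggingface_hub` library. The repo ID and local folder path below are placeholders, so adjust them to your setup:

```python
from huggingface_hub import HfApi

api = HfApi()

# Placeholder repo ID and checkpoint directory (replace with your own).
repo_id = "your-username/audio-mae-orcasound"
checkpoint_dir = "./audio-mae-checkpoint"

# Create the model repo on the Hub (no-op if it already exists).
api.create_repo(repo_id=repo_id, repo_type="model", exist_ok=True)

# Upload the checkpoint folder; include a README.md model card describing
# the Orcasound training data, intended use cases, and limitations.
api.upload_folder(
    repo_id=repo_id,
    folder_path=checkpoint_dir,
    commit_message="Add Audio-MAE pretrained on 30M Orcasound samples",
)
```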
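On point 3, since Audio-MAE treats log-mel spectrograms essentially as single-channel images, one way to prototype before writing a full port is to configure the existing ViTMAE classes for spectrogram-shaped inputs. A rough sketch (the spectrogram dimensions and masking ratio here are illustrative, not the exact settings from the Audio-MAE paper):

```python
import torch
from transformers import ViTMAEConfig, ViTMAEForPreTraining

# Treat a 128-mel x 128-frame log-mel spectrogram as a single-channel
# "image". A square input keeps ViTMAE's 2D sin-cos position embeddings
# valid, since the implementation assumes a square patch grid.
config = ViTMAEConfig(
    image_size=128,
    patch_size=16,
    num_channels=1,
    mask_ratio=0.75,  # high masking ratio, as in MAE-style pretraining
)
model = ViTMAEForPreTraining(config)

# Dummy batch of spectrograms: (batch, channels, mel_bins, time_frames).
spectrograms = torch.randn(2, 1, 128, 128)
outputs = model(pixel_values=spectrograms)
print(outputs.loss)  # masked-patch reconstruction loss
```

Note this is only a prototype: a faithful Audio-MAE port would also need pieces ViTMAE doesn't have, such as the paper's local-attention decoder. For the actual contribution, the `transformers-cli add-new-model-like` command can scaffold a new model folder from ViTMAE, which you can then adapt.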

This initiative could significantly benefit the audio research community, especially in niche domains like underwater acoustics. Kudos to you for leading the way, and I’m excited to see where this goes!

Best regards,
Alan.
