Adding Audio-MAE to Transformers

Hello @Ahmed-Telili!

Thank you for sharing your work and bringing this to the community’s attention! Pretraining an Audio-MAE model on 30 million underwater sound samples is an impressive achievement, and adding the model to the Hugging Face Transformers library would be a valuable contribution.

Here’s how you could move forward:

  1. Open a Feature Request:

    • Create a feature request on the Hugging Face Transformers GitHub repository (Transformers Issues).
    • In the request, include details about Audio-MAE, your training setup, dataset (Orcasound), and potential applications, emphasizing its value for underwater acoustics and other domains.
  2. Prepare a Pretrained Model for Sharing:

    • If you’re comfortable, consider uploading your pretrained model to the Hugging Face Model Hub; this would make it accessible to the community and encourage adoption (see the upload sketch after this list).
    • Write a descriptive model card (the repository’s README.md), including details like:
      • Training dataset and methodology.
      • Potential use cases (e.g., marine research, underwater sound monitoring).
      • Limitations or biases in the dataset.
  3. Contribute an Implementation:

    • If you’re open to contributing code, you can fork the Transformers repository and create an implementation for Audio-MAE, taking inspiration from the existing Vision Transformer MAE (ViTMAE) implementation (a rough skeleton follows this list).
    • Add relevant documentation and tests to make the integration seamless.
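
For step 2, here’s a minimal sketch of what the upload could look like with the `huggingface_hub` library (the repo id and local folder path are placeholders, not your actual setup):

```python
# Minimal sketch: push a local Audio-MAE checkpoint to the Hugging Face Hub.
# "your-username/audio-mae-orcasound" and "./audio_mae_checkpoint" are
# placeholder names -- substitute your own repo id and local directory.
from huggingface_hub import HfApi

api = HfApi()

# Create the repository first (no-op if it already exists).
api.create_repo("your-username/audio-mae-orcasound", exist_ok=True)

# Upload the whole checkpoint folder: weights, config, and the README.md
# that serves as the model card described above.
api.upload_folder(
    folder_path="./audio_mae_checkpoint",
    repo_id="your-username/audio-mae-orcasound",
)
```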
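
And for step 3, a very rough skeleton of what a Transformers-style port might look like, assuming the ViTMAE layout as a template. The class names, config fields, and the plain `nn.TransformerEncoder` body below are illustrative guesses, not a settled design; a real port would mirror ViTMAE’s patch embedding, random masking, and decoder:

```python
# Illustrative skeleton only -- names and defaults are hypothetical.
import torch.nn as nn
from transformers import PretrainedConfig, PreTrainedModel


class AudioMAEConfig(PretrainedConfig):
    model_type = "audio-mae"

    def __init__(self, hidden_size=768, num_hidden_layers=12,
                 num_attention_heads=12, mask_ratio=0.8, **kwargs):
        super().__init__(**kwargs)
        self.hidden_size = hidden_size
        self.num_hidden_layers = num_hidden_layers
        self.num_attention_heads = num_attention_heads
        self.mask_ratio = mask_ratio  # MAE-style pretraining masks most patches


class AudioMAEModel(PreTrainedModel):
    config_class = AudioMAEConfig

    def __init__(self, config):
        super().__init__(config)
        layer = nn.TransformerEncoderLayer(
            d_model=config.hidden_size,
            nhead=config.num_attention_heads,
            batch_first=True,
        )
        self.encoder = nn.TransformerEncoder(layer, config.num_hidden_layers)
        self.post_init()  # standard Transformers weight-init hook

    def forward(self, input_values):
        # input_values: (batch, num_patches, hidden_size) spectrogram
        # patch embeddings; the real model would embed and mask them first.
        return self.encoder(input_values)
```

Following that pattern keeps the model compatible with `from_pretrained`/`save_pretrained` out of the box, which is most of what the library’s integration tests check for.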

This initiative could significantly benefit the audio research community, especially in niche domains like underwater acoustics. Kudos to you for leading the way, and I’m excited to see where this goes!

Best regards,
Alan.