"Seeking Advice: Storing High-Volume Sign Language Dataset for ChatDEAF"

yasodeafs · April 20, 2025, 7:01am

Hello Hugging Face Community,

I’m currently working on a project called ChatDEAF, an AI-based system designed to understand and support sign language communication — especially for deaf mothers and hearing children.

One of our major challenges is storing large volumes of sign language data, where each “word” (sign) carries several data points:

- handshape
- position
- facial expression
- movement
- speed & rhythm

This makes sign language datasets roughly 5–10x larger than comparable text-only ones.
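To make the structure concrete, here is a minimal sketch of how one sign could be represented, assuming the annotations are kept separate from the video file. All field names (`gloss`, `duration_ms`, `video_path`, and so on) are hypothetical and not our actual schema.

```python
from dataclasses import dataclass, asdict
import json


@dataclass
class SignSample:
    """One annotated sign. Every field name here is a hypothetical placeholder."""
    gloss: str               # the "word" this sign represents
    handshape: str           # label for the hand configuration
    position: tuple          # (x, y, z) location relative to the signer's body
    facial_expression: str   # label for the accompanying facial expression
    movement: list           # sequence of (x, y, z) keypoints over time
    duration_ms: int         # rough proxy for speed & rhythm
    video_path: str          # the video itself is stored as a separate file


def annotation_bytes(sample: SignSample) -> int:
    """Size of the JSON-encoded annotations, excluding the video file."""
    return len(json.dumps(asdict(sample)).encode("utf-8"))


sample = SignSample(
    gloss="MOTHER",
    handshape="open-5",
    position=(0.0, 0.3, 0.1),
    facial_expression="neutral",
    movement=[(0.0, 0.3, 0.1), (0.0, 0.28, 0.12)],
    duration_ms=640,
    video_path="clips/mother_0001.mp4",
)
print(annotation_bytes(sample), "bytes of annotation for one sign")
```

Keeping the annotations as small structured records and the video as separate files would let the metadata stay searchable even when the clips live in cheaper, slower storage.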

We’re concerned that ordinary SSDs or standard cloud storage may not be sufficient in the long run.

What would you recommend for:

- Efficient video + gesture data storage
- Long-term backup
- Integration with AI model training

Should we explore AWS S3, Azure Blob, Backblaze, or custom solutions?
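For the long-term backup part, one provider-independent idea is a checksum manifest that can be re-verified after every copy or restore. The following is only a sketch using the Python standard library; the function names (`build_manifest`, `verify`) and the directory layout are assumptions for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large video files never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's relative path to its size in bytes and SHA-256 digest."""
    root = Path(root)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            manifest[rel] = {
                "bytes": path.stat().st_size,
                "sha256": sha256_of(path),
            }
    return manifest


def verify(root, manifest):
    """Return the relative paths that are missing or differ from the manifest."""
    current = build_manifest(root)
    return [rel for rel, meta in manifest.items() if current.get(rel) != meta]
```

The manifest could be saved as JSON next to the dataset, and `verify` run against a restored copy to detect silent corruption regardless of which storage service holds the files.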

We’d love to hear your thoughts — especially from those experienced with video + motion dataset storage.

Thank you for your support!

— Yasin Şimşek
ChatDEAF Founder & Deaf Data Architect

