Speech-to-text on constrained hardware (embedded)

Good day to you all.

I’ve been interested in implementing a speech-to-text model on a low-power microcontroller with an embedded AI accelerator.

I started out by trying to replicate the Jasper model as described in the research paper here: https://arxiv.org/pdf/1904.03288

I knew the hardware was limited, and now I realize just how severe the limitations are. These all apply to 1-D convolutional layers (conv1d):

  • Kernel size can only be 1–9
  • Stride is fixed to 1
  • Padding can only be 0, 1, 2, or 3, and must be 0 when using more than 64 input channels
  • Dilation can be 1 to 1023 for kernel sizes 1, 2, or 3, and is fixed to 1 for kernel sizes greater than 3
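For my own sanity I wrote the constraints down as a quick layer checker (this is just my reading of the list above; `conv1d_params_ok` is a helper name I made up, not part of any vendor SDK):

```python
def conv1d_params_ok(in_channels: int, kernel_size: int, padding: int,
                     stride: int = 1, dilation: int = 1) -> bool:
    """Return True if a conv1d config fits the accelerator limits listed above."""
    if not 1 <= kernel_size <= 9:          # kernel size must be 1-9
        return False
    if stride != 1:                        # stride is fixed to 1
        return False
    if padding not in (0, 1, 2, 3):        # padding limited to 0-3
        return False
    if padding > 0 and in_channels > 64:   # padding forced to 0 above 64 in-channels
        return False
    if kernel_size <= 3:                   # dilation 1-1023 only for kernels 1-3
        return 1 <= dilation <= 1023
    return dilation == 1                   # otherwise dilation fixed to 1


# A few spot checks against the rules:
print(conv1d_params_ok(32, 9, 3))                 # fine: small channel count
print(conv1d_params_ok(128, 3, 1))                # rejected: padding with >64 channels
print(conv1d_params_ok(128, 3, 0, dilation=8))    # fine: dilated k=3, zero padding
print(conv1d_params_ok(64, 5, 0, dilation=2))     # rejected: dilation on k>3
```

Running every layer of a candidate architecture through a check like this before training would at least tell me up front whether the model can map onto the hardware at all.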

Needless to say, Jasper might be a bit of a pipe dream. Is there another approach that might work within these limitations? I don’t need a state-of-the-art low-WER model, but I don’t want it to be completely useless either.

Thanks