I’m currently integrating a fine-tuned Whisper Small model, wrapped with Sherpa ONNX, into a Flutter-based Android app. The goal is to achieve real-time speech-to-text functionality. I’ve been following the documentation provided here: Sherpa ONNX Flutter Examples.
My fine-tuned model is designed for the German language and includes a specific vocabulary. However, I’ve encountered an issue: the documentation mentions the need for a joiner file, but my research indicates that the Whisper model already includes its joiner within the tensor architecture.
I’m looking for any workarounds or additional documentation that could help me integrate the Sherpa ONNX Whisper model into my app without the need for a separate joiner file.
It’s fantastic that you’re working on integrating a fine-tuned Whisper model with Sherpa ONNX in a Flutter-based Android app! Achieving real-time speech-to-text in a specific language is a challenging yet rewarding task.
In my opinion, there are several ways to solve this problem; the following two approaches should work well for you.
Sherpa ONNX Expectations:
Sherpa ONNX often assumes models follow a certain structure, especially for RNN-T (transducer) architectures, which include an explicit joiner component. Whisper’s transformer-based encoder-decoder architecture doesn’t require an external joiner, since that processing happens inside the model itself.
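To make the difference concrete, here is a minimal sketch using the sherpa_onnx Dart package. This is only a sketch: the field names follow recent versions of the package and may differ in yours, and all file paths are placeholders. The transducer path has an explicit joiner field, while the Whisper path takes just an encoder and a decoder:

```dart
import 'package:sherpa_onnx/sherpa_onnx.dart' as sherpa_onnx;

void main() {
  sherpa_onnx.initBindings();

  // RNN-T / transducer models: sherpa-onnx expects a separate joiner file.
  final transducerConfig = sherpa_onnx.OnlineRecognizerConfig(
    model: sherpa_onnx.OnlineModelConfig(
      transducer: sherpa_onnx.OnlineTransducerModelConfig(
        encoder: 'encoder.onnx',
        decoder: 'decoder.onnx',
        joiner: 'joiner.onnx', // the explicit joiner sherpa-onnx asks for
      ),
      tokens: 'tokens.txt',
    ),
  );

  // Whisper models: there is no joiner field at all; an encoder and a
  // decoder are enough.
  final whisperConfig = sherpa_onnx.OfflineRecognizerConfig(
    model: sherpa_onnx.OfflineModelConfig(
      whisper: sherpa_onnx.OfflineWhisperModelConfig(
        encoder: 'whisper-encoder.onnx',
        decoder: 'whisper-decoder.onnx',
        language: 'de', // assumption: a fine-tuned German model
      ),
      tokens: 'whisper-tokens.txt',
    ),
  );

  print(transducerConfig);
  print(whisperConfig);
}
```

Note that Whisper goes through the offline (non-streaming) recognizer in sherpa-onnx, so if that configuration works for your model, no joiner file ever enters the picture.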
Workaround Ideas:
Customize Sherpa ONNX:
You may need to bypass or adapt the Sherpa ONNX codebase to account for Whisper’s unique architecture. For instance, investigate how Sherpa ONNX expects the joiner to be used and modify those parts to align with Whisper’s outputs.
Simplify Output Matching:
If the joiner logic is primarily for aligning model outputs with a vocabulary, you might manually map Whisper’s decoded outputs to your specific vocabulary.
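For illustration, a crude post-processing pass in plain Dart (no Sherpa dependencies; the function name, the edit-distance approach, and the distance threshold are all hypothetical choices, not anything the package provides) could snap each decoded word onto the closest term in your vocabulary:

```dart
// Levenshtein edit distance between two strings.
int _editDistance(String a, String b) {
  final dp = List.generate(a.length + 1, (i) => List.filled(b.length + 1, 0));
  for (var i = 0; i <= a.length; i++) {
    dp[i][0] = i;
  }
  for (var j = 0; j <= b.length; j++) {
    dp[0][j] = j;
  }
  for (var i = 1; i <= a.length; i++) {
    for (var j = 1; j <= b.length; j++) {
      final cost = a[i - 1] == b[j - 1] ? 0 : 1;
      dp[i][j] = [
        dp[i - 1][j] + 1, // deletion
        dp[i][j - 1] + 1, // insertion
        dp[i - 1][j - 1] + cost, // substitution
      ].reduce((x, y) => x < y ? x : y);
    }
  }
  return dp[a.length][b.length];
}

/// Replace each decoded word with the closest vocabulary entry, keeping
/// the original word when nothing in the vocabulary is close enough.
String mapToVocabulary(String decoded, Set<String> vocabulary,
    {int maxDistance = 2}) {
  return decoded.split(' ').map((word) {
    var best = word;
    var bestDist = maxDistance + 1;
    for (final entry in vocabulary) {
      final d = _editDistance(word.toLowerCase(), entry.toLowerCase());
      if (d < bestDist) {
        best = entry;
        bestDist = d;
      }
    }
    return bestDist <= maxDistance ? best : word;
  }).join(' ');
}

// Example: mapToVocabulary('fahrgestelnummer prüfen', {'Fahrgestellnummer'})
// would correct the misspelled domain term while leaving other words alone.
```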
Feel free to share more details about the integration process, and I’d be happy to brainstorm further! Best of luck with your app development — it sounds like an amazing project!
Thank you for your input! I really appreciate it. The documentation provided by csukuangfj is already excellent. However, I was wondering if you also have a solution for streaming?
We support two-pass ASR, where in the first pass we use a small, fast, but less accurate streaming model and in the second pass we use a non-streaming but more accurate model.
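As an illustration, a rough sketch of that two-pass flow with the sherpa_onnx Dart API might look like the following. The method names follow recent versions of the package, but the buffering and endpoint handling here are simplified assumptions rather than a canonical implementation, and endpoint detection must also be enabled in the online recognizer config:

```dart
import 'dart:typed_data';
import 'package:sherpa_onnx/sherpa_onnx.dart' as sherpa_onnx;

/// Feed one chunk of 16 kHz mono audio to the first-pass streaming
/// recognizer; once it detects an endpoint, re-decode the buffered
/// utterance with the slower but more accurate second-pass model.
/// Returns the refined text, or null while the utterance is in progress.
String? processChunk({
  required sherpa_onnx.OnlineRecognizer firstPass,
  required sherpa_onnx.OnlineStream firstPassStream,
  required sherpa_onnx.OfflineRecognizer secondPass,
  required List<double> utteranceBuffer,
  required Float32List chunk,
}) {
  firstPassStream.acceptWaveform(samples: chunk, sampleRate: 16000);
  utteranceBuffer.addAll(chunk);

  while (firstPass.isReady(firstPassStream)) {
    firstPass.decode(firstPassStream);
  }
  // Interim hypothesis from the fast model; show this in the UI.
  final interim = firstPass.getResult(firstPassStream).text;
  print('interim: $interim');

  if (!firstPass.isEndpoint(firstPassStream)) {
    return null; // utterance still in progress
  }

  // Second pass: re-decode the complete utterance with the accurate model.
  final offlineStream = secondPass.createStream();
  offlineStream.acceptWaveform(
    samples: Float32List.fromList(utteranceBuffer),
    sampleRate: 16000,
  );
  secondPass.decode(offlineStream);
  final finalText = secondPass.getResult(offlineStream).text;
  offlineStream.free();

  utteranceBuffer.clear();
  firstPass.reset(firstPassStream);
  return finalText;
}
```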
The problem we’re currently experiencing is not related to pre-built models. We had an AI engineer create a custom streaming, multi-language transcription model that supports German and English.
The problem is this: I am currently using the sherpa_onnx package in the Flutter app, which is the only reasonable pub.dev package that supports Kaldi, Whisper, or Sherpa models.
To use the models, the package requires us to include the encoder, decoder, and joiner. However, the model the AI engineer created has the joiner embedded within it rather than exported as a separate file.
What we’re concerned about now is how to consume this model without the app continuously crashing.
Check the attached image to see how other Sherpa models are meant to be used; by default, the joiners are separate files.
If you use a streaming paraformer or a streaming zipformer-CTC model, then you don’t need a joiner at all.
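As a sketch, the joiner-free streaming configurations in the sherpa_onnx Dart package look roughly like this (field names per recent package versions; all paths are placeholders):

```dart
import 'package:sherpa_onnx/sherpa_onnx.dart' as sherpa_onnx;

void main() {
  sherpa_onnx.initBindings();

  // Streaming paraformer: encoder + decoder only, no joiner field.
  final paraformerConfig = sherpa_onnx.OnlineRecognizerConfig(
    model: sherpa_onnx.OnlineModelConfig(
      paraformer: sherpa_onnx.OnlineParaformerModelConfig(
        encoder: 'paraformer-encoder.onnx',
        decoder: 'paraformer-decoder.onnx',
      ),
      tokens: 'tokens.txt',
    ),
  );

  // Streaming zipformer CTC: a single model file, no joiner field.
  final ctcConfig = sherpa_onnx.OnlineRecognizerConfig(
    model: sherpa_onnx.OnlineModelConfig(
      zipformer2Ctc: sherpa_onnx.OnlineZipformer2CtcModelConfig(
        model: 'zipformer2-ctc.onnx',
      ),
      tokens: 'tokens.txt',
    ),
  );

  // Either config is passed to sherpa_onnx.OnlineRecognizer(...) as usual.
  print(paraformerConfig);
  print(ctcConfig);
}
```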
We had an AI engineer create a custom streaming, multi-language transcription model that supports German and English
Is your model a streaming Whisper model?
If yes, then the current sherpa-onnx does not support it.
As you said, it is a customized streaming model; I don’t think you will find existing support for it anywhere.
If your streaming model is open-sourced and you provide an ONNX export of it, then we can support it in sherpa-onnx and also provide Dart/Flutter examples for it. Otherwise, you will need to modify sherpa-onnx to support it yourself.