Issues with Configuring dtype for Local Models in Whisper-Web (Experimental WebGPU)

Hello everyone,

I am currently using the experimental-webgpu branch of whisper-web to run Hugging Face’s Whisper models.
My setup uses local models with the following environment configuration:

import { env, pipeline } from "@huggingface/transformers"; // "@xenova/transformers" on older builds of the branch

env.allowLocalModels = true;     // resolve model IDs against the local path below
env.localModelPath = "./models"; // base path prefixed onto the model ID
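For reference, here is the directory layout I believe the library expects and that I tried to mirror. As far as I can tell, the dtype values map to file-name suffixes (fp32 → no suffix, q4 → "_q4"), so the exact names below reflect my understanding rather than anything I've confirmed:

models/
└── my-whisper-model/
    ├── config.json
    ├── generation_config.json
    ├── preprocessor_config.json
    ├── tokenizer.json
    ├── tokenizer_config.json
    └── onnx/
        ├── encoder_model.onnx              ← "fp32" → no suffix
        └── decoder_model_merged_q4.onnx    ← "q4" → "_q4" suffix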

I am trying to load a small ONNX Whisper model with per-module dtype settings for the encoder and decoder, to balance memory use and performance. Here's the pipeline code:

const transcriber = await pipeline(
  "automatic-speech-recognition",
  "my-whisper-model",
  {
    dtype: {
      encoder_model: "fp32", // Full precision for stability
      decoder_model_merged: "q4", // 4-bit quantization for memory savings
    },
    device: "webgpu",
  }
);

However, the pipeline initialization fails with the following error:

Uncaught (in promise) Error: Can't create a session. ERROR_CODE: 7, ERROR_MESSAGE: Failed to load model because protobuf parsing failed.
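Before assuming the dtype settings are at fault: I've read that this exact protobuf error often appears when a dev server answers a missing model file with an HTML 404 page, which onnxruntime then tries to parse as ONNX. A quick console check I can run (the path is my guess at what the library requests, based on my config above):

const res = await fetch("./models/my-whisper-model/onnx/decoder_model_merged_q4.onnx");
console.log(res.status, res.headers.get("content-type")); // expect 200 and a binary type, not text/html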

From what I can tell, this error means onnxruntime received bytes it could not parse as an ONNX model, which makes me suspect file names or paths rather than the dtype values themselves. That leads to a few questions:

  1. Precision Levels: Are there recommended or supported dtype precision levels for Whisper models on WebGPU? Is fp32 always necessary for the encoder, or can other settings work reliably?

  2. File Naming Conventions: Do ONNX file names or directory structures need to follow specific conventions for the dtype keys (e.g., encoder_model, decoder_model_merged) to resolve correctly? (My current layout is sketched after the environment configuration above.)

  3. General Guidance: Are there any guidelines or best practices for configuring dtype with local models in this library, especially with the WebGPU backend?
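In the meantime, my plan is to fall back to a uniform precision as a baseline. My understanding is that dtype also accepts a single string applied to every sub-model:

// Baseline attempt: one dtype for all sub-models (assumes the fp32 ONNX files exist locally)
const baseline = await pipeline("automatic-speech-recognition", "my-whisper-model", {
  dtype: "fp32",
  device: "webgpu",
});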

Thanks in advance!
