Hello everyone,
I am currently using the experimental-webgpu branch of whisper-web to run Whisper models via Hugging Face's Transformers.js.
My setup uses local models with the following environment configuration:

```js
// Transformers.js v3 (the WebGPU backend and dtype options require v3).
import { env, pipeline } from "@huggingface/transformers";

env.allowLocalModels = true;     // resolve model IDs against localModelPath first
env.localModelPath = "./models"; // served relative to the app root
```
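For context, here is how my ./models folder is laid out. I mirrored what I see in the hosted onnx-community Whisper repos, but I am not certain these suffixes are what the dtype keys actually resolve to:

```text
models/
└── my-whisper-model/
    ├── config.json
    ├── generation_config.json
    ├── preprocessor_config.json
    ├── tokenizer.json
    ├── tokenizer_config.json
    └── onnx/
        ├── encoder_model.onnx            <- intended for dtype "fp32" (no suffix)
        └── decoder_model_merged_q4.onnx  <- intended for dtype "q4"
```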
I am trying to load a small ONNX Whisper model with specific dtype settings for the encoder and decoder to optimize memory and performance. Here’s the pipeline code:
```js
const transcriber = await pipeline(
  "automatic-speech-recognition",
  "my-whisper-model",
  {
    dtype: {
      encoder_model: "fp32",        // full precision for stability
      decoder_model_merged: "q4",   // 4-bit quantization for memory savings
    },
    device: "webgpu",
  }
);
```
However, the pipeline initialization fails with the following error:
```text
Uncaught (in promise) Error: Can't create a session. ERROR_CODE: 7, ERROR_MESSAGE: Failed to load model because protobuf parsing failed.
```
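One thing I have tried to rule out: as I understand it, protobuf parsing failures can occur when a dev server answers an .onnx request with an HTML 404 fallback page instead of the binary. Here is the quick sanity check I used; the path is my guess at which file the library fetches for the q4 decoder:

```js
// Hypothetical path: my assumption about what the library requests
// for dtype { decoder_model_merged: "q4" }.
const url = "./models/my-whisper-model/onnx/decoder_model_merged_q4.onnx";
const res = await fetch(url);
const bytes = new Uint8Array(await res.arrayBuffer());
const head = new TextDecoder().decode(bytes.slice(0, 15));
console.log(res.status, res.headers.get("content-type"), head);
// If `head` starts with "<", the server returned HTML, not ONNX.
```

That check aside, my questions: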
- Precision Levels: Are there recommended or supported dtype precision levels for Whisper models on WebGPU? Is it always necessary to use fp32 for the encoder, or can other settings work reliably?
- File Naming Conventions: Do the ONNX file names or directory structure need to follow specific conventions for the dtype keys (e.g., encoder_model, decoder_model_merged) to resolve correctly?
- General Guidance: Are there guidelines or best practices for configuring dtype with local models in this library, especially with the WebGPU backend?
Thanks in advance!