I need to load the ivrit-ai/whisper-large-v3 model on my machine using the openai-whisper library. The model repository contains the following files:
As you can see, it ships the weights in .safetensors format, which is normally loaded through the transformers library. I converted both .safetensors shards to .bin using this script:
from glob import glob

import torch
from safetensors.torch import load_file
from tqdm import tqdm

for filename in tqdm(glob(f"{base_path}/*.safetensors")):  # base_path: the downloaded repo directory
    ckpt = load_file(filename)
    torch.save(ckpt, filename.replace(".safetensors", ".bin"))
and then combined the two shards:
part1 = torch.load("model-00001-of-00002.bin")
part2 = torch.load("model-00002-of-00002.bin")
combined = {**part1, **part2}  # be cautious if keys overlap: part2 silently wins
torch.save(combined, "ivrit-ai-whisper-large-v3.pt")
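Since `{**part1, **part2}` silently lets `part2` overwrite any duplicate key, I guard the merge with an explicit overlap check first. A minimal self-contained sketch (the placeholder dicts stand in for the two loaded shards; real values are tensors, and the key names here are only illustrative):

```python
# Placeholder shards standing in for torch.load("model-00001-of-00002.bin")
# and torch.load("model-00002-of-00002.bin"); real values are tensors.
part1 = {"encoder.conv1.weight": "tensor-1", "encoder.conv1.bias": "tensor-2"}
part2 = {"decoder.token_embedding.weight": "tensor-3"}

# {**part1, **part2} lets part2 silently win on duplicate keys,
# so fail loudly if the shards overlap at all.
overlap = part1.keys() & part2.keys()
if overlap:
    raise ValueError(f"duplicate keys would be overwritten: {sorted(overlap)}")

combined = {**part1, **part2}
print(len(combined))  # 3
```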
I also set the model dimensions:
dims = ModelDimensions(
    n_vocab=51865,
    n_audio_ctx=1500,
    n_audio_state=1280,  # use 1280 to match the d_model embedding size
    n_audio_head=8,
    n_audio_layer=32,
    n_text_ctx=192,
    n_text_state=1280,
    n_text_head=20,
    n_text_layer=24,
    n_mels=80,
)
model = Whisper(dims)
checkpoint = {
    "dims": vars(dims),  # convert ModelDimensions to a plain dict
    "model_state_dict": model.state_dict(),
    "decoder_state": None,
    "version": 2,
    "init_args": {
        "device": "cpu",
        "n_vocab": dims.n_vocab,
        "n_audio_ctx": dims.n_audio_ctx,
        "n_audio_state": dims.n_audio_state,
        "n_audio_head": dims.n_audio_head,
        "n_audio_layer": dims.n_audio_layer,
        "n_text_ctx": dims.n_text_ctx,
        "n_text_state": dims.n_text_state,
        "n_text_head": dims.n_text_head,
        "n_text_layer": dims.n_text_layer,
        "n_mels": dims.n_mels,
    },
}
torch.save(checkpoint, "ivrit-ai-whisper-large-v3.pt")
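To confirm the file at least round-trips cleanly through torch.save/torch.load, I ran a sanity check along these lines (a self-contained sketch using a tiny stand-in dict with the same top-level layout and a temp file instead of the real .pt):

```python
import os
import tempfile

import torch

# Stand-in for the real checkpoint: same top-level layout, tiny payload.
checkpoint = {
    "dims": {"n_vocab": 51865, "n_mels": 80},
    "model_state_dict": {"dummy.weight": torch.zeros(2, 2)},
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "ckpt.pt")
    torch.save(checkpoint, path)
    reloaded = torch.load(path, map_location="cpu")

# Both the dims dict and the tensors must survive the round-trip unchanged.
assert reloaded["dims"] == checkpoint["dims"]
assert torch.equal(reloaded["model_state_dict"]["dummy.weight"],
                   checkpoint["model_state_dict"]["dummy.weight"])
```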
Loading it with

import whisper

model = whisper.load_model("ivrit-ai-whisper-large-v3.pt")
does work, but it does NOT produce Hebrew output, probably because I didn't integrate the tokenizer vocabulary and the other files from the original Hugging Face repo.
How can I integrate whatever is needed to make the model work exactly as if I was loading it using transformers?
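For reference, this is a sketch of the transformers loading path whose behavior I want to reproduce. The pipeline pulls the weights and the tokenizer/preprocessor files from the hub repo, which is exactly what my hand-converted .pt is missing (wrapped in a function here, since actually calling it downloads the full model; "audio.wav" is a placeholder path):

```python
import torch
from transformers import pipeline

def transcribe_hebrew(audio_path: str) -> str:
    # pipeline(...) fetches the weights AND the tokenizer/preprocessor
    # files from the ivrit-ai/whisper-large-v3 repo on the Hugging Face hub.
    asr = pipeline(
        "automatic-speech-recognition",
        model="ivrit-ai/whisper-large-v3",
    )
    result = asr(audio_path, generate_kwargs={"language": "hebrew"})
    return result["text"]

# usage (placeholder path): print(transcribe_hebrew("audio.wav"))
```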