TL;DR: How can I load my private model from within handler.py for a custom Inference Endpoint?
Longer version:
My ultimate goal is to run a private fine-tuned Whisper model that forces a language and a transcribe token. I fine-tuned the model for a specific language, but it does not reliably identify that language and frequently transcribes to very bad English.
First I tried the default ASR Inference Endpoint setup, but there seemed to be no way to force tokens (it seems there are no parameters that can be passed; see this [post]).
So, in order to force the tokens, I'm now using a custom handler.py. Because the model is private, I can't load it by repo name: I don't have a way of accessing an HF token from within handler.py.
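(For context, this is roughly what I would do if I could get a token into the handler; the repo name and token below are placeholders, not my actual values.)

```python
from transformers import WhisperForConditionalGeneration

# Hypothetical: load a private repo by name with an auth token.
# "my-username/my-finetuned-whisper" and "hf_xxx" are placeholders.
model = WhisperForConditionalGeneration.from_pretrained(
    "my-username/my-finetuned-whisper",
    token="hf_xxx",  # I don't see a way to get a token like this into handler.py
)
```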
I found some code that seems to suggest there's a way of accessing the model locally using "./", which seems to make sense.
But when I try it, I get variations on this error:
```
OSError: ./ does not appear to have a file named config.json. Checkout 'https://huggingface.co/./main' for available files.
```
Even though my repo does have a config.json file at the root.
When I sneak an `os.listdir('.')` into the script, I don't see any files or directories.
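(The debug probe looked roughly like this, inside `__init__`; the `getcwd` print is just something I added to see where the handler is actually running.)

```python
import os

# temporary debugging inside EndpointHandler.__init__
print("cwd:", os.getcwd())           # where is the handler actually running?
print("contents:", os.listdir("."))  # comes back empty for me
```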
Here’s my handler.py code -
```python
from typing import Dict

import torch
from transformers import WhisperProcessor, WhisperForConditionalGeneration
from transformers.pipelines.audio_utils import ffmpeg_read

SAMPLE_RATE = 16000


class EndpointHandler:
    def __init__(self, path=""):
        self.processor = WhisperProcessor.from_pretrained(
            "openai/whisper-large-v3", language="french", task="transcribe"
        )
        self.model = WhisperForConditionalGeneration.from_pretrained("./.")  # THIS IS WHERE THE ERROR OCCURS
        self.model.config.forced_decoder_ids = self.processor.get_decoder_prompt_ids(
            language="french", task="transcribe"
        )

    def __call__(self, data: Dict[str, bytes]) -> Dict[str, str]:
        """
        Args:
            data (:obj:`dict`):
                includes the deserialized audio file as bytes under "inputs"
        Return:
            A :obj:`dict` containing the transcribed text
        """
        # process input: decode the raw audio bytes into a numpy array
        inputs = data.pop("inputs", data)
        audio_nparray = ffmpeg_read(inputs, SAMPLE_RATE)

        # run inference: extract log-mel features, generate token ids, decode to text
        input_features = self.processor(
            audio_nparray, sampling_rate=SAMPLE_RATE, return_tensors="pt"
        ).input_features
        with torch.no_grad():
            predicted_ids = self.model.generate(input_features)
        text = self.processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]

        # postprocess the prediction
        return {"text": text}
```