I am using Meta’s new MMS model alongside a language model developed by Meta to transcribe some long-form Amharic audio. As you can see in the code from this space, a beam-search decoder is built with ‘torchaudio.models.decoder.ctc_decoder’. Because I want to use the chunking and striding provided by the ASR pipeline in the Transformers library (sketched after the decoder code below), I have been trying to use this same decoder via the ASR pipeline by passing it in as an attribute. For reference, this is the code used to build the decoder:
from torchaudio.models.decoder import ctc_decoder
import json
from huggingface_hub import hf_hub_download

# Fetch the per-language decoding configuration shipped with the MMS CC LMs
lm_decoding_config = {}
lm_decoding_configfile = hf_hub_download(
    repo_id="facebook/mms-cclms",
    filename="decoding_config.json",
    subfolder="mms-1b-all",
)

with open(lm_decoding_configfile) as f:
    lm_decoding_config = json.loads(f.read())

# allow language model decoding for "eng"
decoding_config = lm_decoding_config["eng"]

# Download the KenLM file, the token list, and (if one exists) the lexicon
lm_file = hf_hub_download(
    repo_id="facebook/mms-cclms",
    filename=decoding_config["lmfile"].rsplit("/", 1)[1],
    subfolder=decoding_config["lmfile"].rsplit("/", 1)[0],
)
token_file = hf_hub_download(
    repo_id="facebook/mms-cclms",
    filename=decoding_config["tokensfile"].rsplit("/", 1)[1],
    subfolder=decoding_config["tokensfile"].rsplit("/", 1)[0],
)
lexicon_file = None
if decoding_config["lexiconfile"] is not None:
    lexicon_file = hf_hub_download(
        repo_id="facebook/mms-cclms",
        filename=decoding_config["lexiconfile"].rsplit("/", 1)[1],
        subfolder=decoding_config["lexiconfile"].rsplit("/", 1)[0],
    )

# Build the beam-search decoder with the LM weights from the config
beam_search_decoder = ctc_decoder(
    lexicon=lexicon_file,
    tokens=token_file,
    lm=lm_file,
    nbest=1,
    beam_size=500,
    beam_size_token=50,
    lm_weight=float(decoding_config["lmweight"]),
    word_score=float(decoding_config["wordscore"]),
    sil_score=float(decoding_config["silweight"]),
    blank_token="<s>",
)
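For context, the chunking and striding I mentioned are what the pipeline already gives you for long-form audio. A minimal sketch of the invocation I am after, with placeholder chunk/stride values and my checkpoint assumed:

from transformers import pipeline

# The ASR pipeline splits long audio into overlapping chunks, runs the
# model on each chunk, and stitches the logits back together before decoding
asr = pipeline(
    "automatic-speech-recognition",
    model="facebook/mms-1b-all",
    chunk_length_s=10,       # placeholder chunk length (seconds)
    stride_length_s=(4, 2),  # placeholder left/right overlap (seconds)
)
text = asr("long_amharic_recording.wav")  # hypothetical input file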
Because this decoder is not an instance of ‘BeamSearchDecoderCTC’ from ‘pyctcdecode’, the ASR pipeline does not support it. My question is this: what is the best way to make use of the language models developed by Meta in this pipeline? Should I build a ‘Wav2Vec2ProcessorWithLM’ from the 5gram.bin file provided for the CC LM, or would it be better to add support for ‘torchaudio.models.decoder.ctc_decoder’ to the pipeline itself? Or is there another option I should pursue? For now I have jury-rigged the ASR pipeline to call ‘ctc_decoder’ in place of ‘BeamSearchDecoderCTC’, but that does not seem like a sustainable long-term solution.
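For the first option, this is roughly what I had in mind. It is an untested sketch: it assumes the 5gram.bin referenced by the decoding config is a standard KenLM binary that ‘pyctcdecode’ can load, that the MMS tokenizer’s vocabulary lines up with the LM’s tokens, and that ‘lm_file’ here is the KenLM binary downloaded as above (using the "amh" entry instead of "eng"):

from pyctcdecode import build_ctcdecoder
from transformers import (
    Wav2Vec2CTCTokenizer,
    Wav2Vec2FeatureExtractor,
    Wav2Vec2ProcessorWithLM,
    pipeline,
)

# Tokenizer and feature extractor from the MMS checkpoint ("amh" adapter assumed)
tokenizer = Wav2Vec2CTCTokenizer.from_pretrained("facebook/mms-1b-all", target_lang="amh")
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/mms-1b-all")

# pyctcdecode expects labels ordered by vocabulary index
vocab = tokenizer.get_vocab()
labels = [tok for tok, _ in sorted(vocab.items(), key=lambda kv: kv[1])]

# Build a pyctcdecode decoder from the CC LM's KenLM binary
decoder = build_ctcdecoder(
    labels=labels,
    kenlm_model_path=lm_file,  # path to the downloaded 5gram.bin
)

processor_with_lm = Wav2Vec2ProcessorWithLM(
    feature_extractor=feature_extractor,
    tokenizer=tokenizer,
    decoder=decoder,
)

# The pipeline would then handle chunking/striding and LM-boosted decoding together
asr = pipeline(
    "automatic-speech-recognition",
    model="facebook/mms-1b-all",
    tokenizer=processor_with_lm.tokenizer,
    feature_extractor=processor_with_lm.feature_extractor,
    decoder=processor_with_lm.decoder,
    chunk_length_s=10,  # placeholder value
)

If something like this is sound, it would sidestep the ‘torchaudio’ decoder entirely, but I am not sure whether the CC LM binaries are compatible with ‘pyctcdecode’ or whether the results would match the ctc_decoder setup above.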