@patrickvonplaten has written an excellent blog post, Boosting Wav2Vec2 with n-grams in Transformers, but it doesn’t demonstrate how to use Wav2Vec2ProcessorWithLM dynamically with a pre-trained wav2vec2 model via the pipeline API, i.e. without creating a new version of the model that has the LM statically included.
This was addressed, again by @patrickvonplaten, in an issue on the Transformers GitHub repo: How to use Wav2Vec2ProcessorWithLM in pipeline? #16759. However, following that example doesn’t seem to work with a different pre-trained model, e.g.
import pyctcdecode
import transformers

# KENLM_MODEL_PATH, WAV_FILE_PATH and get_unigrams() are defined elsewhere
# in my script; get_unigrams() returns the unigram list for the KenLM model.

# Load the pre-trained processor (feature extractor + tokenizer).
processor = transformers.Wav2Vec2Processor.from_pretrained(
    "facebook/wav2vec2-base-960h"
)

# Build a pyctcdecode decoder over the tokenizer's vocabulary, backed by KenLM.
decoder = pyctcdecode.build_ctcdecoder(
    labels=list(processor.tokenizer.get_vocab().keys()),
    kenlm_model_path=KENLM_MODEL_PATH,
    unigrams=get_unigrams(),
)

# Combine the pre-trained feature extractor and tokenizer with the LM decoder.
processor_with_lm = transformers.Wav2Vec2ProcessorWithLM(
    feature_extractor=processor.feature_extractor,
    tokenizer=processor.tokenizer,
    decoder=decoder,
)

pipeline = transformers.pipeline(
    model="facebook/wav2vec2-base-960h",
    tokenizer=processor_with_lm,
    feature_extractor=processor.feature_extractor,
    decoder=decoder,
)

print(pipeline(WAV_FILE_PATH))
which fails with:
Traceback (most recent call last):
  File "/media/vm/repos+venv/repos/qspeech/projects/new_approach/hugging_face/error_example.py", line 45, in <module>
    result = pipeline(WAV_FILE_PATH)
  File "/home/ubuntu/.conda/envs/venv/lib/python3.9/site-packages/transformers/pipelines/automatic_speech_recognition.py", line 182, in __call__
    return super().__call__(inputs, **kwargs)
  File "/home/ubuntu/.conda/envs/venv/lib/python3.9/site-packages/transformers/pipelines/base.py", line 1067, in __call__
    return self.run_single(inputs, preprocess_params, forward_params, postprocess_params)
  File "/home/ubuntu/.conda/envs/venv/lib/python3.9/site-packages/transformers/pipelines/base.py", line 1091, in run_single
    outputs = self.postprocess(all_outputs, **postprocess_params)
  File "/home/ubuntu/.conda/envs/venv/lib/python3.9/site-packages/transformers/pipelines/automatic_speech_recognition.py", line 377, in postprocess
    text = self.tokenizer.decode(items, skip_special_tokens=skip_special_tokens)
TypeError: decode() got an unexpected keyword argument 'skip_special_tokens'
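My reading of the failure (paraphrasing the postprocess logic in transformers/pipelines/automatic_speech_recognition.py for the version in the traceback; the exact code may differ between releases) is that the pipeline never classifies itself as an LM-boosted CTC pipeline, so it falls through to the plain CTC branch and calls decode() on the Wav2Vec2ProcessorWithLM with a keyword argument that method doesn’t accept:

# Rough paraphrase of AutomaticSpeechRecognitionPipeline.postprocess,
# not the verbatim source:
if self.type == "ctc_with_lm":
    # LM path: beam-search decode the logits with pyctcdecode.
    text = self.decoder.decode_beams(items)[0][0]
else:
    # Plain CTC path: self.tokenizer here is my Wav2Vec2ProcessorWithLM,
    # whose decode() takes logits and has no skip_special_tokens
    # parameter, hence the TypeError.
    text = self.tokenizer.decode(items, skip_special_tokens=skip_special_tokens)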
But if I add the following line immediately before creating the pipeline:
processor.feature_extractor._processor_class = "Wav2Vec2ProcessorWithLM"
then it works correctly.
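As far as I can tell, the workaround works because of how the pipeline constructor decides which kind of pipeline to build. Again paraphrasing the source for this version (AutomaticSpeechRecognitionPipeline.__init__, not verbatim), the LM path is only enabled when the feature extractor's _processor_class ends with "WithLM":

# Rough paraphrase of the type selection in
# AutomaticSpeechRecognitionPipeline.__init__, not the verbatim source:
if (
    feature_extractor._processor_class
    and feature_extractor._processor_class.endswith("WithLM")
    and decoder is not None
):
    self.type = "ctc_with_lm"
    self.decoder = decoder
else:
    self.type = "ctc"

Since my feature extractor comes from Wav2Vec2Processor.from_pretrained, its _processor_class is presumably either unset or "Wav2Vec2Processor" for this checkpoint, so the check fails unless I overwrite the attribute by hand.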
I’ve not seen any examples where the _processor_class attribute of the feature_extractor needs to be set like this, so I’m clearly doing something wrong. What is the intended approach that avoids the need to set a private attribute?