@patrickvonplaten has written an excellent blog post, Boosting Wav2Vec2 with n-grams in Transformers, but it doesn’t demonstrate how to use Wav2Vec2ProcessorWithLM dynamically with a pre-trained wav2vec2 model via the pipeline API, i.e. without creating a new version of the model that has the LM statically included.
This was addressed, again by @patrickvonplaten, in an issue on the Transformers GitHub repo: How to use Wav2Vec2ProcessorWithLM in pipeline? #16759. However, following that example doesn’t seem to work with a different pre-trained model, e.g.
import pyctcdecode
import transformers

# KENLM_MODEL_PATH, WAV_FILE_PATH and get_unigrams() are defined elsewhere
# in my script; get_unigrams() returns the unigram list for the KenLM model.

# Load the pre-trained processor (feature extractor + tokenizer).
processor = transformers.Wav2Vec2Processor.from_pretrained(
    "facebook/wav2vec2-base-960h"
)

# Build a pyctcdecode decoder over the tokenizer's vocabulary, backed by KenLM.
decoder = pyctcdecode.build_ctcdecoder(
    labels=list(processor.tokenizer.get_vocab().keys()),
    kenlm_model_path=KENLM_MODEL_PATH,
    unigrams=get_unigrams(),
)

# Combine the pre-trained feature extractor and tokenizer with the LM decoder.
processor_with_lm = transformers.Wav2Vec2ProcessorWithLM(
    feature_extractor=processor.feature_extractor,
    tokenizer=processor.tokenizer,
    decoder=decoder,
)

pipeline = transformers.pipeline(
    model="facebook/wav2vec2-base-960h",
    tokenizer=processor_with_lm,
    feature_extractor=processor.feature_extractor,
    decoder=decoder,
)

print(pipeline(WAV_FILE_PATH))
which fails with:
Traceback (most recent call last):
  File "/media/vm/repos+venv/repos/qspeech/projects/new_approach/hugging_face/error_example.py", line 45, in <module>
    result = pipeline(WAV_FILE_PATH)
  File "/home/ubuntu/.conda/envs/venv/lib/python3.9/site-packages/transformers/pipelines/automatic_speech_recognition.py", line 182, in __call__
    return super().__call__(inputs, **kwargs)
  File "/home/ubuntu/.conda/envs/venv/lib/python3.9/site-packages/transformers/pipelines/base.py", line 1067, in __call__
    return self.run_single(inputs, preprocess_params, forward_params, postprocess_params)
  File "/home/ubuntu/.conda/envs/venv/lib/python3.9/site-packages/transformers/pipelines/base.py", line 1091, in run_single
    outputs = self.postprocess(all_outputs, **postprocess_params)
  File "/home/ubuntu/.conda/envs/venv/lib/python3.9/site-packages/transformers/pipelines/automatic_speech_recognition.py", line 377, in postprocess
    text = self.tokenizer.decode(items, skip_special_tokens=skip_special_tokens)
TypeError: decode() got an unexpected keyword argument 'skip_special_tokens'
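My reading of the failure (paraphrasing the postprocess logic in transformers/pipelines/automatic_speech_recognition.py for the version in the traceback; the exact code may differ between releases) is that the pipeline never classifies itself as an LM-boosted CTC pipeline, so it falls through to the plain CTC branch and calls decode() on the Wav2Vec2ProcessorWithLM with a keyword argument that method doesn’t accept:

# Rough paraphrase of AutomaticSpeechRecognitionPipeline.postprocess,
# not the verbatim source:
if self.type == "ctc_with_lm":
    # LM path: beam-search decode the logits with pyctcdecode.
    text = self.decoder.decode_beams(items)[0][0]
else:
    # Plain CTC path: self.tokenizer here is my Wav2Vec2ProcessorWithLM,
    # whose decode() takes logits and has no skip_special_tokens
    # parameter, hence the TypeError.
    text = self.tokenizer.decode(items, skip_special_tokens=skip_special_tokens)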
But if I add the following line immediately before creating the pipeline:
processor.feature_extractor._processor_class = "Wav2Vec2ProcessorWithLM"
then it works correctly.
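As far as I can tell, the workaround works because of how the pipeline constructor decides which kind of pipeline to build. Again paraphrasing the source for this version (AutomaticSpeechRecognitionPipeline.__init__, not verbatim), the LM path is only enabled when the feature extractor's _processor_class ends with "WithLM":

# Rough paraphrase of the type selection in
# AutomaticSpeechRecognitionPipeline.__init__, not the verbatim source:
if (
    feature_extractor._processor_class
    and feature_extractor._processor_class.endswith("WithLM")
    and decoder is not None
):
    self.type = "ctc_with_lm"
    self.decoder = decoder
else:
    self.type = "ctc"

Since my feature extractor comes from Wav2Vec2Processor.from_pretrained, its _processor_class is presumably either unset or "Wav2Vec2Processor" for this checkpoint, so the check fails unless I overwrite the attribute by hand.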
I’ve not seen any examples where the _processor_class attribute of the feature_extractor needs to be set like this, so I’m clearly doing something wrong. What is the intended approach that avoids the need to set a private attribute?