The Wav2Vec2Phoneme documentation (the Wav2Vec2Phoneme page in the Transformers docs) says that the output has to be decoded using Wav2Vec2PhonemeCTCTokenizer.
The huggingface model listing filtered by other=phoneme-recognition
includes a reference to facebook/wav2vec2-xlsr-53-espeak-cv-ft.
The source code there includes the line
processor = Wav2Vec2Processor.from_pretrained(checkpoint)
Execution explodes, complaining about a missing tokenizer. It is likely that the documentation
is incorrect.
I tried instantiating a Wav2Vec2PhonemeCTCTokenizer (using a vocab file in the huggingface cache).
If I'm right, the documentation will need to be changed. The download will need to be changed
to provide the vocab_file, too (I fished the json out of the huggingface cache).
tokenizer = Wav2Vec2PhonemeCTCTokenizer(vocab_file='wav2vec2-lv-60-espeak-cv-ft-vocab.json')
processor = Wav2Vec2Processor.from_pretrained(checkpoint, tokenizer)
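For context on what that vocab_file is used for, here is a minimal stdlib-only sketch of the core of CTC decoding with a phoneme vocab: collapse repeated ids, drop the blank token, and map the rest through the id-to-token table loaded from a vocab json. The tiny vocab and token names below are made up for illustration; the real vocab.json from the cache uses the same token-to-id json format but is much larger.

```python
import json
import os
import tempfile

# Hypothetical miniature vocab in the same token -> id json format as the
# real vocab file; these entries are invented for the example.
vocab = {"<pad>": 0, "a": 1, "b": 2, "k": 3}

def ctc_decode(ids, id_to_token, blank="<pad>"):
    """Collapse repeated ids and drop blanks -- the core of what a CTC
    tokenizer's decode step does with the model's argmax output."""
    out = []
    prev = None
    for i in ids:
        if i != prev:
            tok = id_to_token[i]
            if tok != blank:
                out.append(tok)
        prev = i
    return " ".join(out)

# Round-trip the vocab through a json file, the way a tokenizer
# constructed with vocab_file=... would read it.
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "vocab.json")
    with open(path, "w") as f:
        json.dump(vocab, f)
    with open(path) as f:
        id_to_token = {v: k for k, v in json.load(f).items()}

print(ctc_decode([1, 1, 0, 2, 2, 3], id_to_token))  # -> "a b k"
```

This is only meant to show why the tokenizer cannot work without a vocab file; the real Wav2Vec2PhonemeCTCTokenizer adds phoneme-specific handling on top of this.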
There is a legal problem with my using this (it requires espeak, which is GPL-licensed),
so I want to make sure that the above two lines are correct. Are they?