Model broken on Hub: wav2vec2-large-robust

The “use this in Transformers” example on the facebook/wav2vec2-large-robust · Hugging Face model page fails with an OSError, seemingly due to a problem with the uploaded model on the Hub.

Notebook to replicate error: Google Colab

It looks like the problem is specific to this model on the Hub, because other models work with the same code snippet: e.g. if you swap out “facebook/wav2vec2-large-robust” for “facebook/wav2vec2-large-960h-lv60-self”, that one works fine.
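For reference, here is roughly the snippet in question, as a sketch rather than a verbatim copy of the model-card example. The helper name `reproduce` is mine, and actually calling it requires `transformers` installed plus network access:

```python
# Sketch of the failing model-card snippet; calling reproduce() needs
# transformers and network access, so the loads live inside the function.
BROKEN_MODEL = "facebook/wav2vec2-large-robust"
WORKING_MODEL = "facebook/wav2vec2-large-960h-lv60-self"

def reproduce():
    from transformers import Wav2Vec2Processor

    # This repo ships vocab.json / tokenizer_config.json, so it loads fine.
    Wav2Vec2Processor.from_pretrained(WORKING_MODEL)

    # This one raises OSError: Can't load tokenizer for
    # 'facebook/wav2vec2-large-robust'.
    Wav2Vec2Processor.from_pretrained(BROKEN_MODEL)
```

The second `from_pretrained` call is the one that fails; swapping the model ids confirms the code itself is fine.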

See also Can't load tokenizer for 'facebook/wav2vec2-large-robust'

Here’s the state of the cache after I try instantiating both tokenizers, the robust one and the 960h one.

And here’s what happens when I do a search in that directory using ag:

~/.cache/huggingface/transformers# ll
total 48
drwxr-xr-x 2 root root 4096 Oct 19 18:32 ./
drwxr-xr-x 3 root root 4096 Oct 19 18:21 ../
-rw-r--r-- 1 root root 1606 Oct 19 18:32 5681e9346f90f9fc4d72503284e96e6bbdc8bf5a38cafeb6ebf3791120b7570d.e0d02e2ed52b244ae1896cccc2beab5caccc2478b8b3d1131c14666c6e14cfdc
-rw-r--r-- 1 root root  153 Oct 19 18:32 5681e9346f90f9fc4d72503284e96e6bbdc8bf5a38cafeb6ebf3791120b7570d.e0d02e2ed52b244ae1896cccc2beab5caccc2478b8b3d1131c14666c6e14cfdc.json
-rwxr-xr-x 1 root root    0 Oct 19 18:32 5681e9346f90f9fc4d72503284e96e6bbdc8bf5a38cafeb6ebf3791120b7570d.e0d02e2ed52b244ae1896cccc2beab5caccc2478b8b3d1131c14666c6e14cfdc.lock*
-rw-r--r-- 1 root root  162 Oct 19 18:32 814e23f251e4a5cd4763cf9b9b6ecb43e43f6a219ec036d9db3419f8dc9d93c3.6685801c836773b383173a1d86dd10317cc4f4eeadcf01f689918a50fdda946b
-rw-r--r-- 1 root root  163 Oct 19 18:32 814e23f251e4a5cd4763cf9b9b6ecb43e43f6a219ec036d9db3419f8dc9d93c3.6685801c836773b383173a1d86dd10317cc4f4eeadcf01f689918a50fdda946b.json
-rwxr-xr-x 1 root root    0 Oct 19 18:31 814e23f251e4a5cd4763cf9b9b6ecb43e43f6a219ec036d9db3419f8dc9d93c3.6685801c836773b383173a1d86dd10317cc4f4eeadcf01f689918a50fdda946b.lock*
-rw-r--r-- 1 root root   85 Oct 19 18:32 de1143309c04207e22168c4563b24770c49eb4e933dbad506eadae8e43a7b422.9d6cd81ef646692fb1c169a880161ea1cb95f49694f220aced9b704b457e51dd
-rw-r--r-- 1 root root  165 Oct 19 18:32 de1143309c04207e22168c4563b24770c49eb4e933dbad506eadae8e43a7b422.9d6cd81ef646692fb1c169a880161ea1cb95f49694f220aced9b704b457e51dd.json
-rwxr-xr-x 1 root root    0 Oct 19 18:32 de1143309c04207e22168c4563b24770c49eb4e933dbad506eadae8e43a7b422.9d6cd81ef646692fb1c169a880161ea1cb95f49694f220aced9b704b457e51dd.lock*
-rw-r--r-- 1 root root  291 Oct 19 18:32 e1f77599caea3f1f7004987f2f7a354d0fd31966b1b6bca5db52b63a8a8cb995.7c838a0a103758bad6ef4922531682da23a8b1c45d25f8d8e7a6d857c0b26544
-rw-r--r-- 1 root root  152 Oct 19 18:32 e1f77599caea3f1f7004987f2f7a354d0fd31966b1b6bca5db52b63a8a8cb995.7c838a0a103758bad6ef4922531682da23a8b1c45d25f8d8e7a6d857c0b26544.json
-rwxr-xr-x 1 root root    0 Oct 19 18:32 e1f77599caea3f1f7004987f2f7a354d0fd31966b1b6bca5db52b63a8a8cb995.7c838a0a103758bad6ef4922531682da23a8b1c45d25f8d8e7a6d857c0b26544.lock*
-rw-r--r-- 1 root root 1583 Oct 19 18:21 f4ed1cd2d2b55e3401644b177d5a166863754c345f98ed09260d0dce9a385d9a.2523c04309986c65617e9a8f2f66c3d656ba969fe07a994af31a3a0cf7b19b78
-rw-r--r-- 1 root root  145 Oct 19 18:21 f4ed1cd2d2b55e3401644b177d5a166863754c345f98ed09260d0dce9a385d9a.2523c04309986c65617e9a8f2f66c3d656ba969fe07a994af31a3a0cf7b19b78.json
-rwxr-xr-x 1 root root    0 Oct 19 18:21 f4ed1cd2d2b55e3401644b177d5a166863754c345f98ed09260d0dce9a385d9a.2523c04309986c65617e9a8f2f66c3d656ba969fe07a994af31a3a0cf7b19b78.lock*
~/.cache/huggingface/transformers# ag 960h
e1f77599caea3f1f7004987f2f7a354d0fd31966b1b6bca5db52b63a8a8cb995.7c838a0a103758bad6ef4922531682da23a8b1c45d25f8d8e7a6d857c0b26544.json
1:{"url": "https://huggingface.co/facebook/wav2vec2-large-960h-lv60-self/resolve/main/vocab.json", "etag": "\"88181b954aa14df68be9b444b3c36585f3078c0a\""}

5681e9346f90f9fc4d72503284e96e6bbdc8bf5a38cafeb6ebf3791120b7570d.e0d02e2ed52b244ae1896cccc2beab5caccc2478b8b3d1131c14666c6e14cfdc.json
1:{"url": "https://huggingface.co/facebook/wav2vec2-large-960h-lv60-self/resolve/main/config.json", "etag": "\"674493ec11ad5d90eaf72d07f69a4bb60203f46b\""}

de1143309c04207e22168c4563b24770c49eb4e933dbad506eadae8e43a7b422.9d6cd81ef646692fb1c169a880161ea1cb95f49694f220aced9b704b457e51dd.json
1:{"url": "https://huggingface.co/facebook/wav2vec2-large-960h-lv60-self/resolve/main/special_tokens_map.json", "etag": "\"25bc39604f72700b3b8e10bd69bb2f227157edd1\""}

814e23f251e4a5cd4763cf9b9b6ecb43e43f6a219ec036d9db3419f8dc9d93c3.6685801c836773b383173a1d86dd10317cc4f4eeadcf01f689918a50fdda946b.json
1:{"url": "https://huggingface.co/facebook/wav2vec2-large-960h-lv60-self/resolve/main/tokenizer_config.json", "etag": "\"97d4216be71590fae568725f363d52f00eb7c944\""}

5681e9346f90f9fc4d72503284e96e6bbdc8bf5a38cafeb6ebf3791120b7570d.e0d02e2ed52b244ae1896cccc2beab5caccc2478b8b3d1131c14666c6e14cfdc
2:  "_name_or_path": "facebook/wav2vec2-large-960h-lv60-self",
~/.cache/huggingface/transformers# ag robust
f4ed1cd2d2b55e3401644b177d5a166863754c345f98ed09260d0dce9a385d9a.2523c04309986c65617e9a8f2f66c3d656ba969fe07a994af31a3a0cf7b19b78.json
1:{"url": "https://huggingface.co/facebook/wav2vec2-large-robust/resolve/main/config.json", "etag": "\"a52cf9097910107f4e0d1bccf82fd4e08d4e4b66\""}
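The same check works without `ag`: each cached blob in that directory has a `.json` sidecar recording its source URL, so a few lines of stdlib Python can map blobs back to their Hub repo. This is a sketch, and `cached_urls` is my own helper name:

```python
# Map cached Transformers blobs back to their source URLs by reading the
# .json sidecar metadata files that sit next to each blob in the cache.
import json
from pathlib import Path

def cached_urls(cache_dir, needle=""):
    """Return source URLs recorded in the cache's .json sidecar files,
    optionally filtered to those containing `needle`."""
    urls = []
    for meta in Path(cache_dir).expanduser().glob("*.json"):
        url = json.loads(meta.read_text()).get("url", "")
        if needle in url:
            urls.append(url)
    return sorted(urls)
```

Run against the cache shown above, `cached_urls(cache_dir, "robust")` would list only the robust model’s config.json URL, matching the `ag` result.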

Checking the two models’ actual file lists, it seems that
facebook/wav2vec2-large-robust at main simply lacks the necessary tokenizer files.

For comparison, facebook/wav2vec2-large-960h-lv60-self at main has a tokenizer_config.json, a vocab.json, and the other tokenizer files.

@patrickvonplaten where might we find an updated usage example for the wav2vec2 robust model, one that works despite the missing tokenizer_config.json and the other files that, say, the 960h model has? And could I perhaps help by updating the example on the Hub somehow?

Edit: I looked through the Example Notebook, but it seems to be for training, not inference? It also seems to require creating a vocab.json first?

Hey @cdleong,

I should probably leave a better note on the model card.
This is a pretrained-only model, which means it has never been trained on text data and therefore can’t have a tokenizer. To fine-tune the model on a downstream task, you can follow this blog: Fine-Tune Wav2Vec2 for English ASR in Hugging Face with 🤗 Transformers
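If you only need the pretrained speech representations rather than transcriptions, the checkpoint can still be used without a tokenizer. A minimal sketch, assuming the repo ships a preprocessor config and that you have torch and transformers installed (the helper name is mine):

```python
# Feature extraction from the pretrained-only checkpoint: no tokenizer is
# needed because we stop at the hidden states instead of decoding text.
MODEL_ID = "facebook/wav2vec2-large-robust"

def extract_features(waveform, sampling_rate=16000):
    """Return the model's last hidden states for a 1-D float waveform
    (requires torch + transformers and network access to the Hub)."""
    import torch
    from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

    feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(MODEL_ID)
    model = Wav2Vec2Model.from_pretrained(MODEL_ID)

    inputs = feature_extractor(
        waveform, sampling_rate=sampling_rate, return_tensors="pt"
    )
    with torch.no_grad():
        # Shape (batch, frames, hidden_size); 1024 for the large model.
        return model(**inputs).last_hidden_state
```

For actual ASR you would first fine-tune with a vocabulary and tokenizer, as in the blog post above.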


I see, thanks for explaining! And how would one use it for inference, then? That is what I was originally attempting to do.

@patrickvonplaten the blog post is nice, but is there something for TensorFlow? Somebody suggested this notebook, GitHub - vasudevgupta7/gsoc-wav2vec2: GSoC'2021 | TensorFlow implementation of Wav2Vec2, but it is not an official notebook from Hugging Face. What do you think?

Thanks!