Hello everyone, I tried to build a processor for Wav2Vec2 and HuBERT following this blog post, but my WER stays at ~0.99 the whole time. Does anyone know how to deal with this?
Here are the config files I have:
# preprocessor_config.json
{
  "do_normalize": true,
  "feature_extractor_type": "Wav2Vec2FeatureExtractor",
  "feature_size": 1,
  "padding_side": "right",
  "padding_value": 0.0,
  "processor_class": "Wav2Vec2Processor",
  "return_attention_mask": true,
  "sampling_rate": 16000
}
# special_tokens_map.json
{"bos_token": "<s>", "eos_token": "</s>", "unk_token": "<unk>", "pad_token": "<pad>"}
# tokenizer_config.json
{"unk_token": "<unk>", "bos_token": "<s>", "eos_token": "</s>", "pad_token": "<pad>", "do_lower_case": false, "word_delimiter_token": "|", "replace_word_delimiter_char": " ", "tokenizer_class": "Wav2Vec2CTCTokenizer", "processor_class": "Wav2Vec2Processor"}
# vocab.json
{"n": 0, "v": 1, "q": 2, "'": 3, "t": 4, "y": 5, "c": 6, "d": 7, "x": 8, "e": 10, "f": 11, "o": 12, "u": 13, "g": 14, "h": 15, "m": 16, "s": 17, "i": 18, "z": 19, "r": 20, "w": 21, "a": 22, "l": 23, "j": 24, "b": 25, "p": 26, "k": 27, "|": 9, "<unk>": 28, "<pad>": 29}
Here is the training code I used for fine-tuning:
from transformers import AutoModelForCTC, Trainer, TrainingArguments, Wav2Vec2Processor

batch = 4
epoch = 8

# load the processor with all of the config files above
processor = Wav2Vec2Processor.from_pretrained("../processor")
print(f"------------ batch {batch}, epoch {epoch} ----------------")

model = AutoModelForCTC.from_pretrained(
    "facebook/wav2vec2-base",
    ctc_loss_reduction="mean",
    pad_token_id=processor.tokenizer.pad_token_id,
)
model.freeze_feature_extractor()

training_args = TrainingArguments(
    output_dir="../checkpoints",  # placeholder path for checkpoints
    group_by_length=True,
    per_device_train_batch_size=batch,
    evaluation_strategy="steps",
    num_train_epochs=epoch,
    fp16=True,
    gradient_checkpointing=True,
    save_steps=100,
    eval_steps=100,
    logging_steps=100,
    learning_rate=1e-4,
    weight_decay=0.005,
    warmup_steps=50,
    save_total_limit=2,
    logging_dir="../logs",
    data_seed=42,
    metric_for_best_model="wer",
    greater_is_better=False,
    seed=42,
    load_best_model_at_end=True,
)

trainer = Trainer(
    model=model,
    data_collator=data_collator,
    args=training_args,
    compute_metrics=compute_metrics,
    train_dataset=wls_dt["train"],
    eval_dataset=wls_dt["test"],
    tokenizer=processor.feature_extractor,
)
trainer.train()
data_collator and compute_metrics are exactly the same as in the blog post.
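For reference, they look roughly like this (paraphrased from the blog post from memory, so minor details may differ; load_metric comes from the datasets library):

import numpy as np
import torch
from dataclasses import dataclass
from typing import Dict, List, Union
from datasets import load_metric  # newer setups use evaluate.load("wer") instead
from transformers import Wav2Vec2Processor

wer_metric = load_metric("wer")

@dataclass
class DataCollatorCTCWithPadding:
    """Dynamically pads input values and labels separately, as in the blog post."""
    processor: Wav2Vec2Processor
    padding: Union[bool, str] = True

    def __call__(self, features: List[Dict[str, Union[List[int], torch.Tensor]]]) -> Dict[str, torch.Tensor]:
        # split inputs and labels because they need different padding
        input_features = [{"input_values": f["input_values"]} for f in features]
        label_features = [{"input_ids": f["labels"]} for f in features]

        batch = self.processor.pad(input_features, padding=self.padding, return_tensors="pt")
        with self.processor.as_target_processor():
            labels_batch = self.processor.pad(label_features, padding=self.padding, return_tensors="pt")

        # replace label padding with -100 so it is ignored by the CTC loss
        batch["labels"] = labels_batch["input_ids"].masked_fill(labels_batch.attention_mask.ne(1), -100)
        return batch

data_collator = DataCollatorCTCWithPadding(processor=processor, padding=True)

def compute_metrics(pred):
    pred_ids = np.argmax(pred.predictions, axis=-1)
    # restore the pad token id where labels were masked with -100
    pred.label_ids[pred.label_ids == -100] = processor.tokenizer.pad_token_id

    pred_str = processor.batch_decode(pred_ids)
    # do not group tokens when decoding the reference transcriptions
    label_str = processor.batch_decode(pred.label_ids, group_tokens=False)

    return {"wer": wer_metric.compute(predictions=pred_str, references=label_str)}

Thanks!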