Cannot train Wav2Vec2 or HuBERT with a custom Wav2Vec2 processor

Hello everyone, I tried to build a processor for Wav2Vec2 and HuBERT following this blog post, but my WER stays at ~0.99 throughout fine-tuning. Does anyone know how to deal with this?

Here are the config files I have:

# preprocessor_config.json
{
  "do_normalize": true,
  "feature_extractor_type": "Wav2Vec2FeatureExtractor",
  "feature_size": 1,
  "padding_side": "right",
  "padding_value": 0.0,
  "processor_class": "Wav2Vec2Processor",
  "return_attention_mask": true,
  "sampling_rate": 16000
}
# special_tokens_map.json
{"bos_token": "<s>", "eos_token": "</s>", "unk_token": "<unk>", "pad_token": "<pad>"}
# tokenizer_config.json
{"unk_token": "<unk>", "bos_token": "<s>", "eos_token": "</s>", "pad_token": "<pad>", "do_lower_case": false, "word_delimiter_token": "|", "replace_word_delimiter_char": " ", "tokenizer_class": "Wav2Vec2CTCTokenizer", "processor_class": "Wav2Vec2Processor"}
# vocab.json
{"n": 0, "v": 1, "q": 2, "'": 3, "t": 4, "y": 5, "c": 6, "d": 7, "x": 8, "e": 10, "f": 11, "o": 12, "u": 13, "g": 14, "h": 15, "m": 16, "s": 17, "i": 18, "z": 19, "r": 20, "w": 21, "a": 22, "l": 23, "j": 24, "b": 25, "p": 26, "k": 27, "|": 9, "<unk>": 28, "<pad>": 29}

Here are the training arguments I used for fine-tuning:

from transformers import AutoModelForCTC, Trainer, TrainingArguments, Wav2Vec2Processor

batch = 4
epoch = 8
# load processor with all the config files above
processor = Wav2Vec2Processor.from_pretrained("../processor")
print(f"------------ batch {batch}, epoch {epoch} ----------------")
model = AutoModelForCTC.from_pretrained(
    "facebook/wav2vec2-base",
    ctc_loss_reduction="mean",
    pad_token_id=processor.tokenizer.pad_token_id)
# freeze the convolutional feature encoder, as in the blog post
model.freeze_feature_extractor()
training_args = TrainingArguments(
    group_by_length=True,
    per_device_train_batch_size=batch,
    evaluation_strategy="steps",
    num_train_epochs=epoch,
    fp16=True,
    gradient_checkpointing=True,
    save_steps=100,
    eval_steps=100,
    logging_steps=100,
    learning_rate=1e-4,
    weight_decay=0.005,
    warmup_steps=50,
    save_total_limit=2,
    logging_dir='../logs',
    data_seed=42,
    metric_for_best_model="wer",
    greater_is_better=False,
    seed=42,
    load_best_model_at_end=True)
trainer = Trainer(
    model=model,
    data_collator=data_collator,
    args=training_args,
    compute_metrics=compute_metrics,
    train_dataset=wls_dt["train"],
    eval_dataset=wls_dt["test"],
    # pass the feature extractor here so it is saved along with the checkpoints (same as the blog post)
    tokenizer=processor.feature_extractor)
trainer.train()

data_collator and compute_metrics are exactly the same as in the blog post.
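For reference, compute_metrics follows the blog post's WER computation, roughly like this (a sketch from memory, using load_metric from the datasets library and the processor loaded above):

import numpy as np
from datasets import load_metric

wer_metric = load_metric("wer")

def compute_metrics(pred):
    # greedy CTC decoding: argmax over the vocabulary at every frame
    pred_ids = np.argmax(pred.predictions, axis=-1)
    # the collator masks label padding with -100; map it back to the pad token before decoding
    pred.label_ids[pred.label_ids == -100] = processor.tokenizer.pad_token_id
    pred_str = processor.batch_decode(pred_ids)
    # do not group tokens for the references so repeated characters are kept
    label_str = processor.batch_decode(pred.label_ids, group_tokens=False)
    return {"wer": wer_metric.compute(predictions=pred_str, references=label_str)}

Thanks!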