I’m looking at the wav2vec2 pretraining example given in the transformers repository. This is a preprocessing step:
```python
with accelerator.main_process_first():
    vectorized_datasets = raw_datasets.map(
        prepare_dataset,
        num_proc=args.preprocessing_num_workers,
        remove_columns=raw_datasets["train"].column_names,
        cache_file_names=cache_file_names,
    )
```
The prepare_dataset function is this:

```python
def prepare_dataset(batch):
    sample = batch[args.audio_column_name]
    inputs = feature_extractor(
        sample["array"],
        sampling_rate=sample["sampling_rate"],
        max_length=max_length,
        truncation=True,
    )
    batch["input_values"] = inputs.input_values
    batch["input_length"] = len(inputs.input_values)
    return batch
```

where feature_extractor is loaded earlier in the script as:

```python
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(args.model_name_or_path)
```
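For reference, this is how I reproduce that call outside the script (using facebook/wav2vec2-base as an example checkpoint and a synthetic 1-second waveform in place of the dataset's audio column; the script loads whatever args.model_name_or_path points to):

```python
import numpy as np
from transformers import Wav2Vec2FeatureExtractor

# Example checkpoint standing in for args.model_name_or_path
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base")

# Synthetic 1-second, 16 kHz waveform standing in for batch[args.audio_column_name]["array"]
waveform = np.random.randn(16000).astype(np.float32)

# max_length mimics max_duration_in_seconds * sampling_rate from the script (assuming 5 s here)
inputs = feature_extractor(waveform, sampling_rate=16000, max_length=80000, truncation=True)

# inputs.input_values is what prepare_dataset stores back into the cached dataset
print(type(inputs.input_values), len(inputs.input_values))
```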
Wav2Vec2FeatureExtractor uses a 1-D conv stack to extract features, as is done in wav2vec2. But why is this being done as a preprocessing step? Furthermore, during training the Wav2Vec2Model has yet another feature extractor, which seemingly does the same thing but on the already-extracted features?
```python
class Wav2Vec2Model(Wav2Vec2PreTrainedModel):
    def __init__(self, config: Wav2Vec2Config):
        super().__init__(config)
        self.config = config
        self.feature_extractor = Wav2Vec2FeatureEncoder(config)
        ...

    # ... later, in forward():
    extract_features = self.feature_extractor(input_values)
```
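And this is a minimal sketch of how I understand those input_values are consumed at training time (a randomly initialized Wav2Vec2Model just to inspect shapes, not the actual Wav2Vec2ForPreTraining setup the script uses):

```python
import torch
from transformers import Wav2Vec2Config, Wav2Vec2Model

# Randomly initialized base-sized model, only to look at the conv encoder's output shapes
model = Wav2Vec2Model(Wav2Vec2Config())
model.eval()

# One second of "input_values" as produced by the preprocessing step above
input_values = torch.randn(1, 16000)

with torch.no_grad():
    outputs = model(input_values)

print(outputs.extract_features.shape)   # output of model.feature_extractor (the conv stack): (1, 49, 512)
print(outputs.last_hidden_state.shape)  # output of the transformer encoder: (1, 49, 768)
```

So whatever the preprocessing step stored as input_values still gets pushed through model.feature_extractor (the conv stack) at training time.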
Shouldn’t there be a single feature-extraction step, with the 1-D conv applied only during training? What am I missing?
Edit: Wav2Vec2FeatureExtractor simply uses Wav2Vec2FeatureEncoder, so they are identical.