SpecAugment on Wav2Vec2 feature encoder outputs

Can SpecAugment be applied to the feature vectors produced by the feature encoder of the Wav2Vec2 model? According to the HuggingFace documentation, SpecAugment is applied to the output of the feature encoder: Wav2Vec2

However, the original SpecAugment paper does not describe applying the method to feature vectors; it describes it operating on spectrograms: [1904.08779] SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition
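
To make the question concrete, here is a rough sketch of what I imagine masking would look like on a (time x feature) matrix of encoder outputs rather than on a spectrogram. This is only my assumption of the idea, not the HuggingFace implementation: the function names `mask_spans` and `spec_augment_features` and all parameter values are made up for illustration, and masked positions are simply set to zero.

```python
import random


def mask_spans(length, num_spans, span_length, rng):
    """Return a boolean list of size `length` with `num_spans` random spans set to True."""
    mask = [False] * length
    if span_length <= 0 or span_length > length:
        return mask
    for _ in range(num_spans):
        start = rng.randrange(0, length - span_length + 1)
        for i in range(start, start + span_length):
            mask[i] = True
    return mask


def spec_augment_features(features, time_spans=2, time_length=3,
                          feat_spans=1, feat_length=2, seed=0):
    """Zero out random spans along the time axis and the feature axis.

    `features` is a list of T frames, each a list of D floats
    (a stand-in for feature encoder outputs, not a spectrogram).
    Returns a masked copy plus the two masks that were used.
    """
    rng = random.Random(seed)
    num_frames = len(features)
    dim = len(features[0])
    time_mask = mask_spans(num_frames, time_spans, time_length, rng)
    feat_mask = mask_spans(dim, feat_spans, feat_length, rng)
    masked = [
        [0.0 if (time_mask[t] or feat_mask[d]) else features[t][d]
         for d in range(dim)]
        for t in range(num_frames)
    ]
    return masked, time_mask, feat_mask


# Toy input: 20 frames with 8 feature dimensions each (hypothetical sizes).
features = [[1.0] * 8 for _ in range(20)]
masked, time_mask, feat_mask = spec_augment_features(features)
```

In this sketch the time mask plays the role of SpecAugment's time masking and the feature mask plays the role of its frequency masking. What I do not understand is whether masking feature dimensions this way is justified, since they are learned features and not frequency bins.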

Can anyone explain how this works, or point me to a paper or article that explains it?