Context: There was a horrible rape case in Kolkata.
I have been wondering why a “smart” phone needs elaborate manual steps to trigger SOS when it already has enough inputs to detect panic: microphone, camera, GPS, gyroscope, etc.
I found a model (padmalcom/wav2vec2-large-nonverbalvocalization-classification) that promises to detect screaming. But when I run it on a test clip of a scream, I get a different result on every run.
Here is the script I’m using:
import torch
import librosa
from scipy.stats import zscore
from transformers import Wav2Vec2ForSequenceClassification

# Load the non-verbal vocalization classifier
model_name = "padmalcom/wav2vec2-large-nonverbalvocalization-classification"
model = Wav2Vec2ForSequenceClassification.from_pretrained(model_name)

# Load the test recording and z-score normalize it
audio_path = "scream_test.wav"
audio, sample_rate = librosa.load(audio_path, sr=48000)
audio = zscore(audio)

# Fix the torch seed, classify the clip, and print the predicted label
torch.manual_seed(42)
inputs = torch.tensor(audio).unsqueeze(0)
outputs = model(inputs)
predicted_class_index = torch.argmax(outputs.logits, dim=1).item()
labels = model.config.id2label
print(labels[predicted_class_index])
I also see this warning before the output:
Some weights of Wav2Vec2ForSequenceClassification were not initialized from the model checkpoint at padmalcom/wav2vec2-large-nonverbalvocalization-classification and are newly initialized: ['classifier.bias', 'classifier.weight', 'projector.bias', 'projector.weight', 'wav2vec2.encoder.pos_conv_embed.conv.parametrizations.weight.original0', 'wav2vec2.encoder.pos_conv_embed.conv.parametrizations.weight.original1']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
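In case it is useful, this is roughly how I am reproducing the randomness: a small loop that reloads the checkpoint and classifies the same clip, with the same file and preprocessing as in the script above (both of those are my own choices, not from the model card). Each fresh from_pretrained call prints a different label for me.

import torch
import librosa
from scipy.stats import zscore
from transformers import Wav2Vec2ForSequenceClassification

model_name = "padmalcom/wav2vec2-large-nonverbalvocalization-classification"

# Same clip and preprocessing as in the script above
audio, sample_rate = librosa.load("scream_test.wav", sr=48000)
inputs = torch.tensor(zscore(audio)).unsqueeze(0)

for i in range(3):
    # Reload the checkpoint from scratch, then classify the same clip
    model = Wav2Vec2ForSequenceClassification.from_pretrained(model_name)
    with torch.no_grad():
        logits = model(inputs).logits
    predicted = torch.argmax(logits, dim=1).item()
    # The printed label differs between iterations for me
    print(i, model.config.id2label[predicted])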
I’m new here, what am I doing wrong?