Hi, I am trying to generate adversarial audio examples for a Hugging Face model using the Adversarial Robustness Toolbox (ART, Trusted-AI/adversarial-robustness-toolbox on GitHub), specifically its ImperceptibleASRPyTorch attack, which builds on Carlini and Wagner, "Audio Adversarial Examples: Targeted Attacks on Speech-to-Text" (arXiv:1801.01944).
I extended ART's estimator class to use a Hugging Face model, but I can't find a way to keep the autograd graph intact when I pass my inputs to the Wav2Vec2 processor for feature extraction.
```python
self._processor = Wav2Vec2Processor.from_pretrained(pretrained_model)
self.global_optimal_delta = Variable(
    torch.zeros(self.batch_size, self.global_max_length).type(torch.cuda.FloatTensor),
    requires_grad=True
)
local_delta = self.global_optimal_delta[:local_batch_size, :local_max_length]
local_delta_rescale = torch.clamp(local_delta, -self.eps, self.eps).to(self.estimator.device)
local_delta_rescale *= torch.tensor(rescale).to(self.estimator.device)
adv_input = local_delta_rescale + torch.tensor(original_input).to(self.estimator.device)
masked_adv_input = adv_input * torch.tensor(input_mask).to(self.estimator.device)
# I can't find any other way to get the processor to accept ragged (variable-length)
# inputs than feeding it a list.
x = [v.flatten().tolist() for v in masked_adv_input]
inputs = self._processor(x, sampling_rate=self.sr, return_tensors='pt', padding=True)
```
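A likely place where the graph breaks is the `.tolist()` line: `Tensor.tolist()` returns plain Python floats that carry no autograd history, and the processor then works on NumPy arrays, so whatever tensor it returns cannot be traced back to `global_optimal_delta`. A minimal sketch reproducing this with toy tensors (the names are illustrative stand-ins, not the real attack variables):

```python
import torch

# Leaf perturbation we want gradients for (stands in for global_optimal_delta).
delta = torch.zeros(2, 4, requires_grad=True)
audio = torch.randn(2, 4)

adv = audio + delta  # still tracked by autograd
print(adv.requires_grad)  # True

# Round-tripping through Python lists, as the processor call requires,
# throws away the autograd history.
as_list = [row.tolist() for row in adv]
rebuilt = torch.tensor(as_list)  # a brand-new leaf, unrelated to delta
print(rebuilt.requires_grad)  # False
```

Any tensor the processor returns is therefore a fresh leaf, so `loss.backward()` never reaches the perturbation; only preprocessing done with torch operations keeps the path intact.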
After feeding the inputs into the model to get the loss and logits and calling `loss.backward()`, the gradient of `local_delta` is `None`. How can I fix this?
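Separately from the processor issue, `local_delta` is a slice of `global_optimal_delta`, i.e. a non-leaf tensor, and PyTorch only populates `.grad` on leaf tensors by default. So even with an intact graph, `local_delta.grad` would be `None`; the gradient accumulates in `global_optimal_delta.grad`, or on `local_delta` too if `local_delta.retain_grad()` is called before `backward()`. A small sketch on CPU with toy shapes (the sizes and the `* 3.0` loss are placeholders for the real ones); it also creates the leaf with `torch.zeros(..., requires_grad=True)`, since the `Variable` wrapper is deprecated in current PyTorch:

```python
import torch

batch_size, global_max_length = 2, 6
local_batch_size, local_max_length = 2, 4

# Leaf tensor; wrapping it in Variable(...) is no longer necessary.
global_optimal_delta = torch.zeros(batch_size, global_max_length, requires_grad=True)

# Slicing produces a non-leaf view of the leaf above.
local_delta = global_optimal_delta[:local_batch_size, :local_max_length]
local_delta.retain_grad()  # also keep .grad on this non-leaf view

loss = (local_delta * 3.0).sum()  # stand-in for the model's loss
loss.backward()

print(global_optimal_delta.grad)  # 3.0 in the first 4 columns, 0.0 in the rest
print(local_delta.grad)           # 3.0 everywhere, shape (2, 4)
```

In the optimizer loop it is usually simplest to step on `global_optimal_delta` directly and read its `.grad`, since that is the tensor being optimized.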
I also tried running the processor on the clean audio first and adding the perturbation afterwards:
```python
inputs = self._processor(original_input, sampling_rate=self.sr, return_tensors='pt', padding=True)
adv_input = local_delta_rescale + inputs['input_values']
masked_adv_input = adv_input * torch.tensor(input_mask).to(self.estimator.device)
inputs['input_values'] = masked_adv_input
```
but the resulting input values don't seem to be correct.
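The second attempt most likely gives different values because, when the feature extractor's `do_normalize` is `True`, the processor's `input_values` are each clip normalized to zero mean and unit variance. Adding the raw-audio perturbation after that step is not the same as normalizing the perturbed audio: for example, a constant offset vanishes after mean subtraction but survives if added afterwards, and the `eps` bound no longer corresponds to raw-waveform amplitude. One way to keep both the values and the gradients right is to keep the torch padding/masking from the first snippet and reimplement the normalization in torch. The sketch below assumes the normalization matches `Wav2Vec2FeatureExtractor.zero_mean_unit_var_norm` in recent `transformers` releases (per-clip mean and population variance over the unpadded samples, `eps=1e-7`, padded positions set to `padding_value`); older releases used a different epsilon, so check your installed version. `normalize_like_wav2vec2` is a hypothetical helper, not part of any library.

```python
import torch


def normalize_like_wav2vec2(x, lengths, eps=1e-7, padding_value=0.0):
    """Differentiable, batched zero-mean/unit-variance normalization.

    x:       (batch, time) float tensor, e.g. masked_adv_input (may require grad)
    lengths: (batch,) number of real (unpadded) samples in each row
    Assumption: mirrors Wav2Vec2FeatureExtractor's do_normalize step; verify
    eps and padding_value against your installed transformers version.
    """
    time = torch.arange(x.shape[1], device=x.device)
    # 1.0 where a sample is real audio, 0.0 where it is padding.
    mask = (time.unsqueeze(0) < lengths.to(x.device).unsqueeze(1)).to(x.dtype)
    n = mask.sum(dim=1, keepdim=True)
    # Statistics over the real samples only, ignoring padding.
    mean = (x * mask).sum(dim=1, keepdim=True) / n
    var = (((x - mean) * mask) ** 2).sum(dim=1, keepdim=True) / n  # population variance
    normed = (x - mean) / torch.sqrt(var + eps)
    # Overwrite padded positions with padding_value, as the extractor does.
    return normed * mask + padding_value * (1.0 - mask)
```

With the variables from the first snippet this would be roughly `lengths = torch.as_tensor(input_mask).sum(dim=1)` (assuming `input_mask` is the 0/1 mask of valid samples, as its use above suggests), then `input_values = normalize_like_wav2vec2(masked_adv_input, lengths)`, passed to the model in place of the processor's `input_values`. The perturbation is then added in raw-audio space, before normalization, so the `eps` clamp keeps its original meaning.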