PyTorch autograd graph destroyed when using the Wav2Vec2 processor

Hi, I am trying to generate adversarial audio examples against a Hugging Face model using ART (Trusted-AI/adversarial-robustness-toolbox on GitHub, the Adversarial Robustness Toolbox). Specifically, I am using the ImperceptiblePytorchASR class, an algorithm based on Carlini's paper [1801.01944] Audio Adversarial Examples: Targeted Attacks on Speech-to-Text.

I extended their estimator class to use a Hugging Face model, but I can’t find a way to keep the autograd graph from being destroyed when I send my inputs to the Wav2Vec2 processor for feature extraction.

    self._processor = Wav2Vec2Processor.from_pretrained(pretrained_model)
    self.global_optimal_delta = Variable( 
        torch.zeros(self.batch_size, self.global_max_length).type(torch.cuda.FloatTensor),
        requires_grad=True
    )
    local_delta = self.global_optimal_delta[:local_batch_size, :local_max_length]
    local_delta_rescale = torch.clamp(local_delta, -self.eps, self.eps).to(self.estimator.device)
    local_delta_rescale *= torch.tensor(rescale).to(self.estimator.device)
    adv_input = local_delta_rescale + torch.tensor(original_input).to(self.estimator.device)
    masked_adv_input = adv_input * torch.tensor(input_mask).to(self.estimator.device)
   
    # I can't find any way to get the processor to accept ragged array inputs other than passing in a list.
    x = [v.flatten().tolist() for v in masked_adv_input]
    inputs = self._processor(x,sampling_rate=self.sr,return_tensors='pt', padding=True)

After feeding the inputs into the model to get the loss and logits and calling loss.backward(), the gradient of local_delta is None. How can I fix this?
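To make the symptom concrete, here is a minimal sketch (toy tensors, not my actual attack code) of what I believe is happening: converting a tensor to a Python list with .tolist() yields plain floats, so anything rebuilt from that list has no connection to the original graph. The names delta, audio, adv and detached are only for illustration.

```python
import torch

# Toy stand-ins for the perturbation and the original audio.
delta = torch.zeros(4, requires_grad=True)
audio = torch.tensor([0.1, -0.2, 0.3, 0.05])

adv = audio + delta                     # still attached to delta via grad_fn
detached = torch.tensor(adv.tolist())   # rebuilt from Python floats: no graph

print(adv.requires_grad, adv.grad_fn is not None)        # attached
print(detached.requires_grad, detached.grad_fn is None)  # not attached

# Gradients flow to delta only through the tensor that never left torch.
adv.sum().backward()
print(delta.grad)
```

Any loss computed from detached (or from a processor output built from a list) cannot send gradients back to delta.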

I also tried:

    inputs = self._processor(original_input, sampling_rate=self.sr, return_tensors='pt', padding=True)
    adv_input = local_delta_rescale + inputs['input_values']
    masked_adv_input = adv_input * torch.tensor(input_mask).to(self.estimator.device)
    inputs['input_values'] = masked_adv_input

However, this doesn’t seem to give the correct values.
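One possible reason for the mismatch, assuming the feature extractor is configured to normalize its input (zero mean and unit variance per utterance), is that adding the perturbation to already-normalized values is not the same as perturbing the raw audio and then normalizing. Below is a rough sketch of doing that normalization directly in torch so the graph stays intact. The helper name normalize_waveforms, the eps value, and the toy loss are my own assumptions, not part of ART or transformers.

```python
import torch
import torch.nn.functional as F


def normalize_waveforms(batch, lengths, eps=1e-7):
    """Per-utterance zero-mean, unit-variance normalization in pure torch.

    batch:   (B, T) float tensor of zero-padded raw audio
    lengths: true number of samples in each row
    Hypothetical helper; eps is an assumed value.
    """
    rows = []
    for wav, n in zip(batch, lengths):
        valid = wav[:n]  # ignore the padded tail when computing statistics
        normed = (valid - valid.mean()) / torch.sqrt(valid.var(unbiased=False) + eps)
        # Pad back to the common length with zeros so the rows can be stacked.
        rows.append(F.pad(normed, (0, wav.shape[0] - n)))
    return torch.stack(rows)


torch.manual_seed(0)
original = torch.randn(2, 8)   # toy padded batch of raw audio
lengths = [8, 5]               # second utterance has 3 padded samples
delta = torch.zeros(2, 8, requires_grad=True)

input_values = normalize_waveforms(original + delta, lengths)

# Stand-in for the model loss: any differentiable function of input_values.
weights = torch.randn(2, 8)
loss = (input_values * weights).sum()
loss.backward()

print(delta.grad)  # populated; padded positions receive zero gradient
```

In this sketch the perturbed raw audio never leaves torch, so the resulting tensor could in principle be passed to the model as input_values in place of the processor output. I have not verified that it matches the processor's values exactly.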