Unsupervised Data Augmentation (UDA) Leading to Significantly Poorer Performance with RoBERTa?

I was attempting to implement unsupervised data augmentation (UDA) with RoBERTa. Essentially, UDA is a semi-supervised learning technique in which each training iteration processes a mini-batch of labeled data alongside a mini-batch of unlabeled data. The model’s output on the unlabeled mini-batch is compared to its output on an augmented version of the same data, and any discrepancy is penalized. The idea is to make the model invariant to insignificant differences in the input.
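
For reference, the consistency term in the UDA paper is roughly a KL divergence between the model’s prediction on the original unlabeled example (treated as a fixed, optionally sharpened target) and its prediction on the augmented version. Here is a minimal sketch of just that term, assuming a Hugging Face sequence-classification model and pre-tokenized unlabeled_batch / augmented_batch dicts; the temperature value is only illustrative, and UDA extras such as confidence masking and training signal annealing are omitted:

import torch
import torch.nn.functional as F

def uda_consistency_loss(model, unlabeled_batch, augmented_batch, temperature=0.4):
    # Teacher pass on the original unlabeled text: detached, so it acts as a fixed target
    with torch.no_grad():
        teacher_logits = model(**unlabeled_batch).logits
        teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)  # optional sharpening
    # Student pass on the augmented text: gradients flow only through this pass
    student_log_probs = F.log_softmax(model(**augmented_batch).logits, dim=-1)
    # KL(teacher || student), averaged over the mini-batch
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")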

However, when I tried to implement this approach, the evaluation F1 score was worse than without UDA (both code paths are included in the train function below). In fact, I had much more success when I first trained on all of the labeled data for a certain number of epochs and only then transitioned to using all of the unlabeled data (not included in the function below).
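
Concretely, the schedule that worked better looks roughly like this (only a sketch; run_supervised_epoch and run_unsupervised_epoch are hypothetical helpers standing in for the two code paths in the train function below):

# Stage 1: labeled cross-entropy only
for epoch in range(num_supervised_epochs):
    run_supervised_epoch(model, labeled_loader)
# Stage 2: switch to the unlabeled data
for epoch in range(num_unsupervised_epochs):
    run_unsupervised_epoch(model, unlabeled_loader)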

I was wondering if anyone had any insight into why my implementation of UDA was doing so poorly.

I wrote a custom Hugging Face Trainer subclass; here is the relevant part of its train method:

def train(self, resume_from_checkpoint=None, trial=None, ignore_keys_for_eval=None, **kwargs):
        # Initialize optimizer and scheduler
        num_training_steps = len(self.get_train_dataloader()) * self.args.num_train_epochs
        self.create_optimizer_and_scheduler(num_training_steps)

        model = self.model
        model = model.to('cuda')
        
        if not UDA: 
            [... not relevant ...]
        else:
            for epoch in range(int(self.args.num_train_epochs)):
                loader_a = self.get_train_dataloader()   # labeled mini-batches
                augmented_df = self.dataset_b            # unlabeled data with original and augmented sentences
                unlabeled_batches = generate_random_batches(augmented_df, UNSUPERVISED_BATCH_SIZE)
                for batch, batch_df in zip(loader_a, unlabeled_batches):
                    ### Labeled part! ###
                    batch = batch.to('cuda')   
                    outputs = model(**batch)
                    loss = outputs.loss
                    
                    print(f"{epoch} Supervised loss: {loss}")
                    loss.backward()
                
                    self.optimizer.step()
                    self.lr_scheduler.step()
                    self.optimizer.zero_grad()
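                    # NOTE: the weights have already been updated once here, before the unlabeled step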
                    
                    ### Unlabeled part! ###
                    temp_df_v1 = batch_df[["sentence", CATEGORY]]
                    temp_df_v1 = convertDataFrametoDatasetObject(temp_df_v1)

                    # Move the tokenized original (non-augmented) sentences to the GPU
                    batch = {key: temp_df_v1[key].to('cuda')
                             for key in ("input_ids", "attention_mask", "labels")}
                    
                    # First pass: predict on the original unlabeled sentences to get pseudo-labels
                    outputs = model(**batch)
                    logits = outputs.get('logits')
                    batch_df["Model Outputs"] = None 
                    logit_index = 0
                    for index, row in batch_df.iterrows():
                        value = logits[logit_index].softmax(dim=-1).detach().cpu().flatten().numpy().tolist()
                        batch_df.loc[index, "Model Outputs"] = "yes" if value[1] > 0.5 else "no"
                        logit_index += 1

                    batch_df["sentence"] = batch_df["augmented_sentence"]
                    batch_df[CATEGORY] = batch_df["Model Outputs"]
                    temp_df_v2 = batch_df[["sentence", CATEGORY]]
                    temp_df_v2 = convertDataFrametoDatasetObject(temp_df_v2)
                    
                    # Move the tokenized augmented sentences (with pseudo-labels) to the GPU
                    batch = {key: temp_df_v2[key].to('cuda')
                             for key in ("input_ids", "attention_mask", "labels")}

                    # Second pass: cross-entropy of the augmented sentences against the pseudo-labels
                    outputs = model(**batch)
                    loss_unsupervised = outputs.loss
                    
                    print(f"{epoch} -- Unsupervised loss: {loss_unsupervised}")
                    
                    loss_unsupervised.backward()
                    
                    self.optimizer.step()
                    self.lr_scheduler.step()
                    self.optimizer.zero_grad()
            
        print("Training is Done")

UPDATE: One solution that I believe works is to combine the supervised and unsupervised losses and perform a single backward pass and optimizer step per iteration, instead of updating the weights twice per iteration.
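
In the loop above, that change amounts to something like the following (a sketch; loss and loss_unsupervised are the two values already computed in the inner loop, and lambda_u is a hypothetical weight for the unsupervised term):

total_loss = loss + lambda_u * loss_unsupervised   # e.g. lambda_u = 1.0
total_loss.backward()
self.optimizer.step()
self.lr_scheduler.step()
self.optimizer.zero_grad()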
