Prompt Tuning For Sequence Classification

I am trying prompt tuning for hate speech classification. I have gone through several papers and found that there are two types of prompts: hard prompts and soft prompts.

I am trying hard prompts for my task.
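
For reference, a hard prompt here is just a manually written template wrapped around each input; below is a minimal sketch with an illustrative template and example comment (my own choices, assuming the google/muril-base-cased tokenizer used later):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/muril-base-cased")

# Hand-written (hard) prompt template; the label would be read off the [MASK]
# position via a verbalizer such as "hateful" vs. "harmless"
template = "{text} Overall, this comment was [MASK]."
example = template.format(text="I can't stand people like you.")
inputs = tokenizer(example, return_tensors="pt")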

Can we use a PEFT model for this task?

If not, what are other possible ways to do prompt tuning with a BERT model?

No one could possibly answer your question as you have simply not provided any useful details.

Sorry for not providing the code; I had not implemented it yet at the time.

My implementation is:

from transformers import AutoModelForSequenceClassification, AutoConfig
from peft import PeftModelForSequenceClassification, get_peft_config

PRE_TRAINED_MODEL_NAME = "google/muril-base-cased"
model_config = AutoConfig.from_pretrained(PRE_TRAINED_MODEL_NAME)

# Prefix-tuning configuration for sequence classification
config = {
    "peft_type": "PREFIX_TUNING",
    "task_type": "SEQ_CLS",
    "inference_mode": False,
    "num_virtual_tokens": 20,
    "token_dim": 768,
    "num_transformer_submodules": 1,
    "num_attention_heads": 12,
    "num_layers": 12,
    "encoder_hidden_size": 768,
    "prefix_projection": False,
}

peft_config = get_peft_config(config)

# Base model with a classification head, wrapped with the PEFT prefix-tuning adapter
model = AutoModelForSequenceClassification.from_pretrained(PRE_TRAINED_MODEL_NAME, num_labels=len(class_names))
peft_model = PeftModelForSequenceClassification(model, peft_config)
peft_model = peft_model.to(device)
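
A quick sanity check that may help here (assuming a recent peft version, which provides print_trainable_parameters) is to confirm that only the prefix parameters and the classification head are actually trainable:

# Should report only the prefix/prompt parameters (plus the classification head) as trainable
peft_model.print_trainable_parameters()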

My training function is:

def train_epoch(model, data_loader, loss_fn, optimizer, device, scheduler, n_examples):
    # Assumes: import numpy as np, torch, torch.nn.functional as F, from tqdm.auto import tqdm
    model = model.train()

    losses = []
    correct_predictions = 0

    progress_bar = tqdm(range(num_training_steps))

    for d in data_loader:
        input_ids = d["input_ids"].to(device)
        attention_mask = d["attention_mask"].to(device)
        targets = d["targets"].to(device)

        outputs = model(
            input_ids=input_ids,
            attention_mask=attention_mask
        )

        # Softmax over the logits, then take the highest-probability class as the prediction
        outputs = F.softmax(outputs.logits, dim=-1)
        _, preds = torch.max(outputs, dim=1)

        # Note: the loss is computed on the softmaxed outputs here, not on the raw logits
        loss = loss_fn(outputs, targets)
        correct_predictions += torch.sum(preds == targets)
        losses.append(loss.cpu().detach())

        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()
        progress_bar.update(1)

    return correct_predictions.double() / n_examples, np.mean(losses)

Hyperparameter setup:

patience = 10

# AdamW from transformers (supports correct_bias); EPOCHS, train_data_loader
# and class_weights are defined elsewhere in the script
optimizer = AdamW(model.parameters(), lr=2e-5, correct_bias=False)

num_training_steps = EPOCHS * len(train_data_loader)

scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=0,
    num_training_steps=num_training_steps
)

loss_fn = torch.nn.CrossEntropyLoss(weight=class_weights).to(device)

Why is the training stuck at 0.632 validation accuracy?
Am I doing something wrong?

Epoch 1/100
------------------------------
100%|██████████| 36600/36600 [01:28<00:00, 413.98it/s]
Train loss 0.6931820511817932 accuracy 0.5681176772427948

Val   loss 0.6930877877318341 accuracy 0.6320109439124487

**********
Best model found in Epoch 1 with val_loss 0.6930877877318341

**********

Epoch 2/100
------------------------------
100%|██████████| 36600/36600 [01:29<00:00, 407.90it/s]
Train loss 0.6931309700012207 accuracy 0.6001881467544685

Val   loss 0.6930594962576161 accuracy 0.6320109439124487

**********
Best model found in Epoch 2 with val_loss 0.6930594962576161

**********

Epoch 3/100
------------------------------
100%|██████████| 36600/36600 [01:29<00:00, 408.18it/s]
Train loss 0.6931337714195251 accuracy 0.6069443256649277

Val   loss 0.6930221902287524 accuracy 0.6320109439124487

**********
Best model found in Epoch 3 with val_loss 0.6930221902287524

**********

Epoch 4/100
------------------------------
72%|███████▏  | 26400/36600 [01:04<00:24, 408.29it/s]
Train loss 0.6931185126304626 accuracy 0.6079705806893013

Val   loss 0.6930161818214085 accuracy 0.6320109439124487

**********
Best model found in Epoch 4 with val_loss 0.6930161818214085

**********

Epoch 5/100
------------------------------
100%|██████████| 36600/36600 [01:29<00:00, 408.66it/s]
Train loss 0.6931254863739014 accuracy 0.6085692294535192

Val   loss 0.6930019298325414 accuracy 0.6320109439124487

**********
Best model found in Epoch 5 with val_loss 0.6930019298325414

**********

Accuracy is a discrete metric (unlike pretty much all loss functions used with neural networks), so it is entirely possible for your loss to keep decreasing while your accuracy flatlines. Obviously, for any model, validation loss will never reach 0 and validation accuracy will never reach 100% unless your validation data is your training data. The point of the validation data is to help us decide when to stop training: if your training loss is decreasing but your validation loss has flatlined, or is even increasing, for multiple epochs, you are likely only going to overfit if you keep training.
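
As a small illustration (a self-contained sketch with made-up logits), the cross-entropy loss keeps decreasing while the argmax, and therefore the accuracy, stays exactly the same:

import torch
import torch.nn.functional as F

target = torch.tensor([1])

# Both sets of logits predict class 1 (same argmax, so same accuracy) ...
logits_early = torch.tensor([[0.40, 0.60]])
logits_later = torch.tensor([[0.10, 0.90]])

# ... but the loss still drops as the model becomes more confident
print(F.cross_entropy(logits_early, target))  # ~0.598
print(F.cross_entropy(logits_later, target))  # ~0.371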

Have you tried different learning rates? Prompt tuning sometimes requires surprisingly high learning rates, for instance 0.3. This post is quite good in my opinion: Guiding Frozen Language Models with Learned Soft Prompts – Google AI Blog
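
For example, a minimal sketch (assuming the peft_model from the snippet above, where only the prompt/prefix parameters and the classification head have requires_grad=True): build the optimizer over just the trainable parameters and try a much higher learning rate on them.

from torch.optim import AdamW

# Only the prefix/prompt parameters (plus the classification head) are trainable
# after wrapping with PEFT, so optimize just those
trainable_params = [p for p in peft_model.parameters() if p.requires_grad]

# Prompt/prefix tuning often needs a much larger learning rate than full
# fine-tuning; 0.3 is the value mentioned above
optimizer = AdamW(trainable_params, lr=0.3)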

Which dataset are you using?

I have a question: if I use PEFT, do I have to change the BERT tokenizer to include the prompt tokens?