Fine-tune pretrained BERT for a custom regression task

I have a list of sentences:
X = ["Today is Tuesday", "I went to the store", "This is a computer",....]
and for each sentence the label is a vector of 5 floats:
y = [[1,4,3,1,7], [5,1,2,8,9],[0,1,6,5,2],....]
I want to fine-tune BERT (or another suitable pre-trained LM) with the proper head to predict the labels,
but I couldn't find an example of anything similar.
Can someone please provide a code sample showing how I can do it?
I'm really stuck.

Thanks

Does the order of the numbers in the vector matter? Are you able to provide more context about what the vector signifies?

@nbroad Sure! Sorry for being unclear.
The label is a representation vector of other, higher-dimensional data, so the order does matter.
Actually, the head I thought would be most reasonable to add is
sklearn.linear_model.Ridge
which is suitable because it supports multi-output y.
What I am missing is how to add that head on top of BERT and perform the fine-tuning.

You might be able to do something like the code below. The labels need to be in the dataset with shape (n, 5, num_labels), where n is the total number of examples and num_labels is the number of possible classes for each position in the output vector.

It is just doing multiclass classification 5 separate times – one for each position in the vector.

You should be able to use this model in a Trainer.

import torch
import torch.nn as nn
from transformers import AutoModel, PreTrainedModel


class CustomModel(PreTrainedModel):
    def __init__(self, config):
        super().__init__(config)
        self.backbone = AutoModel.from_config(config)

        # one classification head per position in the label vector;
        # nn.ModuleList registers the layers so their weights get trained
        self.outputs = nn.ModuleList(
            [nn.Linear(config.hidden_size, config.num_labels) for _ in range(5)]
        )

    def forward(
        self,
        input_ids,
        attention_mask=None,
        token_type_ids=None,
        position_ids=None,
        labels=None,
    ):
        outputs = self.backbone(
            input_ids,
            attention_mask=attention_mask,
            token_type_ids=token_type_ids,
            position_ids=position_ids,
        )

        # use the first ([CLS]) token's hidden state as the sentence representation
        sequence_output = outputs.last_hidden_state[:, 0]
        logits = [self.outputs[i](sequence_output) for i in range(5)]

        # if labels are passed, we are training, so compute the loss
        loss = None
        if labels is not None:
            loss_fn = nn.BCEWithLogitsLoss()
            losses = [loss_fn(logits[i], labels[:, i].float()) for i in range(5)]
            loss = sum(losses) / len(losses)

        return {
            "loss": loss,
            "logits": logits,
        }


# You'll also have to do this when creating the model
# Config is from AutoConfig.from_pretrained(model_path)
# model_path is something like bert-base-cased

def get_pretrained(config, model_path):
    model = CustomModel(config)

    if model_path.endswith("pytorch_model.bin"):
        model.load_state_dict(torch.load(model_path))
    else:
        model.backbone = AutoModel.from_pretrained(model_path)

    return model
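
Not part of the original snippet, but a quick smoke test along these lines can help confirm the label shape before wiring things into Trainer. The model name, the class count of 10, and the dummy labels are just made-up assumptions for illustration:

import torch
from transformers import AutoConfig, AutoTokenizer

model_path = "bert-base-cased"
config = AutoConfig.from_pretrained(model_path, num_labels=10)  # 10 classes per position (assumption)
tokenizer = AutoTokenizer.from_pretrained(model_path)

model = get_pretrained(config, model_path)

batch = tokenizer(["Today is Tuesday", "I went to the store"], padding=True, return_tensors="pt")

# dummy labels of shape (batch, 5, num_labels), one-hot per position
labels = torch.zeros(2, 5, config.num_labels)
labels[:, :, 0] = 1.0

out = model(**batch, labels=labels)
print(out["loss"], [l.shape for l in out["logits"]])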

@nbroad Sorry, I'm not sure I completely understand:

  1. My goal is not only to get predictions but also to fine-tune BERT; how does that happen here?
  2. Why are the labels 3D and not just (n, 5)? 5 is the number of labels.
  3. I would actually rather use Ridge regression if possible; is there a way to do so? Also, I would prefer not to do it 5 separate times (one for each position), because A) my actual label vector is more like 300 ints, and B) I want to minimize the Ridge loss over the whole vector at once.

Is this possible?

OK, you could have one linear layer (nn.Linear(config.hidden_size, 5)) and just use MSELoss. I think that would work. It won’t output ints, but you can round.

@nbroad Thanks. Sure, floats are fine.
Can you please provide a code example?
I really don't understand how to implement this.

import torch
import torch.nn as nn
from transformers import AutoModel, PreTrainedModel


class CustomModel(PreTrainedModel):
    def __init__(self, config):
        super().__init__(config)
        self.backbone = AutoModel.from_config(config)

        # single regression head: hidden_size -> num_labels (5 floats per sentence)
        self.output = nn.Linear(config.hidden_size, config.num_labels)

    def forward(
        self,
        input_ids,
        attention_mask=None,
        token_type_ids=None,
        position_ids=None,
        labels=None,
    ):
        outputs = self.backbone(
            input_ids,
            attention_mask=attention_mask,
            token_type_ids=token_type_ids,
            position_ids=position_ids,
        )

        # use the first ([CLS]) token's hidden state as the sentence representation
        sequence_output = outputs.last_hidden_state[:, 0]
        logits = self.output(sequence_output)

        # if labels are passed, we are training, so compute the regression loss
        loss = None
        if labels is not None:
            loss_fn = nn.MSELoss()
            loss = loss_fn(logits, labels.float())

        return {
            "loss": loss,
            "logits": logits,
        }

Have your labels be shape (n, 5) where n is the number of samples.


model_name = "roberta-base"
cfg = AutoConfig.from_pretrained(model_name)
cfg.update({
 "num_labels": 5
})

model = get_pretrained(cfg, model_name)

# put the model and data into Trainer
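
For completeness, here is a rough sketch of that last step. The use of the datasets library, the column names, and the hyperparameters are just assumptions to make the example runnable, not requirements:

from datasets import Dataset
from transformers import AutoTokenizer, Trainer, TrainingArguments

# toy data in the same shape as the question: sentences plus 5 floats per sentence
X = ["Today is Tuesday", "I went to the store", "This is a computer"]
y = [[1, 4, 3, 1, 7], [5, 1, 2, 8, 9], [0, 1, 6, 5, 2]]

tokenizer = AutoTokenizer.from_pretrained(model_name)

ds = Dataset.from_dict({"text": X, "labels": [[float(v) for v in row] for row in y]})
ds = ds.map(lambda ex: tokenizer(ex["text"], truncation=True, padding="max_length", max_length=32))
ds = ds.remove_columns("text")

args = TrainingArguments(
    output_dir="bert-regression",
    per_device_train_batch_size=8,
    num_train_epochs=3,
    learning_rate=2e-5,
)

trainer = Trainer(model=model, args=args, train_dataset=ds)
trainer.train()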

@nbroad, how do I get the [CLS] embedding from your code? It says “AttributeError: ‘dict’ object has no attribute ‘last_hidden_state’”.

There is no CLS embedding in the returned dict, but you can get the model predictions by doing:

output = model(**inputs)
predictions = output["logits"]
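
Here inputs is just whatever the tokenizer used for training returns for your sentences, e.g. something along these lines (the example sentence is arbitrary):

model.eval()
inputs = tokenizer(["Today is Tuesday"], return_tensors="pt")

with torch.no_grad():
    output = model(**inputs)

predictions = output["logits"]        # shape (1, 5), floats
rounded = predictions.round().long()  # round if you want integer values back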

@nbroad, I want to fine-tune BERT on a regression task and then use its embedding as a feature for prediction. What do I need to modify? I need this embedding so that I can apply XGBoost as well.