Text-to-feature FinBERT for regression

You can easily get a feature vector for a given piece of text as follows:

from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("ProsusAI/finbert")
model = BertModel.from_pretrained("ProsusAI/finbert")

text = "hello world"
encoding = tokenizer(text, return_tensors="pt")

# forward pass
outputs = model(**encoding)

# get feature vector 
feature_vector = outputs.last_hidden_state[:,0,:]

Here I’m taking the final hidden state of the [CLS] token, which serves as a good representation of an entire piece of text. For BERT-base-sized models the hidden size is 768, so the slice above returns a tensor of shape (1, 768) for a single text (the batch dimension is kept).

I’m not sure what you mean by fine-tuning a BERT model on your dataset. You can fine-tune BERT on a regression problem, where the inputs are text and the outputs are floats. Is this what you want to do?
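If regression is the goal, a common approach is to load the checkpoint into a sequence-classification head with a single float output. A minimal sketch, assuming the `transformers` and `torch` packages are installed; to keep the example self-contained it builds a tiny randomly initialized BERT from a config instead of downloading FinBERT (the commented-out `from_pretrained` call shows the real usage):

```python
import torch
from transformers import BertConfig, BertForSequenceClassification

# In practice you would load the pretrained weights, e.g.:
#   model = BertForSequenceClassification.from_pretrained(
#       "ProsusAI/finbert", num_labels=1, problem_type="regression")
# Here a tiny random-weight config keeps the example self-contained.
config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
    num_labels=1,               # single float output -> regression head
    problem_type="regression",  # tells the model to use an MSE loss
)
model = BertForSequenceClassification(config)

input_ids = torch.tensor([[1, 5, 7, 2]])  # dummy token ids
labels = torch.tensor([0.5])              # float regression target

outputs = model(input_ids=input_ids, labels=labels)
print(outputs.logits.shape)  # one float prediction per example
print(outputs.loss)          # scalar MSE loss, ready for .backward()
```

With `num_labels=1` and `problem_type="regression"`, the model computes an MSE loss between the single logit and the float label, so the usual `Trainer` fine-tuning loop works unchanged.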