How can I implement this BERT model for sequential sentence classification using HuggingFace?

I want to classify the functions of sentences in the abstracts of scientific papers, and the function of a sentence is related to the functions of its surrounding sentences.

I found the model proposed in this paper very useful and straightforward: it simply feeds BERT multiple sentences at once, with a [SEP] token after each sentence to separate them. See the figure below:

I can train (fine-tune) this model using their code, but I would also like to build it with the transformers library (instead of allennlp) because that gives me more flexibility.

The hardest part for me is extracting the embeddings of all [SEP] tokens from a sample (multiple sentences). I tried to read their code but found it quite hard to follow. Could you help me with this step?
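To make the question concrete, here is a rough sketch of what I think the extraction step might look like with transformers and PyTorch. It is not the paper's implementation. To keep it runnable offline, it uses a tiny randomly initialized `BertModel` and hand-written token ids; the ids 101/102/0 for [CLS]/[SEP]/[PAD] are assumed from bert-base-uncased, and `sep_embeddings` is a hypothetical helper name of my own.

```python
import torch
from transformers import BertConfig, BertModel

# Assumed special-token ids (those of bert-base-uncased). With a real
# tokenizer, read them from tokenizer.sep_token_id / tokenizer.pad_token_id.
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0


def sep_embeddings(hidden_states, input_ids, sep_token_id):
    """Hypothetical helper: return one (num_sep, hidden_size) tensor per sample."""
    sep_mask = input_ids == sep_token_id       # (batch, seq_len), bool
    # Boolean indexing keeps rows in batch-major order, so the flat result
    # can be split back into samples using the per-sample [SEP] counts.
    flat = hidden_states[sep_mask]             # (total_sep, hidden_size)
    counts = sep_mask.sum(dim=1).tolist()
    return list(torch.split(flat, counts))


# Tiny random model so the sketch needs no download; a real setup would
# load pretrained weights instead.
config = BertConfig(
    vocab_size=200,
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
)
model = BertModel(config).eval()

# Two samples: the first has 3 sentences, the second has 2 plus padding.
# Layout: [CLS] sent1 [SEP] sent2 [SEP] ...
input_ids = torch.tensor([
    [CLS_ID, 11, 12, SEP_ID, 13, 14, SEP_ID, 15, SEP_ID],
    [CLS_ID, 21, SEP_ID, 22, 23, SEP_ID, PAD_ID, PAD_ID, PAD_ID],
])
attention_mask = (input_ids != PAD_ID).long()

with torch.no_grad():
    hidden = model(
        input_ids=input_ids, attention_mask=attention_mask
    ).last_hidden_state                        # (batch, seq_len, hidden_size)

per_sample = sep_embeddings(hidden, input_ids, SEP_ID)
for i, emb in enumerate(per_sample):
    print(f"sample {i}: {tuple(emb.shape)}")
```

If this is roughly right, each row of `per_sample[i]` would then go through a linear layer to predict the label of the corresponding sentence, but I am not sure this matches what their code does.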

Thanks in advance!
