Adding Entity Tags to Transformer Input Embedding for Text Summarization

Hi guys,

I am working on a project to summarize legal cases. The case documents have some degree of structure (similar length, a header with case IDs, the defendant/plaintiff and their lawyers, etc.).

I want to summarize each document into a one-paragraph summary that the average person can understand. It seems to me that enriching the input embeddings with some custom NER-style tags (such as ‘Plaintiff’ and ‘Defendant’) would improve summary quality.

I have looked into using spaCy and Flair for custom NER tagging. However, I am not sure how to incorporate the tags into a Hugging Face pipeline (for example, legal-bart).
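To make this concrete, one simple option is to skip the embedding layer entirely and write the tags into the text as marker strings before tokenization. The sketch below is only an illustration under assumptions: the span format `(start_char, end_char, label)`, the label names, and the helper names `wrap_entities` and `marker_tokens` are all hypothetical, and the spans are hard-coded rather than produced by spaCy or Flair.

```python
from typing import List, Tuple

# Hypothetical span format: (start_char, end_char, label). An NER tagger
# would supply these; here they are written by hand for illustration.
Span = Tuple[int, int, str]


def wrap_entities(text: str, spans: List[Span]) -> str:
    """Surround each entity span with <LABEL> ... </LABEL> marker strings."""
    out = []
    cursor = 0
    for start, end, label in sorted(spans):
        if start < cursor:
            # Skip overlapping spans rather than guess how to nest them.
            continue
        out.append(text[cursor:start])
        out.append(f"<{label}> {text[start:end]} </{label}>")
        cursor = end
    out.append(text[cursor:])
    return "".join(out)


def marker_tokens(labels: List[str]) -> List[str]:
    """List the marker strings that would have to be added to the tokenizer."""
    tokens = []
    for label in labels:
        tokens.extend([f"<{label}>", f"</{label}>"])
    return tokens


text = "Smith sued Acme Corp. for breach of contract."
spans = [(0, 5, "PLAINTIFF"), (11, 21, "DEFENDANT")]
tagged = wrap_entities(text, spans)
print(tagged)
print(marker_tokens(["PLAINTIFF", "DEFENDANT"]))
```

With this approach the marker strings would presumably be registered via `tokenizer.add_tokens(...)` followed by `model.resize_token_embeddings(len(tokenizer))`, so that each marker gets its own embedding that is learned during fine-tuning. Whether this helps for legal summarization is untested here.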

I do plan to fine-tune the HF model on some examples of summarized texts. The problem I am having is how to add the NER information to the token representations/embeddings that are fed into the Transformer.
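For the embedding route, the first step would be to align the character-level entity spans with the tokens so that every token gets a tag id. The sketch below shows only that alignment, under assumptions: the tag vocabulary `TAG_TO_ID`, the span format, and the helper names are hypothetical, and whitespace splitting stands in for a real tokenizer (fast Hugging Face tokenizers can return comparable character offsets with `return_offsets_mapping=True`).

```python
import re
from typing import Dict, List, Tuple

Span = Tuple[int, int, str]  # hypothetical format: (start_char, end_char, label)

# Hypothetical tag vocabulary; id 0 is reserved for "no entity".
TAG_TO_ID: Dict[str, int] = {"O": 0, "PLAINTIFF": 1, "DEFENDANT": 2}


def token_offsets(text: str) -> List[Tuple[int, int]]:
    """Whitespace tokenization, a stand-in for a tokenizer's offset mapping."""
    return [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def tag_ids_for_tokens(offsets: List[Tuple[int, int]], spans: List[Span]) -> List[int]:
    """Give each token the id of the first entity span it overlaps, else 0."""
    ids = []
    for tok_start, tok_end in offsets:
        tag = "O"
        for start, end, label in spans:
            # Half-open intervals overlap when each starts before the other ends.
            if tok_start < end and tok_end > start:
                tag = label
                break
        ids.append(TAG_TO_ID[tag])
    return ids


text = "Smith sued Acme Corp. for breach of contract."
spans = [(0, 5, "PLAINTIFF"), (11, 21, "DEFENDANT")]
offsets = token_offsets(text)
tag_ids = tag_ids_for_tokens(offsets, spans)
print(list(zip([text[s:e] for s, e in offsets], tag_ids)))
```

The resulting ids could then index a small learned embedding table of the same width as the model's token embeddings, with the two summed and passed to the model's forward pass through `inputs_embeds` instead of `input_ids`. That last step is an assumption about how the pieces would fit together, not something verified against legal-bart.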

Any tips or pointers to references would be highly appreciated.

