Adding Entity Tags to Transformer Input Embedding for Text Summarization

Hi Guys:

I am working on a project to summarize legal cases. The case documents have some degree of structure (similar lengths, a header with case IDs, the defendant/plaintiff and their lawyers, etc.).

I want to summarize each case into a one-paragraph summary understandable by the average person. It seems to me that enhancing the input embeddings with some custom NER-type tags (such as 'Plaintiff' and 'Defendant') would improve performance.

I have looked into using spaCy and Flair for custom NER tagging. However, I am not sure how to incorporate those tags into a Hugging Face pipeline (for example, with legal-bart).
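
For context, this is roughly how I was thinking of producing per-token entity tags with spaCy and aligning them to the Hugging Face tokenizer via offset mappings. The models here (`en_core_web_sm`, `facebook/bart-base`) are just placeholders for a custom-trained NER model and whatever summarization checkpoint I end up using, so treat it as a sketch rather than something I have validated:

```python
import spacy
from transformers import AutoTokenizer

# Placeholder models: I'd swap in a custom-trained NER model and my summarization checkpoint
nlp = spacy.load("en_core_web_sm")
tokenizer = AutoTokenizer.from_pretrained("facebook/bart-base")

text = "Jane Doe (plaintiff) filed suit against Acme Corp (defendant)."

# Character-level entity labels from spaCy ("O" = no entity)
doc = nlp(text)
char_labels = ["O"] * len(text)
for ent in doc.ents:
    for i in range(ent.start_char, ent.end_char):
        char_labels[i] = ent.label_  # PERSON, ORG, ... (custom labels like PLAINTIFF later)

# Tokenize with offsets so each subword can be mapped back to a character span
enc = tokenizer(text, return_offsets_mapping=True)
token_tags = [
    "O" if start == end else char_labels[start]  # (0, 0) offsets are special tokens like <s>
    for start, end in enc["offset_mapping"]
]

print(list(zip(tokenizer.convert_ids_to_tokens(enc["input_ids"]), token_tags)))
```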

I do plan to fine-tune the HF model on some example summaries. The problem I am having is how to add the NER information to the token representations/embeddings that are fed into the Transformer.
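
In case it helps clarify what I mean, here is the kind of wrapper I have in mind: a learned entity-type embedding added to BART's token embeddings, with the sum passed to the model via `inputs_embeds`. The class name, the entity label list, and the checkpoint are all made up for illustration, and I have not verified that this is the right way to do it:

```python
import torch.nn as nn
from transformers import BartForConditionalGeneration

# Hypothetical label set, just for illustration
ENTITY_LABELS = ["O", "CASE_ID", "PLAINTIFF", "DEFENDANT", "LAWYER"]

class BartWithEntityEmbeddings(nn.Module):
    def __init__(self, checkpoint="facebook/bart-base"):
        super().__init__()
        self.bart = BartForConditionalGeneration.from_pretrained(checkpoint)
        self.entity_embeddings = nn.Embedding(len(ENTITY_LABELS), self.bart.config.d_model)
        # Start as a no-op so the pretrained behaviour isn't disturbed before fine-tuning
        nn.init.zeros_(self.entity_embeddings.weight)

    def forward(self, input_ids, attention_mask, entity_ids, labels=None):
        # The same token embeddings BART would compute internally
        # (assumes the checkpoint does not use scale_embedding; otherwise the scaling
        # factor would have to be applied here as well)
        token_embeds = self.bart.get_input_embeddings()(input_ids)
        # Add the entity-type signal and hand the combined embeddings to the encoder
        return self.bart(
            inputs_embeds=token_embeds + self.entity_embeddings(entity_ids),
            attention_mask=attention_mask,
            labels=labels,
        )
```

The idea is that `entity_ids` would just be the token-level tags from the spaCy alignment above, mapped to integer indices into `ENTITY_LABELS` and padded the same way as `input_ids`. Is this a reasonable approach, or is there a better/standard way to inject this kind of information?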

Any tips or pointers to references would be highly appreciated.

Faisal
