Hi Guys:
I am working on a project to summarize legal cases. The case documents have some degree of structure (similar length, a header with case_ids, the defendant/plaintiff and their lawyers, etc.).
I want to condense each document into a one-paragraph summary that the average person can understand. It seems to me that enriching the embeddings with some custom NER-type tags (such as ‘Plaintiff’ and ‘Defendant’) would improve performance.
I have looked into using spaCy and Flair for custom NER tagging. However, I am not sure how to incorporate that into a Hugging Face pipeline (for example, legal-bart).
I do plan to fine-tune the HF model on some examples of summarized texts. The problem I am having is how to add the NER information to the token representations / embeddings that are fed into the Transformer.
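To make the question concrete, here is a minimal sketch of one option I can imagine: instead of changing the embeddings directly, wrap each tagged span in marker tokens in the input text before tokenization. Everything here is an assumption for illustration: the function names `mark_entities` and `marker_tokens`, the `<PLAINTIFF>` / `<DEFENDANT>` marker format, and the example sentence are made up, and the character spans are typed in by hand where a spaCy or Flair tagger would normally supply them.

```python
from typing import List, Tuple

# (start_char, end_char, label). Hypothetical input format; with spaCy these
# could come from ent.start_char, ent.end_char and ent.label_.
Span = Tuple[int, int, str]


def mark_entities(text: str, spans: List[Span]) -> str:
    """Wrap each entity span in <LABEL> ... </LABEL> marker tokens."""
    pieces = []
    cursor = 0
    for start, end, label in sorted(spans):
        if start < cursor:
            continue  # skip spans that overlap one already emitted
        pieces.append(text[cursor:start])
        pieces.append(f"<{label}> {text[start:end]} </{label}>")
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


def marker_tokens(labels: List[str]) -> List[str]:
    """List the opening and closing marker strings for a set of labels."""
    return [tok for label in sorted(set(labels))
            for tok in (f"<{label}>", f"</{label}>")]


# Made-up example sentence with hand-written spans.
text = "Acme Corp sued John Smith for breach of contract."
spans = [(0, 9, "PLAINTIFF"), (15, 25, "DEFENDANT")]

marked = mark_entities(text, spans)
new_tokens = marker_tokens([label for _, _, label in spans])
print(marked)
print(new_tokens)
```

If this is a reasonable direction, I assume the marker strings would then be registered with the tokenizer via `tokenizer.add_special_tokens({"additional_special_tokens": new_tokens})`, followed by `model.resize_token_embeddings(len(tokenizer))`, so that the new embeddings are learned during fine-tuning. I have not verified that this helps summarization quality.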
Any tips or pointers to references would be highly appreciated.
Faisal