I want to summarize the T&Cs and privacy policies of various services. I’ve decided to use a hybrid approach: first I pre-process the terms or policies to remove as much legalese and as many complex words as possible.
Next, I would like to use a pre-trained model for the actual summarization, giving it the simplified text as input.
I want to use either the second or the third most downloaded transformer (sshleifer/distilbart-cnn-12-6 or google/pegasus-cnn_dailymail), whichever is easier for a beginner to use or for you to explain.
I already tried out the default pipeline,
summarizer = pipeline('summarization'), and got back a summary for a paragraph of Instagram's T&C.
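For context, here is a minimal sketch of how one of the two models could be loaded by name instead of relying on the pipeline default. The helper names (build_summarizer, summarize) and the length limits are my own placeholders, not something from a tutorial. The pipeline call with a model argument and the list-of-dicts result with a summary_text key are the standard transformers behaviour as far as I know.

```python
# Sketch only: names and default values here are illustrative assumptions.
DEFAULT_MODEL = "sshleifer/distilbart-cnn-12-6"


def build_summarizer(model_name=DEFAULT_MODEL, device=-1):
    """Create a summarization pipeline for a named checkpoint.

    device=-1 runs on the CPU, device=0 would use the first GPU.
    """
    # Imported lazily so that summarize() can be used with any callable.
    from transformers import pipeline

    return pipeline("summarization", model=model_name, device=device)


def summarize(text, summarizer, max_length=130, min_length=30):
    """Run a pipeline-like callable and return only the summary string."""
    result = summarizer(
        text,
        max_length=max_length,   # upper bound on summary length, in tokens
        min_length=min_length,   # lower bound on summary length, in tokens
        truncation=True,         # cut input that exceeds the model's limit
    )
    return result[0]["summary_text"]
```

Usage would then be something like summarize(paragraph, build_summarizer()), or build_summarizer("google/pegasus-cnn_dailymail") to try the other model.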
I also tried the Pegasus model following this tutorial, but got “RuntimeError: CUDA out of memory”, meaning my GPU ran out of memory.
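One workaround I am considering for the memory error is to split the document into smaller pieces and summarize each piece separately, possibly on the CPU (device=-1 in the pipeline), which avoids GPU memory entirely at the cost of speed. Below is a rough sketch of that idea. The function names and the 400-word chunk size are assumptions I picked for illustration, not tuned values, and splitting on whitespace is only a crude stand-in for counting tokens.

```python
def chunk_words(text, max_words=400):
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


def summarize_long(text, summarizer, max_words=400):
    """Summarize each chunk on its own and join the partial summaries.

    `summarizer` is any callable that behaves like a transformers
    summarization pipeline, i.e. returns [{"summary_text": ...}].
    """
    partial_summaries = []
    for chunk in chunk_words(text, max_words):
        # truncation=True guards against a chunk that is still too long.
        output = summarizer(chunk, truncation=True)
        partial_summaries.append(output[0]["summary_text"])
    return " ".join(partial_summaries)
```

With a real model, the summarizer argument would be something like pipeline("summarization", model="google/pegasus-cnn_dailymail", device=-1). I am not sure whether summarizing chunks independently hurts quality for legal text, since clauses often refer back to earlier definitions.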
Thank you for your valuable time and help.