Hi, thank you for the reply and advice.
I forgot to mention that I want the summaries to be as simple as possible, so that even the average Joe would understand them. That's why I'm trying to clean up the legalese before feeding the text to the summarizer (see the sketch below for the kind of cleanup I mean).
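As a rough illustration of that cleanup step (the substitution list here is purely hypothetical - a placeholder for whatever legalese phrases actually show up in the documents):

```python
import re

# Hypothetical substitutions for common legalese; the real list
# would be longer and tuned to the actual documents.
LEGALESE_SUBSTITUTIONS = [
    (r"\bhereinafter\b", "from now on"),
    (r"\bnotwithstanding\b", "despite"),
    (r"\bin witness whereof\b", "to confirm this"),
]

def clean_legalese(text: str) -> str:
    """Replace legalese phrases with plain-English equivalents
    before the text is passed to the summarizer."""
    for pattern, replacement in LEGALESE_SUBSTITUTIONS:
        text = re.sub(pattern, replacement, text, flags=re.IGNORECASE)
    return text
```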
As for the memory issue - I tried it in Google Colab with a GPU, using the Pegasus model.
Upon reaching this line - tokenizer = AutoTokenizer.from_pretrained("google/pegasus-cnn_dailymail", use_fast=False)
I got an error stating:
"ValueError: Couldn't instantiate the backend tokenizer from one of: (1) a tokenizers library serialization file, (2) a slow tokenizer instance to convert or (3) an equivalent slow tokenizer class to instantiate and convert. You need to have sentencepiece installed to convert a slow tokenizer to a fast one."
Then I found the docs and added use_fast=False (as shown in the line above), but it didn't help.
I also updated pip to the latest version (pip-21.0.1) - still the same error.
I also installed sentencepiece (Successfully installed sentencepiece-0.1.91) - and still the same error persisted.
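For reference, this is roughly the minimal snippet I'm running in Colab. One thing I'm wondering is whether the runtime needs restarting after installing sentencepiece, since transformers may only detect it at import time:

```python
# In a Colab cell, install first, then restart the runtime so
# transformers re-checks for sentencepiece:
#   !pip install sentencepiece transformers

from transformers import AutoTokenizer

# use_fast=False forces the slow (sentencepiece-based) tokenizer,
# which is the call where the ValueError was raised for me.
tokenizer = AutoTokenizer.from_pretrained(
    "google/pegasus-cnn_dailymail", use_fast=False
)
```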
