Hi, I'm researching a transformer model that takes a thesis (or a set of theses) in a specific field (materials engineering) as input and performs NLP Q&A tasks on it. The model accepts a maximum sequence length of 512 tokens, but a paper averages around 10,000 words. So far, I have worked around this by following this post: How to Apply Transformers to Any Length of Text (Sentiment Analysis With Long Sequences | Towards Data Science)
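For reference, here is a minimal sketch of the kind of sliding-window chunking that post describes, using a Hugging Face fast tokenizer (the checkpoint name and the stride value are just example assumptions, not what I actually use):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # example checkpoint

paper_text = "..."  # the full thesis text, ~10,000 words

# Split the paper into overlapping 512-token windows; `stride` controls how
# many tokens neighbouring windows share, and `padding` keeps them stackable.
encoded = tokenizer(
    paper_text,
    max_length=512,
    truncation=True,
    stride=128,
    padding="max_length",
    return_overflowing_tokens=True,
    return_tensors="pt",
)

print(encoded["input_ids"].shape)  # (num_windows, 512)
```

Each window is then fed to the model independently, which is exactly where my problem below comes from.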
But this approach also has a problem: each truncated context is predicted in isolation, so one chunk's output has no effect on the other predictions (no parameters or outputs are transferred to the next prediction, no previous context is referenced, etc.). This is especially damaging for a thesis, where every paragraph is related to the others. I have thought of a few ways to solve this, but I'm not sure they can be implemented:
- Preserve the previous context - if this were possible, it would also make it feasible to feed an entire thesis as a single input (see the first sketch after this list).
- Transfer the previous prediction's output and update the parameters, similar to a fine-tuning process - I don't know enough to say whether this is possible (a rough sketch of the output-carrying part follows the list).
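For the first idea, one existing direction is a long-context architecture such as Longformer, which accepts up to 4,096 tokens, so much more of a thesis fits in a single input. Below is a minimal extractive-QA sketch; the checkpoint is a real public base model, but its QA head is untrained (you would fine-tune it first), and whether it works well on materials-engineering text is an open question:

```python
import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

name = "allenai/longformer-base-4096"  # base checkpoint; QA head is untrained
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForQuestionAnswering.from_pretrained(name)

question = "What alloy was studied?"  # example question
context = "..."                       # a long section of the thesis

# Truncate only the context if the pair exceeds the 4096-token window.
inputs = tokenizer(question, context, max_length=4096,
                   truncation="only_second", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Decode the highest-scoring answer span.
start = outputs.start_logits.argmax()
end = outputs.end_logits.argmax()
print(tokenizer.decode(inputs["input_ids"][0][start : end + 1]))
```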
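For the second idea, I'm not aware of a standard way to update parameters at inference time, but the output-carrying part can at least be approximated: prepend the answer found in each chunk to the next chunk's context. This is only a rough sketch of that idea under my own assumptions (the SQuAD2 checkpoint is a real public model, but the chunking and the carry-over scheme are mine, not a library API):

```python
from transformers import pipeline

qa_pipeline = pipeline("question-answering", model="deepset/roberta-base-squad2")

question = "What alloy was studied?"  # example question
chunks = ["...", "...", "..."]        # overlapping windows from the chunking above

carried = ""                          # answer text carried over from the previous chunk
best = {"score": 0.0, "answer": ""}
for chunk in chunks:
    result = qa_pipeline(question=question, context=carried + " " + chunk)
    if result["score"] > best["score"]:
        best = result                 # keep the highest-confidence answer overall
    carried = result["answer"]        # pass this chunk's answer forward

print(best["answer"], best["score"])
```

Of course this only propagates outputs, not hidden states or gradients, so it is much weaker than what I actually want.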
Does a solution to this problem exist? Thanks for reading.