Hello HuggingFace Community,
I am reaching out to seek your expertise in developing a PyTorch-based architecture for a complex NLP task. Specifically, my project involves processing textual comments, where the primary objectives are twofold:
- Segmentation into Elementary Discourse Units (EDUs): Each comment needs to be divided into its constituent EDUs. I have access to ground truth data that defines these EDU boundaries, which should aid in training and validating the segmentation model.
- Classification of Topics and Sentiments: Once segmented, each EDU needs to be classified along two dimensions: its topic and the sentiment it expresses. Again, I have ground truth data available for both the topic and the sentiment of each EDU.
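
For concreteness, one way the segmentation target could be framed is as token-level tagging, where each token is labeled as either starting a new EDU or continuing the current one. The snippet below is only a minimal sketch: it assumes whitespace tokenization and that the ground-truth EDUs are available as a list of strings per comment. The helper name `edus_to_boundary_labels` and the example comment are hypothetical, made up purely for illustration.

```python
def edus_to_boundary_labels(edus):
    """Turn a list of EDU strings into parallel (tokens, labels) lists.

    Label 1 marks the first token of an EDU, 0 marks a continuation.
    Whitespace tokenization is an assumption made for illustration only.
    """
    tokens, labels = [], []
    for edu in edus:
        for i, tok in enumerate(edu.split()):
            tokens.append(tok)
            labels.append(1 if i == 0 else 0)
    return tokens, labels


# Hypothetical comment, already split into two EDUs.
example_edus = ["The battery lasts all day,", "but the screen scratches easily."]
tokens, labels = edus_to_boundary_labels(example_edus)
```

With a subword tokenizer the same idea would apply, but the labels would have to be aligned to subword pieces rather than whitespace tokens.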
Given these requirements, I am looking for advice on designing an effective PyTorch architecture that can handle both segmentation and classification tasks efficiently. I am particularly interested in any insights on:
- Suitable model architectures or pre-trained models that can be adapted for this task.
- Strategies for multi-task learning, if applicable, to handle both segmentation and classification in a unified framework.
- Data preprocessing and feature engineering techniques that are effective for EDU-based analysis.
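
To illustrate what a unified multi-task setup might look like, here is a rough sketch rather than a proposed solution. It assumes a shared encoder feeding a per-token boundary head, plus topic and sentiment heads that operate on mean-pooled token states of each EDU. The class name `JointEduModel`, the small BiLSTM encoder (a stand-in where a pre-trained transformer could go), the label counts, and the dummy inputs are all assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class JointEduModel(nn.Module):
    """Shared encoder with a token-level boundary head and EDU-level heads."""

    def __init__(self, vocab_size, hidden=64, num_topics=5, num_sentiments=3):
        super().__init__()
        # Stand-in encoder; a pre-trained transformer could take its place.
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.LSTM(hidden, hidden, batch_first=True, bidirectional=True)
        self.boundary_head = nn.Linear(2 * hidden, 2)  # per token: EDU start or not
        self.topic_head = nn.Linear(2 * hidden, num_topics)
        self.sentiment_head = nn.Linear(2 * hidden, num_sentiments)

    def forward(self, token_ids, edu_ids, num_edus):
        # token_ids: (seq_len,) token indices for a single comment
        # edu_ids:   (seq_len,) index of the EDU each token belongs to
        states, _ = self.encoder(self.embed(token_ids).unsqueeze(0))
        states = states.squeeze(0)                     # (seq_len, 2 * hidden)
        boundary_logits = self.boundary_head(states)   # (seq_len, 2)

        # Mean-pool the token states that fall inside each EDU.
        sums = states.new_zeros(num_edus, states.size(1)).index_add(0, edu_ids, states)
        counts = torch.bincount(edu_ids, minlength=num_edus).clamp(min=1).unsqueeze(1)
        edu_repr = sums / counts                       # (num_edus, 2 * hidden)
        return boundary_logits, self.topic_head(edu_repr), self.sentiment_head(edu_repr)


torch.manual_seed(0)
model = JointEduModel(vocab_size=100)

# Dummy comment of 10 tokens split into 2 EDUs (all values are made up).
token_ids = torch.randint(0, 100, (10,))
boundary_labels = torch.tensor([1, 0, 0, 0, 0, 1, 0, 0, 0, 0])
edu_ids = boundary_labels.cumsum(0) - 1   # EDU index per token, from gold boundaries
topic_labels = torch.tensor([2, 4])
sentiment_labels = torch.tensor([0, 1])

b_logits, t_logits, s_logits = model(token_ids, edu_ids, num_edus=2)

# Unweighted sum of the three cross-entropy losses; weighting is an open choice.
loss = (F.cross_entropy(b_logits, boundary_labels)
        + F.cross_entropy(t_logits, topic_labels)
        + F.cross_entropy(s_logits, sentiment_labels))
loss.backward()
```

In this sketch the EDU-level heads are trained on gold boundaries; at inference time the predicted boundaries would presumably have to be used instead, which is one of the design questions I would appreciate input on.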
Any examples, reference implementations, or pointers to relevant literature would be greatly appreciated. Additionally, I am open to collaboration or further discussion if anyone is particularly interested in this area of NLP.
Thank you in advance for your time and assistance!