Please read the topic category description to understand what this is all about.
Description
The Contract Understanding Atticus Dataset (CUAD) is a new dataset for legal contract review. Legal contracts often contain a small number of important clauses that warrant review by lawyers. This is a time-intensive task that requires specialised knowledge, so the goal of this project is to see if Transformer models can be used to extract answers to a predefined set of legal questions.
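Encoder models with an extractive question-answering head typically produce a start score and an end score for every token, and the answer is the span whose combined score is highest. The sketch below decodes such scores in plain Python. The logits are toy numbers, not real model output, and the function name `best_answer_span` and the `max_answer_len` cap are illustrative choices. It also ignores the "no answer" case that a real CUAD pipeline would need to handle.

```python
from typing import List, Tuple


def best_answer_span(
    start_logits: List[float],
    end_logits: List[float],
    max_answer_len: int = 30,
) -> Tuple[int, int, float]:
    """Return (start, end, score) of the best span with start <= end.

    A span's score is start_logits[start] + end_logits[end]; spans longer
    than max_answer_len tokens are not considered. Ties keep the earliest span.
    """
    if len(start_logits) != len(end_logits):
        raise ValueError("start and end logits must have the same length")
    best = (0, 0, float("-inf"))
    for s, s_score in enumerate(start_logits):
        # Only end positions at or after the start, within the length cap.
        for e in range(s, min(s + max_answer_len, len(end_logits))):
            score = s_score + end_logits[e]
            if score > best[2]:
                best = (s, e, score)
    return best


# Toy example: hand-written scores for an already tokenised clause.
tokens = ["This", "Agreement", "shall", "be", "governed", "by",
          "the", "laws", "of", "Delaware", "."]
start_logits = [0.1, 0.2, 0.0, 0.0, 0.1, 0.0, 0.3, 0.5, 0.0, 2.5, 0.0]
end_logits = [0.0, 0.1, 0.0, 0.0, 0.0, 0.0, 0.1, 0.2, 0.0, 3.0, 0.4]
s, e, score = best_answer_span(start_logits, end_logits)
print(" ".join(tokens[s : e + 1]))  # prints "Delaware"
```

The nested loop is quadratic in the window length but bounded by `max_answer_len`. Production decoders usually restrict the search to the top-k start and end candidates instead.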
Model(s)
Many of the Question Answering models on the Hub could serve as a good baseline to get started. Given the specialised domain, you will probably want to try:
- Fine-tuning encoder-based models like BERT, RoBERTa, DeBERTa and friends
- Performing domain adaptation by first fine-tuning the language model on in-domain (legal) text, then training the question-answering head
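For the domain-adaptation step, one common option is to continue masked language modelling (MLM) on contract text before adding the QA head. The sketch below shows BERT-style masking in plain Python: about 15% of non-special tokens are selected; of those, 80% become the mask token, 10% a random token, and 10% stay unchanged. In Hugging Face Transformers, `DataCollatorForLanguageModeling` with `mlm=True` and `mlm_probability` performs this kind of masking for you. The token ids below (including 103 as the mask id) are hypothetical placeholders.

```python
import random
from typing import Iterable, List, Optional, Tuple

IGNORE_INDEX = -100  # label value conventionally ignored by the MLM loss


def mask_tokens(
    token_ids: List[int],
    mask_token_id: int,
    vocab_size: int,
    special_ids: Iterable[int] = (),
    mlm_probability: float = 0.15,
    rng: Optional[random.Random] = None,
) -> Tuple[List[int], List[int]]:
    """BERT-style masking; returns (masked_inputs, labels)."""
    rng = rng if rng is not None else random.Random()
    special = set(special_ids)
    inputs = list(token_ids)
    labels = [IGNORE_INDEX] * len(token_ids)
    for i, tok in enumerate(token_ids):
        # Skip special tokens; select other positions with probability mlm_probability.
        if tok in special or rng.random() >= mlm_probability:
            continue
        labels[i] = tok  # the model must recover the original token here
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = mask_token_id  # 80%: replace with the mask token
        elif roll < 0.9:
            inputs[i] = rng.randrange(vocab_size)  # 10%: random token
        # remaining 10%: keep the original token
    return inputs, labels


# Hypothetical ids: [CLS] four word pieces [SEP]
ids = [101, 2023, 3820, 2003, 9950, 102]
inputs, labels = mask_tokens(ids, mask_token_id=103, vocab_size=30522,
                             special_ids={101, 102}, rng=random.Random(0))
print(inputs, labels)
```

Only positions with a real label contribute to the loss, so the model learns legal vocabulary from context without needing any annotated answers.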
Datasets
CUAD is available on the Hub.
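To fine-tune a QA head, each character-level answer has to be turned into start and end token positions. The sketch below assumes that the Hub copy of CUAD uses SQuAD-style fields (`context`, `question`, and `answers` with `text` and `answer_start`); check the dataset card before relying on this. In practice the offsets would come from a fast tokenizer called with `return_offsets_mapping=True`. Here a regex stands in for the tokenizer, and non-context tokens are marked `None`. Empty answers, or answers that do not fit in the window, are labelled `(0, 0)`, following the SQuAD 2.0 convention of pointing at the first token. The function name `char_span_to_token_span` is hypothetical.

```python
import re
from typing import List, Optional, Tuple

Offset = Optional[Tuple[int, int]]  # (char_start, char_end) or None for non-context tokens


def char_span_to_token_span(
    offsets: List[Offset], answer_start: int, answer_text: str
) -> Tuple[int, int]:
    """Map a character-level answer to (start_token, end_token), or (0, 0) if absent."""
    if not answer_text:
        return 0, 0  # unanswerable question
    answer_end = answer_start + len(answer_text)  # exclusive
    context = [(i, span) for i, span in enumerate(offsets) if span is not None]
    if not context:
        return 0, 0
    # The window must contain the whole answer, otherwise label it "no answer".
    if context[0][1][0] > answer_start or context[-1][1][1] < answer_end:
        return 0, 0
    start_tok = next(i for i, (s, e) in context if e > answer_start)
    end_tok = next(i for i, (s, e) in reversed(context) if s < answer_end)
    return start_tok, end_tok


context = "This Agreement is governed by the laws of Delaware."
# Toy tokenizer: words and punctuation with their character offsets.
ctx_offsets = [(m.start(), m.end()) for m in re.finditer(r"\w+|[^\w\s]", context)]
# [CLS], two question tokens and [SEP] precede the context; a final [SEP] follows.
offsets: List[Offset] = [None] * 4 + ctx_offsets + [None]
answer_start = context.index("Delaware")
print(char_span_to_token_span(offsets, answer_start, "Delaware"))  # prints (12, 12)
```

Since contracts are long, the same mapping would be applied separately to each window that a tokenizer produces.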
Challenges
This is a highly specialised domain, so a vanilla Transformer pretrained only on general-domain text may not perform well out of the box.
Desired project outcomes
Create a Streamlit or Gradio app on Spaces that allows someone to select a legal contract and one or more questions, and then returns the answers.
Additional resources
Discord channel
To chat and organise with other people interested in this project, head over to our Discord and:
- Follow the instructions on the #join-course channel
- Join the #ai-law-assistant channel
Just make sure you comment here to indicate that you’ll be contributing to this project.