Description
I’m looking to do some sentiment analysis on long text – think central banker speeches that can be 8-12 pages long. Instead of classifying the texts into the usual positive/negative/neutral, I’d like to assign them the following categories:
- Hawkish: calling for interest rate hikes
- Dovish: calling for interest rate cuts
- Neutral: calling for interest rates to remain unchanged
These categories can be inferred from the speaker’s position on 5 key subjects, but there may be more that can be learned…
- Unemployment: too low → hawkish, too high → dovish
- Growth: too low → dovish, too high → hawkish
- Inflation: too low → dovish, too high → hawkish
- Interest rates: too low → hawkish, too high → dovish
- Quantitative easing / tightening: easing → dovish, tightening → hawkish
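To make the rules above concrete, here is a rough sketch of how I picture them as a lookup table. The names (`STANCE_RULES`, `net_stance`, the `"balance_sheet"` key) are my own and purely illustrative, and the simple count at the end is only a baseline assumption, not a model.

```python
# Hypothetical encoding of the subject -> stance rules listed above.
STANCE_RULES = {
    ("unemployment", "too_low"): "hawkish",
    ("unemployment", "too_high"): "dovish",
    ("growth", "too_low"): "dovish",
    ("growth", "too_high"): "hawkish",
    ("inflation", "too_low"): "dovish",
    ("inflation", "too_high"): "hawkish",
    ("interest_rates", "too_low"): "hawkish",
    ("interest_rates", "too_high"): "dovish",
    ("balance_sheet", "easing"): "dovish",      # quantitative easing
    ("balance_sheet", "tightening"): "hawkish",  # quantitative tightening
}


def net_stance(observations):
    """Return (number of hawkish signals) minus (number of dovish signals).

    `observations` is a list of (subject, position) pairs that some upstream
    step (manual tagging or a model) is assumed to have extracted.
    """
    score = 0
    for obs in observations:
        score += 1 if STANCE_RULES[obs] == "hawkish" else -1
    return score
```

For example, a text that says unemployment is too low but inflation is also too low would net out to 0, which is exactly the kind of mixed signal I'm worried about.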
Training data
I have text data from meeting minutes that were associated with rate hikes/cuts, so it’s quite easy to label them as hawkish or dovish. However, some of these texts will have both dovish and hawkish components at the same time, which might throw off an NLP model, e.g., “unemployment is very low, yet we will not be raising interest rates.”
Need help with
- Formulating the problem as an ML problem. What class of models is best suited to this task?
- Ideally, I’d want something that generalizes well, as I will be training on relatively structured meeting minutes and testing on speeches that don’t necessarily have the same structure
- Model needs to be able to process texts of different lengths
- I’m not really interested in the classification per se. I’d like to use the logits to create a “hawkishness” score that ranges from -100 (super dovish) to +100 (super hawkish). Do I need a neutral class to achieve this?
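To show what I mean by a score from logits, here is a minimal sketch of one possible approach. It assumes a classifier that outputs three logits in the order (dovish, neutral, hawkish), and that long texts are split into chunks which are scored separately and then averaged by length. The function names and the chunk-averaging step are my own assumptions, not something I have validated.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def hawkishness(logits):
    """Map (dovish, neutral, hawkish) logits to a score in [-100, +100].

    The score is 100 * (P(hawkish) - P(dovish)); probability mass on the
    neutral class pulls the score toward 0.
    """
    p_dove, _p_neutral, p_hawk = softmax(logits)
    return 100.0 * (p_hawk - p_dove)


def document_score(chunk_logits, chunk_lengths):
    """Length-weighted average of per-chunk scores for one long document.

    `chunk_logits` holds one logit triple per chunk; `chunk_lengths` holds
    the matching chunk sizes (e.g. token counts).
    """
    total = sum(chunk_lengths)
    weighted = sum(hawkishness(l) * n for l, n in zip(chunk_logits, chunk_lengths))
    return weighted / total
```

With only two classes the same formula still lands in [-100, +100], since P(hawkish) - P(dovish) then equals tanh of half the logit difference, so I'm unsure whether the neutral class is needed for the score itself or only to keep genuinely neutral texts near 0.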
Thanks!