I have a simple binary classifier that tags documents making up <1% of the examples in the broader corpus. Because of this, it is much easier to review and diagnose false positives than false negatives.
Are there any tricks for modifying binary cross-entropy loss so the model prioritizes high recall, even at the expense of a little precision?
I’ve considered just multiplying the loss on positive examples by a small penalty factor, but I'm not sure what that penalty should be or what it should depend on (for example, should it be a function of the learning rate?).
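For reference, the idea of scaling the loss on positive examples is usually called weighted (or class-weighted) binary cross-entropy. PyTorch exposes it as the `pos_weight` argument of `torch.nn.BCEWithLogitsLoss`, and TensorFlow as `tf.nn.weighted_cross_entropy_with_logits`. Below is a minimal pure-Python sketch of the same loss. The function names `weighted_bce` and `ratio_pos_weight` are hypothetical, chosen for illustration. The negative-to-positive count ratio is only a common starting heuristic for the weight, not a rule. The weight is normally tuned on a validation set against the recall you need rather than derived from the learning rate. Note that it does scale the gradients coming from positive examples.

```python
import math

def weighted_bce(y_true, p_pred, pos_weight=1.0, eps=1e-7):
    """Mean binary cross-entropy where each positive example's term is scaled by pos_weight.

    pos_weight > 1 makes missed positives (false negatives) cost more than
    false positives at the same confidence, pushing the model toward recall.
    """
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip probabilities to avoid log(0)
        total += -(pos_weight * y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

def ratio_pos_weight(y_true):
    """Hypothetical heuristic: weight positives by the negative-to-positive count ratio."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    return n_neg / n_pos

# Example: 1 positive among 100 examples gives a starting weight of 99.
labels = [1] + [0] * 99
w = ratio_pos_weight(labels)
```

With this weighting, only the positive terms are scaled, so the loss on negatives is unchanged. Raising `pos_weight` trades precision for recall, and lowering it does the reverse.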