I need to extract information from Italian medical records. Since these documents are fairly long (2-3k words), my idea is to split them into subsections and then classify the topic of each subsection. This would make me more confident about the extraction step (i.e. that I am extracting a specific piece of information from the correct subsection).
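To make the splitting step concrete, here is a minimal sketch. It assumes the records mark their subsections with uppercase header lines ending in a colon (e.g. `ANAMNESI:`). That convention, the `HEADER_RE` pattern, the `split_into_subsections` name and the sample record are all hypothetical, so the heuristic would need to be adapted to how the real records are laid out. Each returned body is the text that would then be passed to the topic classifier (zero-shot or fine-tuned).

```python
import re

# Hypothetical heuristic: a header is a short uppercase line (Italian accented
# capitals allowed) that ends with a colon, e.g. "ESAMI DI LABORATORIO:".
HEADER_RE = re.compile(r"^[A-ZÀÈÉÌÒÙ][A-ZÀÈÉÌÒÙ '/()-]{2,60}:$")


def split_into_subsections(record: str) -> list[tuple[str, str]]:
    """Split a record into (header, body) pairs.

    Text that appears before the first header is returned with an empty header.
    """
    sections: list[tuple[str, str]] = []
    header = ""
    body: list[str] = []
    for raw in record.splitlines():
        line = raw.strip()
        if HEADER_RE.match(line):
            # Close the previous subsection, skipping an empty preamble.
            if header or any(body):
                sections.append((header, "\n".join(body).strip()))
            header = line[:-1].strip()  # drop the trailing colon
            body = []
        else:
            body.append(line)
    # Close the last open subsection.
    if header or any(body):
        sections.append((header, "\n".join(body).strip()))
    return sections


# Invented example record, for illustration only.
record = """Paziente ricoverato il 12/03.
ANAMNESI:
Iperteso in terapia.
ESAMI DI LABORATORIO:
Emoglobina 13.5 g/dL
Creatinina 0.9 mg/dL
CONCLUSIONI:
Dimesso in buone condizioni."""

for header, body in split_into_subsections(record):
    print(f"[{header or '(preamble)'}] {body!r}")
```

If the real records use consistent headers like these, the header text itself might already work as a label for each subsection, or as a cheap source of training labels for a fine-tuned classifier.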
I was wondering whether zero-shot classification is the best tool for this. Moreover, since the content of the records is very domain-specific (exam results, medical reports and so on), I think I will need to do some kind of fine-tuning.
If this is correct, how could I do that? And approximately how much data would I need?
Thanks a lot