I want to score a batch of paper survey forms. Respondents are supposed to circle the choice they select. The choices are T/F, or N/S/O/A for Never, Sometimes, Often, or Always. Note that respondents can also change their mind, cross out or scratch out an answer, and circle another.
I tried OCR, but of course it focused on the text, which I don’t care about. All I want to know is the question number and which choice was selected. I would like the result to be a file like this:
1 T
2 S
3 A
4 F
Etc.
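To be concrete about that format, here is a minimal sketch of how I imagine the file being written. It assumes the recognised answers end up in a dict keyed by question number; the names `format_answers`, `write_answers`, and `answers` are made up for illustration.

```python
# Hypothetical helpers: turn {question_number: choice} into the
# "number letter" lines shown above, one question per line.

def format_answers(answers):
    """Return the answers as text, sorted by question number."""
    lines = [f"{question} {choice}" for question, choice in sorted(answers.items())]
    return "\n".join(lines) + "\n"


def write_answers(answers, path):
    """Write the formatted answers to a plain-text file."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(format_answers(answers))


if __name__ == "__main__":
    # Example data only; real values would come from the scanned form.
    print(format_answers({1: "T", 2: "S", 3: "A", 4: "F"}), end="")
```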
I’d be happy to create a small number of pages of training data, but it’s a PITA, so I want to keep it to a minimum. I’m not sure how to train a system to throw away all the question text and focus only on the question number and what is circled. I don’t have a GPU either, so I’m hoping for something that requires minimal training.
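In case it helps frame answers, this is the sort of no-training baseline I could imagine, though I have not tried it and it may be too naive. It assumes every scan is already aligned to a blank copy of the form, that I measure the box around each choice letter once by hand, and that images are plain 2D lists of grayscale values. All names and thresholds here are hypothetical.

```python
# Hypothetical sketch: pick the choice whose box gained the most ink
# compared with a blank copy of the form. Images are 2D lists of
# grayscale values (0 = black, 255 = white). Box coordinates are an
# assumption: they would be measured once on the blank form.

def ink(image, box, threshold=128):
    """Count dark pixels inside box = (top, left, bottom, right)."""
    top, left, bottom, right = box
    return sum(
        1
        for row in image[top:bottom]
        for pixel in row[left:right]
        if pixel < threshold
    )


def score_question(filled, blank, boxes, min_added=5):
    """Return the marked choice, or '?' if zero or several boxes gained ink.

    boxes maps a choice letter to its (top, left, bottom, right) region.
    """
    added = {choice: ink(filled, box) - ink(blank, box) for choice, box in boxes.items()}
    marked = [choice for choice, count in added.items() if count >= min_added]
    # A crossed-out answer also adds ink, so when two boxes are marked
    # the question is left for manual review rather than guessed at.
    return marked[0] if len(marked) == 1 else "?"
```

This cannot tell a circle from a scratch-out, which is exactly the case I am worried about, so anything it returns as `?` would still need a human or a smarter classifier.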
I’m still looking for some help with my task of scoring survey forms and identifying the letters that have been circled. I’m trying to use machine learning to do this, but I’m not sure how to approach it.
I’ve tried using OCR to extract the text from the images, but this doesn’t work well because I only care about the question numbers and the circled letters, not the surrounding text. I’ve also considered using a convolutional neural network (CNN) or a recurrent neural network (RNN) to classify the answers directly from the images, but I’m not sure which approach would be more effective, especially with little training data and no GPU.
I would really appreciate any suggestions or guidance on how to tackle this problem. I’m happy to provide more information about the data and my specific goals if that would be helpful.
Thank you in advance for any help you can provide!