Seeking Advice on Implementing HTML Inspection Service

I want to implement a service that checks HTML files for errors and pinpoints their exact locations: missing tags, extra tags, unexpected special characters, and so on.

I have a large dataset containing both valid and invalid HTML files. So far I’ve been using an LSTM model, which reconstructs missing tags effectively. I then compare the reconstructed text with the original and show the diff.
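
To make the comparison step concrete, here is a minimal sketch of a line-level diff using Python's standard `difflib`. The function name `diff_locations` and the sample strings are hypothetical, and `reconstructed` is a hand-written stand-in for the model's output rather than anything the LSTM actually produced.

```python
import difflib


def diff_locations(original: str, reconstructed: str):
    """Return one (kind, line_number, original_lines, reconstructed_lines)
    tuple per differing block; line_number is 1-based in the original."""
    orig_lines = original.splitlines()
    recon_lines = reconstructed.splitlines()
    matcher = difflib.SequenceMatcher(a=orig_lines, b=recon_lines, autojunk=False)
    findings = []
    for kind, i1, i2, j1, j2 in matcher.get_opcodes():
        if kind == "equal":
            continue  # unchanged lines carry no error information
        findings.append((kind, i1 + 1, orig_lines[i1:i2], recon_lines[j1:j2]))
    return findings


# Hypothetical input: the first <li> is never closed.
original = "<ul>\n<li>one\n<li>two</li>\n</ul>"
# Stand-in for what the model might return (assumption, not real output).
reconstructed = "<ul>\n<li>one</li>\n<li>two</li>\n</ul>"

findings = diff_locations(original, reconstructed)
for kind, line_no, before, after in findings:
    print(f"line {line_no}: {kind}: {before} -> {after}")
```

This only reports which lines differ. Pinpointing a column within a line would need a second, character-level pass over each changed line.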

However, I’m unsure whether this model will meet all of my requirements or whether a better option exists. I would appreciate any advice.