How to use MarkupLM for QA on HTML text longer than 512 tokens?

I am trying to run MarkupLM for question answering, but my HTML string is longer than the model's 512-token limit. Is the right approach to parse the HTML myself, extract the text nodes and their XPaths, and then split them into chunks that each fit within the limit, pairing every chunk with the question?
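To make the question concrete, here is a minimal sketch of the chunking step I have in mind. It assumes the nodes and XPaths have already been extracted into two parallel lists; `chunk_nodes` and all of its parameters are hypothetical names of my own, not part of any library. The token counter is passed in as a function, so in practice it would wrap the MarkupLM tokenizer, while the example below just counts whitespace-separated words.

```python
from typing import Callable, List, Tuple


def chunk_nodes(
    nodes: List[str],
    xpaths: List[str],
    count_tokens: Callable[[str], int],
    max_tokens: int = 512,
    question_tokens: int = 0,
    overlap_nodes: int = 0,
) -> List[Tuple[List[str], List[str]]]:
    """Greedily pack (node, xpath) pairs into chunks that fit a token budget.

    Hypothetical helper: special tokens are not counted here, so
    max_tokens should leave some headroom for them.
    """
    if len(nodes) != len(xpaths):
        raise ValueError("nodes and xpaths must have the same length")
    budget = max_tokens - question_tokens
    if budget <= 0:
        raise ValueError("question leaves no room for any HTML content")

    chunks = []
    start = 0
    while start < len(nodes):
        end = start
        used = 0
        while end < len(nodes):
            cost = count_tokens(nodes[end])
            # Stop once the budget is exceeded, but always keep at least
            # one node per chunk (an oversized node ends up alone and
            # would still need truncation downstream).
            if used + cost > budget and end > start:
                break
            used += cost
            end += 1
        chunks.append((nodes[start:end], xpaths[start:end]))
        if end >= len(nodes):
            break
        # Step back by the requested overlap, but always move forward.
        start = max(end - overlap_nodes, start + 1)
    return chunks


if __name__ == "__main__":
    demo_nodes = ["a b c", "d e", "f g h i", "j"]
    demo_xpaths = ["/html/body/p[1]", "/html/body/p[2]",
                   "/html/body/p[3]", "/html/body/p[4]"]
    for chunk_text, chunk_xpaths in chunk_nodes(
        demo_nodes, demo_xpaths, lambda s: len(s.split()),
        max_tokens=6, question_tokens=1,
    ):
        print(chunk_text, chunk_xpaths)
```

Each chunk would then be run through the model together with the same question, and the answer span with the highest score across chunks kept. That last step is my assumption about how this would work, not something I have seen documented for MarkupLM.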

Are there any examples I could refer to where someone actually processes an HTML string longer than 512 tokens? Thank you!